PROBLEMS OF FORMING A LINGUISTIC BASE WHEN CREATING A CORPUS
Keywords:
Linguistic databases, corpus linguistics, language data, text corpora, general corpora, specialized corpora, annotated corpora, data collection, metadata, linguistic research, language variation
Abstract
Building the linguistic base for a corpus raises several difficulties that can affect the quality and usability of the final dataset. One of the main problems is ambiguity in defining the corpus's scope and purpose, which can lead to a selection of texts that does not match the corpus's intended aims. The result may be a corpus that is not representative enough to capture the variety of language use across different groups and contexts.
Data collection is another difficulty, especially where accessibility and copyright restrictions limit the range of texts that can be included. In addition, sampling bias may result if the corpus concentrates too heavily on particular genres or linguistic varieties while neglecting others.





