Developing a research corpus in China: from a frequency dictionary to a linguistic corpus
Abstract
Frequency dictionaries were published in China to popularize “civic education.” At the end of the 20th century, computers reduced the time needed to collect and organize various types of language material, and a machine-readable corpus of Chinese came into being. Today, there are three large corpora in China, each containing more than 100 million units of authentic language material. The ability to work with corpora is considered one of the most important skills for language research.
References
Golovin B.N. Yazyk i statistika [Language and statistics]. Moscow: Prosveshchenie, 1971. 189 p. (In Russian).
Nelyubin L.L. Tolkovyj perevodcheskij slovar’ [Explanatory dictionary of translation]. 3-e izd., pererab. Moscow: Flinta: Nauka, 2003. 320 p. (In Russian).
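Corpus BCC. URL: http://bcc.blcu.edu.cn/ (In Chinese).
Corpus CCL. URL: http://ccl.pku.edu.cn:8080/ccl_corpus/ (In Chinese).
Corpus online. URL: http://www.cncorpus.org/ (In Chinese).
Feng Zhiwei, Hu Fengguo. Shuli yuyanxue [Mathematical linguistics]. Shangwu yinshuguan [The Commercial Press], 2012. 491 p. (In Chinese).
Feng Zhiwei. Zhongguo yuliaoku yanjiu yu xianzhuang [Corpus research in China and its current state]. Yuyan wenzi yingyong [Applied Linguistics], 2002, pp. 43–62. (In Chinese).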
Received: 04/01/2019
Accepted: 05/01/2019
Accepted date: 30.06.2019
Keywords: corpus, frequency dictionary, computational linguistics
Available online since: 30.03.2019