Developing a research corpus in China: from a frequency dictionary to a linguistic corpus

Cai W. (Lomonosov Moscow State University)

Abstract

Frequency dictionaries have been published in China to popularize “civic education.” At the end of the 20th century, computers reduced the time needed to collect and organize language material of various kinds, and machine-readable corpora of Chinese came into being in the same period. Today there are three large corpora in China, each containing more than 100 million items of authentic language material. The ability to work with a corpus is considered one of the most important skills in language research.

References

Golovin B.N. Yazyk i statistika [Language and statistics]. Moscow: Prosveshchenie, 1971. 189 p. (In Russian).

Nelyubin L.L. Tolkovyj perevodcheskij slovar’ [Explanatory dictionary of translation]. 3-e izd., pererab. Moscow: Flinta: Nauka, 2003. 320 p. (In Russian).

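BCC Corpus. URL: http://bcc.blcu.edu.cn/ (In Chinese).

CCL Corpus. URL: http://ccl.pku.edu.cn:8080/ccl_corpus/ (In Chinese).

Corpus Online. URL: http://www.cncorpus.org/ (In Chinese).

Feng Zhiwei, Hu Fengguo. Shuli yuyanxue [Mathematical linguistics]. The Commercial Press, 2012. 491 p. (In Chinese).

Feng Zhiwei. Zhongguo yuliaoku yanjiu yu xianzhuang [Corpus research in China and its current state]. Yuyan wenzi yingyong [Applied Linguistics], 2002, pp. 43–62. (In Chinese).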

Received: 04/01/2019

Accepted: 05/01/2019

Accepted date: 30.06.2019

Keywords: corpus, frequency dictionary, computational linguistics

Available in the online version since: 30.03.2019

To cite this article:
- Cai W. Developing a research corpus in China: from a frequency dictionary to a linguistic corpus. Moscow University Translation Studies Bulletin. 2019. No. 2. P. 130–135.