The corpus zho-simp_news_2010 is a Chinese (simplified script) news corpus based on material from 2010.
It contains 19,421,893 sentences and 517,982,852 tokens.
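As a rough illustration of what these two figures imply, the average sentence length can be derived from them directly. The sketch below uses only the counts stated above; the variable names are illustrative, and how a "token" is delimited for Chinese text depends on the corpus's own segmentation, which is not described here.

```python
# Counts as stated in the corpus description.
sentences = 19_421_893
tokens = 517_982_852

# Mean number of tokens per sentence (simple ratio of the two totals).
avg_tokens_per_sentence = tokens / sentences

print(f"{avg_tokens_per_sentence:.2f} tokens per sentence")
```

The ratio comes to roughly 26.67 tokens per sentence, under whatever tokenization the collection applied.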
Details
Leipzig Corpora Collection: Chinese (simplified script) news corpus based on material from 2010. Leipzig Corpora Collection. Dataset. https://corpora.uni-leipzig.de?corpusId=zho-simp_news_2010.
BibTeX
@misc{zho-simp_news_2010,
author = {Leipzig Corpora Collection},
title = {Chinese (simplified script) news corpus based on material from 2010},
howpublished = {https://corpora.uni-leipzig.de?corpusId=zho-simp_news_2010},
note = {Accessed: 2024-10-07}
}