Leipzig Corpora Collection

Search in 1018 Corpus-Based Monolingual Dictionaries for 290 Languages.

Selected language: Tamil Web 2019 (India)


More information about: Tamil Web 2019 (India)

The corpus tam-in_web_2019 is a Tamil Web text corpus (India) based on material from 2019. It contains 2,613,049 sentences and 26,209,008 tokens.

DOWNLOADS

Download parts of this corpus.

STATISTICS

More details about this corpus on our corpus and language statistics page.

Description

Tamil Web text corpus (India) based on material from 2019

Details

Name	tam-in_web_2019
Language	Tamil
Genre	Web
Year	2019
Location	India
Sentences	2,613,049
Types	1,783,041
Tokens	26,209,008
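
The counts above can be turned into two common descriptive ratios: average sentence length and type-token ratio. The following is a minimal sketch in Python. The function name `corpus_ratios` and the dictionary keys are illustrative assumptions, not part of any Leipzig Corpora Collection tooling. Only the three counts come from this page.

```python
# Counts copied from the Details section of this page.
SENTENCES = 2_613_049
TYPES = 1_783_041
TOKENS = 26_209_008


def corpus_ratios(sentences: int, types: int, tokens: int) -> dict:
    """Return simple descriptive ratios for a corpus (hypothetical helper)."""
    if sentences <= 0 or tokens <= 0:
        raise ValueError("sentence and token counts must be positive")
    return {
        # Average number of tokens per sentence.
        "tokens_per_sentence": tokens / sentences,
        # Share of distinct word forms among all running tokens.
        "type_token_ratio": types / tokens,
    }


ratios = corpus_ratios(SENTENCES, TYPES, TOKENS)
print(f"tokens per sentence: {ratios['tokens_per_sentence']:.2f}")
print(f"type-token ratio:    {ratios['type_token_ratio']:.3f}")
```

With these counts, the average comes out at roughly 10 tokens per sentence and the type-token ratio at roughly 0.07. Both depend on how the corpus was tokenized, so they are best compared only against corpora processed in the same way.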

Link to the corpus

https://corpora.uni-leipzig.de?corpusId=tam-in_web_2019

Annotations

coocSim
GDEX

Cite this corpus

Leipzig Corpora Collection: Tamil Web text corpus (India) based on material from 2019. Leipzig Corpora Collection. Dataset. https://corpora.uni-leipzig.de?corpusId=tam-in_web_2019.

@misc{tam-in_web_2019,
    author = {Leipzig Corpora Collection},
    title = {Tamil Web text corpus (India) based on material from 2019},
    howpublished = {https://corpora.uni-leipzig.de?corpusId=tam-in_web_2019},
    note = {Accessed: 2024-07-27}
}