Leipzig Corpora Collection

Search in 1019 Corpus-Based Monolingual Dictionaries for 291 Languages.

Selected language: Arabic Newscrawl 2013

More information about: Arabic Newscrawl 2013

The corpus ara_newscrawl_2013_1M is an Arabic news subcorpus based on material crawled in 2013. It contains 1,000,000 sentences and 20,759,565 tokens.

Description

Arabic news subcorpus based on material crawled in 2013 (1,000,000 sentences)

Details

Name: ara_newscrawl_2013_1M
Language: Arabic
Genre: Newscrawl
Year: 2013
Sentences: 1,000,000
Types: 871,269
Tokens: 20,759,565
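
The counts above allow two simple derived figures: the mean sentence length and the type-token ratio. The following is a minimal sketch, not something published on the corpus page. It assumes that "types" means distinct word forms and "tokens" means running words as counted by the collection's own tokenizer; the function name corpus_ratios is hypothetical.

```python
# Figures copied from the Details table; the derived ratios are illustrative only.
SENTENCES = 1_000_000
TYPES = 871_269
TOKENS = 20_759_565


def corpus_ratios(sentences, types, tokens):
    """Return (mean tokens per sentence, type-token ratio)."""
    return tokens / sentences, types / tokens


mean_len, ttr = corpus_ratios(SENTENCES, TYPES, TOKENS)
print(f"mean tokens per sentence: {mean_len:.2f}")  # 20.76
print(f"type-token ratio: {ttr:.4f}")               # 0.0420
```

A type-token ratio depends strongly on corpus size, so it is only comparable between corpora of the same size class (here, 1M sentences).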

Link to the corpus

https://corpora.uni-leipzig.de?corpusId=ara_newscrawl_2013_1M
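
The corpus ID in the link appears to encode the same fields as the Details section (language, genre, year, size). As a rough sketch, the ID can be pulled out of the link and split into those fields. The underscore-separated layout is an assumption inferred from this single ID, and parse_corpus_link is a hypothetical helper, not part of any Leipzig Corpora Collection tooling.

```python
from urllib.parse import urlparse, parse_qs


def parse_corpus_link(url):
    """Extract the corpusId query parameter and split it into its parts.

    Assumes the ID has exactly four underscore-separated fields:
    language code, genre, year, size class.
    """
    query = parse_qs(urlparse(url).query)
    corpus_id = query["corpusId"][0]
    language, genre, year, size = corpus_id.split("_")
    return {
        "corpus_id": corpus_id,
        "language": language,
        "genre": genre,
        "year": int(year),
        "size": size,
    }


info = parse_corpus_link(
    "https://corpora.uni-leipzig.de?corpusId=ara_newscrawl_2013_1M"
)
print(info)
```

An ID with a different number of fields would raise a ValueError in this sketch, which is intentional: it signals that the assumed layout does not hold.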

Annotations

wordsLevenshteinSim

Cite this corpus

Leipzig Corpora Collection: Arabic news subcorpus based on material crawled in 2013 (1,000,000 sentences). Leipzig Corpora Collection. Dataset. https://corpora.uni-leipzig.de?corpusId=ara_newscrawl_2013_1M.

BibTeX

@misc{ara_newscrawl_2013_1M,
    author = {Leipzig Corpora Collection},
    title = {Arabic news subcorpus based on material crawled in 2013 (1,000,000 sentences)},
    howpublished = {https://corpora.uni-leipzig.de?corpusId=ara_newscrawl_2013_1M},
    note = {Accessed: 2024-04-27}
}