The corpus oci_wikipedia_2007 is an Occitan (post 1500) Wikipedia corpus based on material from 2007. It contains 16,985 sentences and 301,449 tokens.

