Opera del Vocabolario Italiano

Istituto del Consiglio Nazionale delle Ricerche


Updated versions of OVI Corpora

Updated versions of the OVI Corpus, the TLIO Corpus, and the OVI Corpus of Early Italian have been released. They follow the structure developed in the RENOVO Project (Regenerating the OVI Corpus: Renewal and Optimization of Methods, Contents and Tools - PRIN 2017), which aims at the philological and textual renewal of the two corpora and continues the work of the CoVo Project (The corpus of the Italian vocabulary of the origins: philological update and interoperability - PRIN 2015).

See the update criteria.

TLIO Corpus

The TLIO Corpus is a lemmatized corpus containing the reference texts for the TLIO. In this release, 55 texts that were present in the TLIO Corpus in outdated editions have been updated to newer editions (see list here), and 39 texts not previously included have been added (see list here). The new version of the TLIO Corpus, published online today, includes 2,991 texts (43 more than the previous version), for a total of 23,496,746 occurrences (an increase of 61,301), 488,227 distinct graphic forms, 124,736 lemmas, and 4,448,764 lemmatized occurrences (an increase of 48.0476).

OVI Corpus

The OVI Corpus is not lemmatized (although it can be searched using the “lemmi muti” function of GATTOWeb). It includes the TLIO Corpus and extends it to all published texts dating from before the end of the 14th century, with the aim of making the entire textual heritage of early Italian searchable.
The new version of the OVI Corpus, published online today, includes 3,261 texts (43 more than the previous version), for a total of 29,987,740 occurrences (an increase of 61,302) and 548,826 distinct graphic forms.