Opera del Vocabolario Italiano

Istituto del Consiglio Nazionale delle Ricerche

The Textual Corpus

The TLIO textual corpus is the largest database available today of the Italian language before 1375. It is divided into two databases. As of September 18, 2023, it contains 30,245,108 words (occurrences) from 3,477 texts in the OVI Corpus of early Italian, and 23,814,549 occurrences from 3,210 texts in the TLIO Corpus, the lemmatized database used to compile the dictionary.
The corpus is implemented in GATTO (Gestione degli Archivi Testuali del Tesoro delle Origini - Management of the TLIO Textual Archives) in a local-network version used for lemmatization. The lemmatized corpus can then be accessed online through the GattoWeb software. Since 1998, a version of the lemmatized corpus has also been available online as the ItalNet version. This was the version best known to scholars until October 2005, when the GattoWeb software was first released.

To report errors or make suggestions, please email those responsible for lemmatization, Elena Artale and Diego Dotto.

The ItalNet version of the OVI Database of early Italian has not been updated since 2005 and is therefore no longer accurate or reliable.

Valentina Pollidori created the database of early Italian underpinning TLIO on the basis of materials previously prepared by the Accademia della Crusca. She managed it in collaboration with Franca Bertini until 2004. After her untimely and tragic death, Pär Larson succeeded her as database manager.
Since the retirement in 2014 of Domenico Iorio-Fili, the author of the GATTO software, the database has been implemented in GATTO by Andrea Boccellari.
The searchable ItalNet version, based on texts prepared by the OVI, is a product of the ItalNet consortium (with Theodore J. Cachey Jr. as executive director, Mark Olsen as chief database programmer, and Christian Dupont as assistant programmer and web programming designer).
Until 2006, Roberta Cella was in charge of lemmatization (with the collaboration of Patricia Frosini until 2002). Following Cella’s appointment to an academic position, Elena Artale and Diego Dotto took over. For the methods of lemmatization, see Piero Esperti, “Grammatichetta della lingua italiana ad uso del calcolatore”, in Al servizio del vocabolario della lingua italiana, ed. by d’Arco Silvio Avalle (Firenze: Accademia della Crusca, 1979), pp. 123-87. A first version of the database was implemented in DBT by Eugenio Picchi in collaboration with Elisabetta Marinai, and was made searchable online through a client-server system created by Lisa Biagini.
The retrieval of encoded information was carried out by Rosalba Cigliana and Valentina Pollidori with the collaboration of Rita Marinelli and, for the IT side, by Joseph Camuglia, Manuela Sassi, Elisabetta Marinai, and Antonio Sapuppo. For the first working phase that led up to the database, see Avalle, Al servizio del vocabolario della lingua italiana, op. cit.; Aldo Duro, “L’impianto del nuovo vocabolario: profilo storico”, in La Crusca nella tradizione letteraria e linguistica italiana (Firenze: Accademia della Crusca, 1985), pp. 431-42; and Domenico De Robertis, “L’ufficio filologico dell’Opera del vocabolario, il suo impianto, il suo lavoro”, in La Crusca nella tradizione letteraria e linguistica italiana, op. cit., pp. 443-51. We wish to thank all the scholars who have contacted the OVI to submit published texts or have provided materials in electronic format, thus facilitating the development of the corpus. The bibliography of cited works includes “provisional” or “internal” editions prepared specifically for the corpus, as indicated in the individual bibliographic records.