Opera del Vocabolario Italiano

Istituto del Consiglio Nazionale delle Ricerche

Lemmatisation Environment

CORPUS LEMMATISATION ENVIRONMENT

Within GATTO you can lemmatise the texts included in a corpus. For the purposes of GATTO, lemmatising means linking a certain lemma with a specific occurrence of a specific word form in a specific text. If the lemma used is not in the lemma index of the corpus, it will automatically be added. 
The completion of a lemmatisation has two consequences. Clearly, it creates a connection between the lemma and the occurrence which it has been linked to. However, it also creates a general connection between the lemma and the word form itself. 
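The two consequences described above, together with the automatic addition of new lemmas, can be pictured with a minimal Python sketch. It is not GATTO's actual implementation: the names `Occurrence`, `Corpus`, `lemma_index`, `occurrence_lemma` and `form_lemmas` are hypothetical, and the sample word form and lemma are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Occurrence:
    """One occurrence of a word form in a text (hypothetical structure)."""
    text_id: str
    position: int
    form: str


@dataclass
class Corpus:
    """Hypothetical corpus-level stores, not GATTO's real data model."""
    lemma_index: set = field(default_factory=set)
    occurrence_lemma: dict = field(default_factory=dict)
    form_lemmas: dict = field(default_factory=dict)

    def lemmatise(self, occ: Occurrence, lemma: str) -> None:
        # A lemma missing from the index is added automatically.
        self.lemma_index.add(lemma)
        # Consequence 1: the lemma is linked to this specific occurrence.
        self.occurrence_lemma[occ] = lemma
        # Consequence 2: the lemma is linked to the word form in general.
        self.form_lemmas.setdefault(occ.form, set()).add(lemma)


corpus = Corpus()
occ = Occurrence(text_id="text_a", position=12, form="amo")
corpus.lemmatise(occ, "amare")
```

After the call, `corpus.form_lemmas["amo"]` contains `"amare"`, so under this sketch the lemma could be offered again for any other occurrence of the same word form.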
The first stage in lemmatisation is to search for contexts. This is carried out in exactly the same way as in the Search environment. The search can be carried out by word form, by lemma, by grammatical category or by disambiguator, and can be extended to the whole corpus or limited to one or more corpus subsets. The contexts obtained from this search can then be lemmatised. Each lemmatisation links a lemma with one or more contexts. You can link a lemma with all the contexts obtained from the search using a single command. The lemmatisations carried out can be corrected or removed at a later date. 
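The workflow just described (search for contexts, optionally within a corpus subset, link a lemma to all of them at once, then correct or remove a link later) might be sketched as follows. This is only an illustration under assumptions: `Context`, `search`, `link_all` and `unlink` are hypothetical names rather than GATTO commands, the search is shown by word form only, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A context returned by a search (hypothetical structure)."""
    text_id: str
    position: int
    form: str


def search(contexts, form, texts=None):
    """Find contexts by word form, optionally limited to a subset of texts."""
    return [
        c for c in contexts
        if c.form == form and (texts is None or c.text_id in texts)
    ]


def link_all(links, found, lemma):
    """Link one lemma to every context obtained from the search."""
    for c in found:
        links[c] = lemma


def unlink(links, context):
    """Remove a lemmatisation carried out earlier."""
    links.pop(context, None)


corpus_contexts = [
    Context("text_a", 3, "amo"),
    Context("text_a", 40, "amo"),
    Context("text_b", 7, "amo"),
    Context("text_b", 9, "ami"),
]

links = {}
found = search(corpus_contexts, "amo", texts={"text_a"})  # subset search
link_all(links, found, "amare")                           # single command
unlink(links, found[0])                                   # later removal
```

In this sketch, correcting a lemmatisation amounts to assigning a different lemma to the same context in `links`.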
During lemmatisation, you can link an occurrence to a hyperlemma as well as to a lemma. 
Whenever a lemma or hyperlemma that is not yet present in the corpus is used in a lemmatisation, it is added automatically. More precisely, the lemma is added to the lemma list of the lemmatised word form and, as such, is available to the user for future lemmatisations of that word form, even within different texts of the corpus. The same applies to hyperlemmas. 
Lemmatisation takes place using a word form lemma list and a table of homographs which operate at corpus level. These in turn are automatically updated in real time according to the lemmatisations taking place. 
In this environment you can change or remove lemmatisations previously carried out.