Lemmatisation Environment
Within GATTO you can lemmatise the texts included in a corpus. For the purposes of GATTO, lemmatising means linking a certain lemma with a specific occurrence of a specific word form in a specific text. If the lemma used is not in the lemma index of the corpus, it will automatically be added.
The completion of a lemmatisation has two consequences. Clearly, it creates a connection between the lemma and the occurrence which it has been linked to. However, it also creates a general connection between the lemma and the word form itself.
The first stage in lemmatisation is to search for contexts. This is carried out in exactly the same way as in the Search environment. The search can be carried out by word form, by lemma, by grammatical category or by disambiguator, and can be extended to the whole corpus or limited to one or more corpus subsets. The contexts obtained from this search can then be lemmatised. Each lemmatisation links a lemma with one or more contexts. You can link a lemma with all the contexts obtained from the search using a single command. The lemmatisations carried out can be corrected or removed at a later date.
During lemmatisation, you can link an occurrence to a hyperlemma as well as to a lemma.
Every time a lemma or hyperlemma that is not present in the corpus is used in lemmatisation, it will automatically be added. More precisely, the lemma becomes part of the word form lemma list for the lemmatised word form and, as such, is available to the user for future lemmatisations of that word form, even within different texts of the corpus. The same applies to hyperlemmas.
Lemmatisation takes place using a word form lemma list and a table of homographs which operate at corpus level. These in turn are automatically updated in real time according to the lemmatisations taking place.
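The bookkeeping described above can be pictured with a minimal Python sketch. It is not GATTO's implementation: the names `Occurrence`, `Corpus`, `lemmatise` and `suggestions` are hypothetical, and hyperlemmas and the table of homographs are left out. It only illustrates that a single lemmatisation adds the lemma to the corpus lemma index if it is missing, links it to the chosen occurrences, and records it in the word form lemma list of the word form.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Occurrence:
    """One occurrence of a word form in a text (hypothetical model)."""
    text_id: str
    position: int
    form: str


@dataclass
class Corpus:
    """Illustrative corpus-level bookkeeping, not GATTO's real data model."""
    lemma_index: set = field(default_factory=set)
    occurrence_lemma: dict = field(default_factory=dict)
    form_lemmas: dict = field(default_factory=lambda: defaultdict(set))

    def lemmatise(self, lemma, occurrences):
        """Link one lemma with one or more occurrences."""
        # A lemma not yet in the lemma index is added automatically.
        self.lemma_index.add(lemma)
        for occ in occurrences:
            # Specific link: lemma <-> this occurrence.
            self.occurrence_lemma[occ] = lemma
            # General link: lemma <-> the word form itself.
            self.form_lemmas[occ.form].add(lemma)

    def suggestions(self, form):
        """Lemmas already used for this word form anywhere in the corpus."""
        return sorted(self.form_lemmas.get(form, ()))


corpus = Corpus()
first = Occurrence("text1", 3, "amor")
second = Occurrence("text2", 7, "amor")
corpus.lemmatise("amore", [first])
print(corpus.suggestions(second.form))
```

In this sketch the second occurrence of the word form stays unlemmatised, but the lemma used for the first one is already offered as a candidate for it, even though it belongs to a different text.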
In this environment you can change or remove lemmatisations previously carried out.
This second method of lemmatisation is more specifically aimed at producing extensive or exhaustive lemmatisations of single texts.
The lemmatisation performed can either be standard or sequential. In either case, it will be carried out on a single text at a time.
Standard lemmatisation involves choosing the word form and, if necessary, the specific occurrence that you wish to lemmatise, and moving on from there to other occurrences of the same word form or to occurrences of the word forms which are alphabetically adjacent to it.
Sequential lemmatisation can be used to display and lemmatise the words in the text in the order in which they appear within it. This is probably the most useful function available in this environment, especially if it is used in conjunction with standard lemmatisation. (For more on this, see Lesson 20.)
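The difference between the two modes is essentially the order in which the occurrences of a single text are visited. The following Python sketch illustrates this under stated assumptions: the function names are hypothetical, the text is reduced to a plain list of word forms, alphabetical order is taken to be ordinary string comparison, and standard lemmatisation is simplified to moving forward only from the chosen word form. GATTO's actual ordering rules may differ.

```python
from bisect import bisect_left


def sequential_order(tokens):
    """Visit every occurrence in the order in which it appears in the text."""
    return list(enumerate(tokens))


def standard_order(tokens, start_form):
    """Visit all occurrences of start_form, then those of the word forms
    that follow it alphabetically (a simplification for illustration)."""
    positions = {}
    for pos, form in enumerate(tokens):
        positions.setdefault(form, []).append(pos)
    forms = sorted(positions)
    start = bisect_left(forms, start_form)
    return [(pos, form) for form in forms[start:] for pos in positions[form]]


tokens = ["la", "donna", "e", "la", "rosa"]
print(sequential_order(tokens))
print(standard_order(tokens, "la"))
```

With this sample text, the sequential order walks through positions 0 to 4, while the standard order starting from "la" visits both occurrences of "la" (positions 0 and 3) before moving on to "rosa".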
During lemmatisation you can link occurrences to hyperlemmas as well as to lemmas.
All comments made in connection with the Corpus lemmatisation environment are equally valid here, including the use of the word form lemma list and table of homographs to assist in the process.