Lemmatization of compound tenses in English
Abstract: We generalize the process of lemmatization of verbs to their compound tenses. Usually, lemmatization is limited to verbs conjugated by means of suffixes; tense auxiliaries and modal verbs (e.g. I have left, I am leaving, I could leave) are ignored. We have constructed a set of 83 finite-state grammars which parse auxiliary verbs and thus recognize the ‘head verb’, that is, the lemma.
We generalize the notion of auxiliary verb to verbs with sentential complements which have transformed constructions (e.g. I want to go) that can be parsed in exactly the same way as tense auxiliaries or modal verbs.
Ambiguities arise, in particular because adverbial inserts occur inside the compound verbs. We show how local grammars describing nominal contexts can be used to reduce the degree of ambiguity.
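
As a rough illustration of the idea, the sketch below scans a tokenized sentence for a chain of auxiliaries, skips adverbial inserts, and returns the lemma of the head verb. It is a toy approximation and not the 83 grammars described here: the word lists `AUXILIARIES`, `ADVERBS` and `LEMMAS` and the function `head_verb_lemma` are hypothetical names introduced for this example, and only tense auxiliaries and modals are covered.

```python
# Toy word lists (assumptions for illustration, far from exhaustive).
AUXILIARIES = {
    "have", "has", "had", "am", "is", "are", "was", "were",
    "be", "been", "being", "can", "could", "will", "would",
    "shall", "should", "may", "might", "must",
}
ADVERBS = {"not", "never", "already", "often", "always"}
LEMMAS = {"left": "leave", "leaving": "leave", "leave": "leave",
          "gone": "go", "going": "go", "go": "go"}


def head_verb_lemma(tokens):
    """Return (start, end, lemma) for the first compound verb, or None.

    start is the index of the first auxiliary and end the index of the
    head verb; lemma is the dictionary form of the head verb.
    """
    words = [t.lower() for t in tokens]
    for start, word in enumerate(words):
        if word not in AUXILIARIES:
            continue
        i = start + 1
        # Loop state: consume further auxiliaries and adverbial inserts.
        while i < len(words) and (words[i] in AUXILIARIES or words[i] in ADVERBS):
            i += 1
        # Accepting state: the next token must be a known verb form.
        if i < len(words) and words[i] in LEMMAS:
            return start, i, LEMMAS[words[i]]
    return None


print(head_verb_lemma("I have never left".split()))
print(head_verb_lemma("I could have been leaving".split()))
```

A flat word list like this cannot tell an adverbial insert from a noun or adjective that happens to follow an auxiliary, which is the kind of ambiguity the local grammars for nominal contexts are meant to reduce.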
Document Type: Research Article
Affiliations: Laboratoire d’Automatique Documentaire et Linguistique UMR N°7546 du CNRS, University Paris 7
Publication date: October 1, 2000