INTEX pour l’annotation semi-automatique d’un corpus d’anaphores (INTEX for the semi-automatic annotation of a corpus of anaphors)

Abstract:

Anaphors are a well-known problem in automatic text generation and natural language understanding. Corpora annotated for such phenomena could help in developing robust processing techniques. Building these resources is, however, tedious and time-consuming, and partial automation could make the task easier.

In this paper, we show how the INTEX system can be used for this task. We show that in a newspaper corpus (here, Le Monde diplomatique), discursive grammatical anaphors can easily be located via their associated linguistic features. A series of transducers that generate tags for categories and functions can thus be built; it constitutes an efficient pre-processing stage, though manual checking remains necessary. The heuristics, which are quick and easy to develop, are specific to the task. The study goes on to show, however, that discarding non-anaphoric pronouns is not straightforward for non-referential personal pronouns or indefinite pronouns, and that the tagging of grammatical function appears limited in the absence of real syntactic processing.
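
As a rough illustration of the kind of pattern-based pre-tagging the abstract describes, the Python sketch below marks a few French pronoun forms with a category tag and flags a likely non-referential "il" for manual checking. It is not INTEX and does not reproduce the paper's transducers: the tiny lexicon, the list of impersonal cues, the tag names (PRO:pers, PRO:dem, PRO:impers?) and the function tag_pronouns are all assumptions made for this example.

```python
import re

# Hypothetical mini-lexicon: pronoun form -> coarse category tag.
# A real system would use full dictionaries rather than a short list.
PRONOUN_TAGS = {
    "il": "PRO:pers", "elle": "PRO:pers", "ils": "PRO:pers", "elles": "PRO:pers",
    "celui-ci": "PRO:dem", "celle-ci": "PRO:dem",
}

# Hypothetical cues: verb forms that often follow a non-referential "il".
# Some ("semble") are ambiguous, hence the "?" in the tag and the need
# for manual checking.
IMPERSONAL_NEXT = {"faut", "pleut", "s'agit", "semble"}

# A token is a run of letters, optionally joined by hyphens or apostrophes,
# or else any single non-space character (digit, punctuation).
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*|\S")


def normalise(token):
    """Lower-case a token and unify typographic apostrophes."""
    return token.lower().replace("’", "'")


def tag_pronouns(text):
    """Return (token, tag) pairs; tag is None for tokens left untagged."""
    tokens = TOKEN_RE.findall(text)
    tagged = []
    for i, tok in enumerate(tokens):
        low = normalise(tok)
        tag = PRONOUN_TAGS.get(low)
        # Heuristic: "il" directly followed by an impersonal cue is
        # probably non-anaphoric; flag it instead of discarding it.
        if low == "il" and i + 1 < len(tokens):
            if normalise(tokens[i + 1]) in IMPERSONAL_NEXT:
                tag = "PRO:impers?"
        tagged.append((tok, tag))
    return tagged
```

Calling tag_pronouns("Il faut partir. Elle arrive.") yields one pair per token, with "Il" flagged as a possible impersonal pronoun and "Elle" tagged as a personal pronoun. The sketch also shows the limitation the abstract points out: surface cues alone cannot reliably separate referential from non-referential uses, and nothing here assigns a grammatical function.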

Document Type: Research Article

DOI: https://doi.org/10.1075/li.22.11tut

Affiliations: Équipe CRISTAL-GRESEC, Université Stendhal - Grenoble 3

Publication date: 2000-10-01
