The hidden dimension: a paradigmatic view of data-driven NLP
Many tasks in language analysis are described as the maximally economic mapping of one level of linguistic representation onto another. Over the past decade, many different machine-learning strategies have been developed to induce such mappings automatically and directly from data. In this paper, we contend that the way most learning algorithms have been applied to problems of language analysis reflects a strong bias towards a compositional (or biunique) model of inter-level mapping. Although this bias is justified in some cases, we argue that biunique inter-level mapping is not suited to every task. A model of analogical learning, based on a paradigmatic reanalysis of memorized data, is presented here. The methodological pros and cons of this approach are discussed in relation to a number of germane linguistic issues and illustrated in three case studies: word pronunciation, word analysis, and word sense disambiguation. The evidence presented here suggests that the brain is not designed to relate form and function in language in the logically simplest and maximally economic way. Rather, we propose a radical shift of emphasis in language learning from syntagmatic inter-level mapping to paradigmatically constrained intra-level mapping.
Document Type: Research Article
Publication date: July 1, 1999