Careful abstraction from instance families in memory-based language learning
Empirical studies in inductive language learning point to pure memory-based learning as a successful approach to many language learning tasks, often performing better than learning methods that abstract from the learning material. The possibility is left open, however, that limited, careful abstraction in memory-based learning may be harmless to generalization, as long as the disjunctivity of language data is preserved. We test this hypothesis by empirically comparing a range of careful abstraction methods, focusing particularly on methods that (i) generalize instances and (ii) perform oblivious (partial) decision-tree abstraction. These methods are applied to a selection of language learning tasks, and their generalization performance and memory-item compression rates are measured. On the basis of the results we conclude that, when combined with feature weighting or value distance metrics, careful abstraction equals or outperforms pure memory-based learning, though mainly on small data sets. In the concluding case study involving large data sets, we find that the FAMBL algorithm, a new careful abstractor that merges families of instances, performs close to pure memory-based learning, yet equals it on only three of the six tasks. On the basis of the gathered empirical results, we discuss the incorporation of the notion of instance families, i.e. carefully generalized instances, in memory-based language learning.
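The abstract's central idea, merging same-class instances into generalized "families" that are then used for nearest-neighbour classification, can be illustrated with a minimal sketch. The code below is an assumption-labeled toy in the spirit of FAMBL, not the paper's actual algorithm: the merging policy, the `radius` threshold, and all function names are illustrative choices, and the distance used is a simple weighted overlap metric.

```python
# Illustrative sketch of careful abstraction via instance-family merging.
# Assumptions (not from the paper): greedy first-fit merging, a fixed
# merge radius, and a weighted overlap distance over symbolic features.

def overlap_distance(family, instance, weights):
    # Weighted overlap: a feature mismatches when the instance's value
    # is not among the values the family has admitted at that position.
    return sum(w for w, vals, v in zip(weights, family, instance)
               if v not in vals)

def merge(family, instance):
    # Carefully generalize a family by admitting the instance's values.
    return tuple(vals | {v} for vals, v in zip(family, instance))

def build_families(instances, labels, weights, radius=1.0):
    # Greedily merge each training instance into the first same-class
    # family within `radius`; otherwise start a new singleton family.
    # Fewer families than instances = memory-item compression.
    families = []  # list of (tuple-of-value-sets, class label)
    for inst, lab in zip(instances, labels):
        for i, (fam, fam_label) in enumerate(families):
            if fam_label == lab and \
               overlap_distance(fam, inst, weights) <= radius:
                families[i] = (merge(fam, inst), fam_label)
                break
        else:
            families.append((tuple({v} for v in inst), lab))
    return families

def classify(families, instance, weights):
    # 1-nearest-neighbour classification over families, not raw instances.
    return min(families,
               key=lambda f: overlap_distance(f[0], instance, weights))[1]
```

On a toy training set of three instances, two same-class items within the merge radius collapse into one family, so three memory items compress to two, while unseen instances are still classified by their nearest family.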
Document Type: Research Article
Publication date: July 1, 1999