Skip to main content

Free Content A corpus-based connectionist architecture for large-scale natural language parsing

We describe a deterministic shift-reduce parsing model that combines the advantages of connectionism with those of traditional symbolic models for parsing realistic sub-domains of natural language. It is a modular system that learns to annotate natural language texts with syntactic structure. The parser acquires its linguistic knowledge directly from pre-parsed sentence examples extracted from an annotated corpus. The connectionist modules enable the automatic learning of linguistic constraints and provide a distributed representation of linguistic information that exhibits tolerance to grammatical variation. The inputs and outputs of the connectionist modules represent symbolic information which can be easily manipulated and interpreted and provide the basis for organizing the parse. Performance is evaluated using labelled precision and recall. (For a test set of 4128 words, precision and recall of 75% and 69%, respectively, were achieved.) The work presented represents a significant step towards demonstrating that broad coverage parsing of natural language can be achieved with simple hybrid connectionist architectures which approximate shift-reduce parsing behaviours. Crucially, the model is adaptable to the grammatical framework of the training corpus used and so is not predisposed to a particular grammatical formalism.

Keywords: CONNECTIONIST NETWORKS; CORPUS LINGUISTICS; DETERMINISTIC SHIFT-REDUCE PARSING; HYBRID ARCHITECTURES; NATURAL LANGUAGE PROCESSING; TREEBANK GRAMMAR

Document Type: Research Article

Publication date: 01 June 2002

More about this publication?
  • Access Key
  • Free content
  • Partial Free content
  • New content
  • Open access content
  • Partial Open access content
  • Subscribed content
  • Partial Subscribed content
  • Free trial content