From n-gram to skipgram to concgram

Authors: Cheng, Winnie; Greaves, Chris; Warren, Martin

Source: International Journal of Corpus Linguistics, Volume 11, Number 4, 2006, pp. 411-433(23)

Publisher: John Benjamins Publishing Company


Abstract:

Uncovering the extent of word associations and how they are manifested has been an important area of study in corpus linguistics since the 1960s (Sinclair et al. 1970). This paper defines and describes a new way of categorising word association, the concgram, which constitutes all of the permutations of constituency and positional variation generated by the association of two or more words. Concgrams are identified without prior input from the user (other than to set the size of the span) employing a fully automated search that reveals all of the word association patterns that exist in a corpus. This study argues that concgrams represent more fully word associations in a corpus. Most concgrams seem to be non-contiguous, and show both constituency (AB, ACB) and positional (AB, BA) variations. Further studies of concgrams will help in the task of uncovering the full extent of the idiom principle (Sinclair 1987).
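The two kinds of variation named in the abstract can be illustrated with a minimal sketch. It is not the authors' search method: the search described above is fully automated and needs no word pair from the user, whereas this hypothetical `find_concgram` helper takes two given words and only classifies their co-occurrences. It also assumes that the span counts the number of words allowed on either side of the first word, and the sample sentence is invented for illustration.

```python
from collections import Counter


def find_concgram(tokens, a, b, span=4):
    """Collect (order, gap) for each co-occurrence of a and b within the span.

    order is "AB" when b follows a and "BA" when b precedes a (positional
    variation). gap is the number of intervening words: 0 means contiguous,
    anything larger means non-contiguous (constituency variation, e.g. ACB).
    Assumption: span is the number of words searched on each side of a.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok != a:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == b:
                order = "AB" if j > i else "BA"
                gap = abs(j - i) - 1
                hits.append((order, gap))
    return hits


# Invented sample text, lower-cased and whitespace-tokenised.
text = "hard work pays but work can be very hard and hard honest work matters"
tokens = text.split()

hits = find_concgram(tokens, "hard", "work", span=4)
orders = Counter(order for order, _ in hits)
contiguity = Counter(
    "contiguous" if gap == 0 else "non-contiguous" for _, gap in hits
)

print(hits)
print(dict(orders))
print(dict(contiguity))
```

A plain bigram count would register only the single contiguous "hard work" in this sample; widening the search to a span and ignoring order is what brings in the non-contiguous and reversed instances.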

Keywords: concgram; constituency and positional variations; contiguous and non-contiguous word associations; corpus linguistics

Document Type: Research article

DOI: http://dx.doi.org/10.1075/ijcl.11.4.04che

Affiliation: The Hong Kong Polytechnic University

Publication date: 2006-01-01
