Creating and using Web corpora

Author: Thelwall, Mike

Source: International Journal of Corpus Linguistics, Volume 10, Number 4, 2005, pp. 517-541(25)

Publisher: John Benjamins Publishing Company

Abstract:

The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engines and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.

Keywords: academic language; web; web corpus

Document Type: Research article

DOI: http://dx.doi.org/10.1075/ijcl.10.4.07the

Affiliations: 1: University of Wolverhampton

Publication date: 2005-01-01
