Creating and using Web corpora

Author: Thelwall, Mike (1)

Source: International Journal of Corpus Linguistics, Volume 10, Number 4, 2005, pp. 517-541 (25 pages)

Publisher: John Benjamins Publishing Company

Abstract:

The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engines and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but some academic-related words appeared amongst the 50 most frequent. The university Web sites also contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus (BNC) written English.
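The abstract only outlines the crawler-based strategy, so the following Python is a minimal sketch of the general idea, not the author's crawler. It assumes a breadth-first crawl restricted to hosts ending in a given domain suffix (for example ".ac.uk"), visible text taken from HTML with script and style content skipped, and a simple lower-cased word-frequency list. The names `PageParser`, `crawl`, `top_words`, and the caller-supplied `fetch` hook are hypothetical, chosen for illustration.

```python
import re
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collects visible text and <a href> targets from one HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip = 0  # nesting depth inside <script>/<style>, whose text is ignored

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def crawl(seed, fetch, allowed_suffix, max_pages=1000):
    """Breadth-first crawl that only follows links to hosts ending in allowed_suffix.

    fetch(url) must return an HTML string, or None if the page is unavailable.
    It is an assumed hook: a real crawler would also honour robots.txt and
    pause between requests. Returns a dict mapping each URL to its visible text.
    """
    queue = deque([seed])
    seen = {seed}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()  # flush any buffered trailing text
        pages[url] = " ".join(parser.text_parts)
        for href in parser.links:
            target = urljoin(url, href).split("#")[0]  # resolve relative links, drop fragments
            host = urlparse(target).hostname or ""
            if host.endswith(allowed_suffix) and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Assumed tokenisation: runs of ASCII letters, optionally with one internal apostrophe.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def top_words(pages, n=50):
    """Return the n most frequent lower-cased word tokens across all crawled pages."""
    counts = Counter()
    for text in pages.values():
        counts.update(WORD.findall(text.lower()))
    return counts.most_common(n)


if __name__ == "__main__":
    # A two-page stand-in for a university site, served from a dict instead of the network.
    site = {
        "http://www.example.ac.uk/":
            '<a href="/research">Research</a><p>The university will offer research</p>',
        "http://www.example.ac.uk/research":
            '<script>var x;</script><p>Research will be funded</p>'
            '<a href="http://www.other.com/">x</a>',
    }
    pages = crawl("http://www.example.ac.uk/", site.get, ".ac.uk")
    print(top_words(pages, 2))  # prints [('research', 3), ('will', 2)]
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking, so the same function can be run against stored pages or wrapped around a real HTTP client. The link to `www.other.com` is ignored because its host does not end in ".ac.uk", which mirrors the idea of restricting a crawl to a national academic domain.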

Keywords: academic language; web; web corpus

Document Type: Research article

DOI: 10.1075/ijcl.10.4.07the

Affiliations: 1: University of Wolverhampton
