Creating and using Web corpora
Author: Thelwall, Mike
Source: International Journal of Corpus Linguistics, Volume 10, Number 4, 2005, pp. 517-541(25)
Publisher: John Benjamins Publishing Company
Abstract:
The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engines and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.
Keywords: academic language; web; web corpus
Document Type: Research article
DOI: http://dx.doi.org/10.1075/ijcl.10.4.07the
Affiliations: 1: University of Wolverhampton
Publication date: 2005-01-01
