Creating and using Web corpora
Author: Thelwall, Mike
Source: International Journal of Corpus Linguistics, Volume 10, Number 4, 2005, pp. 517-541(25)
Publisher: John Benjamins Publishing Company
Abstract:
The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engines and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.
Keywords: academic language; web; web corpus
Document Type: Research article
DOI: 10.1075/ijcl.10.4.07the
