Extracting geographic features from the Internet to automatically build detailed regional gazetteers

Abstract:

The utility of every imaginable application which incorporates a gazetteer hinges on the simple fact that the resulting system will only be as useful, complete, or accurate as the underlying gazetteer itself. A major issue confronting gazetteers utilized in systems today is that they are not complete and measures of their accuracy are largely unknown. In this paper we describe a methodology which addresses this problem by automatically generating highly complete and detailed regional gazetteers from Internet sources. We utilize information extraction and integration techniques to automatically obtain geographic features and associated footprints and feature types from freely and widely available online data which could be applied to create a gazetteer for nearly any area. We discuss the distinguishing characteristics of the generated gazetteer and extend previous work to define measures which can be used to assess the completeness and accuracy of gazetteers. Using these measures, the generated gazetteer is evaluated against the Alexandria Digital Library Gazetteer and the Los Angeles Comprehensive Bibliographic Database. Our results indicate that a gazetteer created by our methods will be at least as complete as any gazetteer currently available for certain feature classes, while falling short in others. We conclude by offering suggestions to address these shortcomings.

Keywords: Gazetteers; Geographic information extraction

Document Type: Research Article

DOI: http://dx.doi.org/10.1080/13658810802577262

Affiliations:
1: Department of Computer Science, University of Southern California, Los Angeles, CA 90089-0255, USA
2: Department of Geography, University of Southern California, Los Angeles, CA 90089-0255, USA
3: Department of Computer Science, University of Southern California, Marina del Rey, CA 90292, USA

Publication date: January 1, 2009
