Ontology-driven discovery of geospatial evidence in web pages
Source: GeoInformatica, Volume 15, Number 4, October 2011, pp. 609-631(23)
Abstract: When users need to find something on the Web that is related to a place, chances are that place names will be submitted to a search engine along with other keywords. However, automatic recognition of geographic characteristics embedded in Web documents, which would allow for a better connection between documents and places, remains a difficult task. We propose an ontology-driven approach to facilitate the process of recognizing, extracting, and geocoding partial or complete references to places embedded in text. Our approach combines an extraction ontology with urban gazetteers and geocoding techniques. This ontology, called OnLocus, is used to guide the discovery of geospatial evidence from the contents of Web pages. We show that addresses and positioning expressions, along with fragments such as postal codes or telephone area codes, provide satisfactory support for local search applications, since they make it possible to approximate the physical location of services and activities named within Web pages. Our experiments show the feasibility of performing automated address extraction and geocoding to identify locations associated with Web pages. Combining location identifiers with basic addresses improved the precision of extractions and reduced the number of false positive results.
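As a rough illustration of the idea of combining a basic address with a location identifier, the sketch below pairs a street-and-number match with a nearby postal code. It is not OnLocus and does not reproduce the paper's ontology, gazetteers, or geocoding step; the regular expressions, the function name extract_evidence, and the window parameter are all assumptions made for this example, tuned only to Brazilian-style addresses such as those in the affiliations below.

```python
import re

# Hypothetical pattern for a basic address: street type, street name, number.
BASIC_ADDRESS = re.compile(
    r"\b(?:Av\.|Avenida|Rua|R\.)\s+(?P<name>[^,\d]+?),\s*(?P<number>\d+)"
)
# Hypothetical pattern for a Brazilian postal code (CEP), e.g. 31230-000.
POSTAL_CODE = re.compile(r"\b\d{5}-\d{3}\b")


def extract_evidence(text, window=40):
    """Find basic addresses and attach a postal code seen shortly after each.

    `window` is the number of characters after the address that are searched
    for a postal code; the value 40 is an arbitrary assumption.
    """
    results = []
    for match in BASIC_ADDRESS.finditer(text):
        tail = text[match.end(): match.end() + window]
        cep = POSTAL_CODE.search(tail)
        results.append({
            "street": match.group("name").strip(),
            "number": match.group("number"),
            "postal_code": cep.group(0) if cep else None,
        })
    return results


if __name__ == "__main__":
    sample = "Av. Pres. Carlos Luz, 1275, 31230-000, Belo Horizonte, MG"
    print(extract_evidence(sample))
```

A candidate whose postal_code is None could be treated as weaker evidence than one with a matching identifier, which is one simple way to trade recall for fewer false positives.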
Document Type: Research Article
Affiliations:
1: PRODABEL-Empresa de Informática e Informação do Município de Belo Horizonte, Av. Pres. Carlos Luz, 1275, 31230-000, Belo Horizonte, MG, Brazil, Email: email@example.com
2: Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Av. Pres. Antônio Carlos, 6627, 31270-010, Belo Horizonte, MG, Brazil, Email: firstname.lastname@example.org
3: Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Av. Pres. Antônio Carlos, 6627, 31270-010, Belo Horizonte, MG, Brazil, Email: email@example.com
4: Instituto de Informática, Universidade de Campinas, Av. Albert Einstein, 1251, 13083-970, Campinas, SP, Brazil, Email: firstname.lastname@example.org
Publication date: October 1, 2011