Improving Geocode Accuracy with Candidate Selection Criteria
Geocoding systems typically use more than one geographic reference dataset to improve match rates and spatial accuracy. The result is multiple candidate geocodes, from which a single “best” result must be selected. Little scientific evidence exists for formalizing this selection process or for comparing one strategy to another. Existing systems therefore rely on what we term the hierarchy-based criterion: place the available reference data layers into a qualitative, static, and in many cases arbitrary hierarchy, then attempt a match in each layer, in order. The first non-ambiguous match with suitable confidence is selected and returned as output. This approach assumes that the relative accuracy of reference data layers is the same everywhere, ignoring local variations that could be exploited to return more precise geocodes. We propose a formalization of the selection criteria and present three alternative strategies, which we term the uncertainty-, gravitationally-, and topologically-based strategies. The performance of each method is evaluated against two ground truth datasets of nationwide GPS points to determine any resulting spatial improvements. We find that any of the three new methods improves on current practice in the majority of cases. The gravitationally- and topologically-based approaches offer improvement over a simple uncertainty-based approach in cases with specific characteristics.
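To make the contrast concrete, the following is a minimal Python sketch of the hierarchy-based criterion as described above, set beside one possible reading of an uncertainty-based alternative. Everything named here is an assumption for illustration and not taken from the article: the Candidate fields, the layer names, the 0-100 confidence scale, the 90.0 threshold, and in particular the idea that the uncertainty-based strategy picks the candidate with the smallest area of spatial uncertainty. The gravitationally- and topologically-based strategies are not sketched because the abstract does not define them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One candidate geocode (hypothetical structure, for illustration only)."""
    layer: str             # reference data layer that produced the match
    lat: float
    lon: float
    confidence: float      # match score, assumed to lie in [0, 100]
    ambiguous: bool        # True if the layer returned tied matches
    uncertainty_m2: float  # assumed area of spatial uncertainty, square meters


def is_usable(c: Candidate, min_confidence: float) -> bool:
    """A candidate is usable if it is non-ambiguous and confident enough."""
    return not c.ambiguous and c.confidence >= min_confidence


def select_by_hierarchy(
    candidates: Sequence[Candidate],
    hierarchy: Sequence[str],
    min_confidence: float = 90.0,
) -> Optional[Candidate]:
    """Return the first usable candidate, visiting layers in a fixed global order."""
    by_layer = {c.layer: c for c in candidates}
    for layer in hierarchy:
        c = by_layer.get(layer)
        if c is not None and is_usable(c, min_confidence):
            return c
    return None


def select_by_uncertainty(
    candidates: Sequence[Candidate],
    min_confidence: float = 90.0,
) -> Optional[Candidate]:
    """Return the usable candidate with the smallest uncertainty area (assumed rule)."""
    usable = [c for c in candidates if is_usable(c, min_confidence)]
    return min(usable, key=lambda c: c.uncertainty_m2, default=None)


# Invented example: a large rural parcel versus a short street segment.
hierarchy = ["parcel", "street", "zip"]
candidates = [
    Candidate("parcel", 34.10, -118.30, 100.0, False, 4_000_000.0),
    Candidate("street", 34.11, -118.31, 95.0, False, 20_000.0),
    Candidate("zip", 34.12, -118.29, 100.0, False, 90_000_000.0),
]

print(select_by_hierarchy(candidates, hierarchy).layer)
print(select_by_uncertainty(candidates).layer)
```

With this invented data, the fixed hierarchy returns the parcel match because parcels rank first globally, while the uncertainty rule returns the street match because, locally, the street segment covers a much smaller area than the large parcel. This is the kind of local variation that a static hierarchy cannot take into account.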
Document Type: Research Article
Affiliations: Department of Preventive Medicine, University of Southern California
Publication date: July 1, 2010