Location-based algorithms for finding sets of corresponding objects over several geo-spatial data sets
Abstract: When integrating geo-spatial data sets, a join algorithm is used for finding sets of corresponding objects (i.e., objects that represent the same real-world entity). This article investigates location-based join algorithms for the integration of several data sets. First, algorithms for the integration of two data sets are presented and their performance, in terms of recall and precision, is compared. Then, two approaches for the integration of more than two data sets are described. In the first approach, all the integrated data sets are processed simultaneously. In the second approach, a join algorithm for two data sets is applied sequentially, either in a serial manner, where in each join at least one of the joined data sets is a single source, or in a hierarchical manner, where two join results can themselves be joined. Join algorithms are given for both approaches. The algorithms are designed to perform well even when the locations of objects are imprecise and each data set represents only some of the real-world entities. Results of extensive experiments with the different approaches are provided and analyzed. The experiments show how the approaches differ in accuracy and efficiency under different circumstances. The results also show that all our algorithms are much more accurate than the commonly used one-sided nearest-neighbor join.
Document Type: Research Article
Affiliations: 1: Mapping and Geo-Information Engineering, Technion - Israel Institute of Technology, Haifa, Israel 2: Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel 3: School of Engineering and Computer Science, The Hebrew University, Jerusalem, Israel
Publication date: 2010-01-01
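Illustration: the abstract compares its algorithms against the one-sided nearest-neighbor join and measures accuracy by recall and precision. The following is a minimal sketch of that baseline, not the article's own algorithms. It assumes that "one-sided" means each object of the first data set is paired with its nearest object in the second, that locations are planar (x, y) points, and that results are simple pairs. The function names, object identifiers, and coordinates are hypothetical.

```python
from math import hypot


def one_sided_nn_join(a, b):
    """Pair every object in `a` with its nearest object in `b`.

    `a` and `b` map object ids to (x, y) locations (an assumed representation).
    The join is one-sided: objects in `b` never choose a partner, and every
    object in `a` gets one even if its true counterpart is missing from `b`.
    """
    if not b:
        return set()
    pairs = set()
    for id_a, (xa, ya) in a.items():
        nearest = min(b, key=lambda id_b: hypot(xa - b[id_b][0], ya - b[id_b][1]))
        pairs.add((id_a, nearest))
    return pairs


def precision_recall(found, truth):
    """Precision = correct / found, recall = correct / truth."""
    correct = len(found & truth)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: a3 has no real counterpart in b.
a = {"a1": (0.0, 0.0), "a2": (10.0, 10.0), "a3": (50.0, 50.0)}
b = {"b1": (0.5, 0.2), "b2": (10.3, 9.8)}
truth = {("a1", "b1"), ("a2", "b2")}

found = one_sided_nn_join(a, b)
precision, recall = precision_recall(found, truth)
```

In this toy case the join still pairs a3 with b2, so recall is 1.0 but precision drops to 2/3. This is the kind of error that arises when each data set represents only some of the real-world entities.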