A Parallel Framework for Processing Massive Spatial Data with a Split‐and‐Merge Paradigm
Because of its high volume, massive spatial data requires considerable computing power for real‐time processing. Given recent advances in multicore technology and falling component costs, high‐performance clusters are currently the only economically viable solution. Massive spatial data processing also demands heavy I/O, however, and should therefore be characterized as a data‐intensive application. Parallelization strategies for data‐intensive applications, such as decomposition, scheduling and load balancing, differ considerably from those for traditional compute‐intensive applications. In this article we introduce a Split‐and‐Merge paradigm for spatial data processing and propose a robust parallel framework that supports this paradigm in a cluster environment. The Split‐and‐Merge paradigm efficiently exploits data parallelism for massive data processing. The proposed framework is based on the open‐source TORQUE project and is hosted on a multicore‐enabled Linux cluster. A data‐aware scheduling algorithm was designed to exploit data sharing between tasks and reduce data communication time. Two LiDAR point cloud algorithms, inverse distance weighted (IDW) interpolation and Delaunay triangulation, were implemented on the proposed framework to evaluate its efficiency and scalability. Experimental results demonstrate that the system delivers an efficient speedup.
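As an illustration only, the Split‐and‐Merge idea applied to IDW interpolation might be sketched as follows. The abstract does not describe the actual decomposition or the scheduler, so everything here is an assumption: the function names (`idw`, `split`, `process`, `merge`, `split_and_merge`) are hypothetical, the query locations are simply cut into contiguous chunks, and a local thread pool stands in for cluster nodes managed by TORQUE.

```python
from concurrent.futures import ThreadPoolExecutor
from math import hypot


def idw(points, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (px, py, pz) samples."""
    num = den = 0.0
    for px, py, pz in points:
        d = hypot(x - px, y - py)
        if d == 0.0:
            return pz  # query coincides with a sample point
        w = 1.0 / d ** power
        num += w * pz
        den += w
    return num / den


def split(items, n_parts):
    """Split: cut a list into at most n_parts contiguous chunks."""
    size = max(1, -(-len(items) // n_parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def process(task):
    """Independent task: interpolate one chunk of query locations."""
    points, chunk = task
    return [idw(points, x, y) for x, y in chunk]


def merge(partials):
    """Merge: concatenate per-chunk results in their original order."""
    return [value for part in partials for value in part]


def split_and_merge(points, queries, n_parts=4):
    # Assumption: every task sees all sample points; a real framework
    # would ship only the data each task needs.
    tasks = [(points, chunk) for chunk in split(queries, n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(process, tasks))  # map preserves task order
    return merge(partials)


if __name__ == "__main__":
    samples = [(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)]
    grid = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    print(split_and_merge(samples, grid, n_parts=2))
```

Because each chunk is processed independently and the merge step preserves chunk order, the parallel result matches a serial pass over the same query locations. Threads in CPython will not show the speedup reported for a cluster; the sketch is meant to convey only the structure of the paradigm.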
Document Type: Research Article
Publication date: December 1, 2012