Representing and reducing error in natural‐resource classification using model combination
Authors: Huang, Z.; Lees, B.
Source: International Journal of Geographical Information Science, Volume 19, Number 5, May 2005, pp. 603-621(19)
Publisher: Taylor and Francis Ltd
Abstract: Artificial Intelligence (AI) models such as Artificial Neural Networks (ANNs), Decision Trees and Dempster–Shafer's Theory of Evidence have long been claimed to be more error-tolerant than conventional statistical models, but the way error is propagated through these models is unclear. Two sources of error have been identified in this study: sampling error and attribute error. The results show that these errors propagate differently through the three AI models. The Decision Tree was the most affected by error, the Artificial Neural Network was less affected by error, and the Theory of Evidence model was not affected by the errors at all. The study indicates that AI models have very different modes of handling errors. In this case, the machine-learning models, including ANNs and Decision Trees, are more sensitive to input errors. Dempster–Shafer's Theory of Evidence has demonstrated better potential in dealing with input errors when multisource data sets are involved. The study suggests a strategy of combining AI models to improve classification accuracy. Several combination approaches have been applied, based on a ‘majority voting system’, a simple average, Dempster–Shafer's Theory of Evidence, and fuzzy-set theory. These approaches all increased classification accuracy to some extent. Two of them also demonstrated good performance in handling input errors. Second-stage combination approaches which use statistical evaluation of the initial combinations are able to further improve classification results. One of these second-stage combination approaches increased the overall classification accuracy on forest types to 54% from the original 46.5% of the Decision Tree model, and its visual appearance is also much closer to the ground data. By combining models, it becomes possible to calculate quantitative confidence measurements for the classification results, which can then serve as a better error representation. Final classification products include not only the predicted hard classes for individual cells, but also estimates of the probability and the confidence measurements of the prediction.
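As a rough illustration of two of the combination approaches named in the abstract (majority voting and a simple average), the sketch below combines the outputs of three models for a single cell and returns a hard class together with a simple confidence value. It is not the authors' implementation: the function names, the class labels, the probability values, and the choice of vote share and mean probability as confidence measures are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent hard class and the share of models voting for it.

    The vote share is used here as an assumed, illustrative confidence value.
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def simple_average(prob_dicts):
    """Average per-class probabilities across models.

    Returns the class with the highest mean probability and that mean.
    Classes missing from a model's output are treated as probability 0.
    """
    classes = set().union(*prob_dicts)
    mean = {
        c: sum(p.get(c, 0.0) for p in prob_dicts) / len(prob_dicts)
        for c in classes
    }
    # Sort first so that ties are broken deterministically by class name.
    best = max(sorted(mean), key=mean.get)
    return best, mean[best]


# Hypothetical outputs of three models for one cell (values are made up).
hard_classes = ["type_a", "type_b", "type_a"]
class_probs = [
    {"type_a": 0.6, "type_b": 0.4},
    {"type_a": 0.3, "type_b": 0.7},
    {"type_a": 0.5, "type_b": 0.5},
]

vote_class, vote_share = majority_vote(hard_classes)
avg_class, avg_prob = simple_average(class_probs)
print(vote_class, round(vote_share, 3))
print(avg_class, round(avg_prob, 3))
```

With these made-up inputs the two rules disagree on the cell's class, which is one reason a combined product might report a probability or confidence value alongside the hard class rather than the hard class alone.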
Document Type: Research Article
Affiliations: School of Resources, Environment, and Society, Australian National University, ACT 0200, Australia
Publication date: 2005-05-01