Agricultural Case Studies of Classification Accuracy, Spectral Resolution, and Model Over-Fitting
Abstract: This paper describes the relationship between spectral resolution and classification accuracy in analyses of hyperspectral imaging data acquired from crop leaves. The main aim is to discuss and reduce the risk of model over-fitting. Over-fitting of a classification model occurs when too many and/or irrelevant model terms are included (i.e., a large number of spectral bands), and it may lead to low robustness/repeatability when the classification model is applied to independent validation data. We outline a simple way to quantify the level of model over-fitting by comparing the observed classification accuracies with those obtained from explanatory random data. Hyperspectral imaging data were acquired from two crop-insect pest systems: (1) potato psyllid (Bactericera cockerelli) infestations of individual bell pepper plants (Capsicum annuum), imaged under controlled-light conditions (data set 1), and (2) sugarcane borer (Diatraea saccharalis) infestations of individual maize plants (Zea mays), with the same plants imaged under two markedly different image-acquisition conditions (data sets 2a and 2b). For each data set, reflectance data were analyzed at seven spectral resolutions by dividing 160 spectral bands from 405 to 907 nm into 4, 16, 32, 40, 53, 80, or 160 bands. In the two data sets, similar classification results were obtained with spectral resolutions ranging from 3.1 to 12.6 nm. Thus, the size of the initial input data could be reduced fourfold with only a negligible loss of classification accuracy. In the analysis of data set 1, several validation approaches consistently showed that insect-induced stress could be accurately detected, so there was little indication of model over-fitting.
In the analyses of data set 2, inconsistent validation results were obtained, and the observed classification accuracy (81.06%) was only a few percentage points above that obtained using random data (66.7 to 77.4%). Thus, our analysis highlights a potential risk of model over-fitting and emphasizes the importance of testing for it when developing reliable and robust classification models.
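The two ideas in the abstract, reducing spectral resolution and comparing observed accuracy with accuracy on random data, can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: bands are binned by averaging adjacent bands, the classifier is a simple nearest-centroid model scored by leave-one-out validation, random data are drawn uniformly with the same shape as the real data, and the spectra themselves are synthetic. The names `bin_bands`, `loo_accuracy`, and `random_baseline` are hypothetical.

```python
import random
import statistics


def bin_bands(spectrum, n_bins):
    """Average adjacent bands so that the spectrum has n_bins values."""
    n = len(spectrum)
    edges = [i * n // n_bins for i in range(n_bins + 1)]
    return [statistics.fmean(spectrum[a:b]) for a, b in zip(edges, edges[1:])]


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in model)."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        centroids = {}
        for label in set(y):
            # Class centroid computed without the held-out sample i.
            rows = [x for j, x in enumerate(X) if j != i and y[j] == label]
            centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
        pred = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(xi, centroids[lab])),
        )
        correct += pred == yi
    return correct / len(X)


def random_baseline(X, y, score, n_repeats=20, seed=0):
    """Score random feature matrices of the same shape, keeping the real labels."""
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    return [
        score([[rng.random() for _ in range(n_cols)] for _ in range(n_rows)], y)
        for _ in range(n_repeats)
    ]


# Synthetic demo (assumed data): 20 "leaves", 160 bands, two well-separated classes.
rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[0.2 + 0.6 * label + rng.uniform(-0.05, 0.05) for _ in range(160)] for label in y]

X40 = [bin_bands(x, 40) for x in X]  # 160 -> 40 bands, about 12.6 nm per band
observed = loo_accuracy(X40, y)
baseline = random_baseline(X40, y, loo_accuracy)
print(f"observed={observed:.2f}, random max={max(baseline):.2f}")
```

If the observed accuracy sits only slightly above the range obtained from random data, as reported for data set 2, the model is probably fitting noise. Passing a different `score` function to `random_baseline` allows the same check with another classifier or validation scheme.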
Document Type: Research Article
Affiliations: University of Western Australia, School of Animal Biology, UWA Institute of Agriculture, 35 Stirling Highway, Crawley, Perth, Western Australia 6009, Australia
Publication date: November 1, 2013
Journal: Applied Spectroscopy, the internationally recognized, peer-reviewed journal of the Society for Applied Spectroscopy, available in print and online. Previously published as Bulletin (Society for Applied Spectroscopy).