A model-assisted k-nearest neighbour approach to remove extrapolation bias
Abstract: In applications of the k-nearest neighbour technique (kNN) with real-valued attributes of interest (Y), predictions are biased for units whose ancillary values (X) are poorly represented, or not represented at all, in a sample of n units. This article proposes a model-assisted calibration that reduces this unit-level extrapolation bias. The bias is estimated as the difference between the model-based predictions of Y given the X-values of the true k nearest units and those given the X-values of the k selected reference units. Calibrated kNN predictions are then obtained by adding this difference to the original kNN prediction. The relationship between Y and X is modelled with X-variables that are decorrelated and scaled to the interval [0,1], and with Bernstein basis functions to capture changes in Y as a function of changes in X. Three examples with actual forest inventory data from Italy, the USA and Finland demonstrated that calibrated kNN predictions were, on average, closer to their true values than non-calibrated predictions. Calibrated predictions also had a range much closer to the actual range of Y than non-calibrated predictions.
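The calibration step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a single X-variable (so the decorrelation step is skipped), absolute distance in X, a degree-3 Bernstein basis fitted by ordinary least squares, and noise-free synthetic data. The function names `bernstein_basis` and `calibrated_knn` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np
from math import comb


def bernstein_basis(x01, degree=3):
    """Bernstein basis functions evaluated at x01 in [0, 1]; shape (n, degree + 1)."""
    x01 = np.asarray(x01, dtype=float)[:, None]
    j = np.arange(degree + 1)
    coef = np.array([comb(degree, i) for i in j], dtype=float)
    return coef * x01**j * (1.0 - x01) ** (degree - j)


def calibrated_knn(x_target, x_sample, y_sample, x_population, k=3, degree=3):
    """Return (plain kNN prediction, calibrated kNN prediction) for one target unit."""
    # Scale X to [0, 1] using the range of X over all units (assumed known).
    lo, hi = x_population.min(), x_population.max()

    def scale(x):
        return (np.asarray(x, dtype=float) - lo) / (hi - lo)

    # Working model: least-squares fit of Y on a Bernstein basis of scaled X.
    design = bernstein_basis(scale(x_sample), degree)
    beta, *_ = np.linalg.lstsq(design, y_sample, rcond=None)

    def model(x):
        return bernstein_basis(scale(x), degree) @ beta

    # The k selected reference units: nearest sample units in X.
    ref = np.argsort(np.abs(x_sample - x_target))[:k]
    plain = y_sample[ref].mean()

    # The true k nearest units: nearest units in X among all units.
    true_nn = np.argsort(np.abs(x_population - x_target))[:k]

    # Estimated bias: difference in model-based predictions between the two sets.
    bias = model(x_population[true_nn]).mean() - model(x_sample[ref]).mean()
    return plain, plain + bias


# Synthetic illustration (assumed data): Y = 2 X, sample covers only X <= 6.
x_population = np.linspace(0.0, 10.0, 101)
x_sample = np.linspace(0.0, 6.0, 13)
y_sample = 2.0 * x_sample

plain, calibrated = calibrated_knn(10.0, x_sample, y_sample, x_population)
print(plain, calibrated)
```

In this toy setting the target unit at X = 10 lies well outside the sampled range, so the plain kNN prediction stays at the edge of the sample, while the calibrated prediction moves toward the value implied by the working model at the target's true neighbours. How well the calibration works in practice depends on how well the working model extrapolates.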
Document Type: Research Article
Affiliations: 1: Natural Resources Canada, Canadian Forest Service, Victoria, BC, Canada 2: Finnish Forest Research Institute, Vantaa, Finland 3: USDA Forest Service, Northern Research Station, Minnesota, MN, USA
Publication date: April 1, 2010