Most Similar Neighbor: An Improved Sampling Inference Procedure for Natural Resource Planning
To model ecosystem functioning for landscape design, analysts would like detailed data about each parcel of land in the landscape. Usually, only low-resolution information is available for the entire area, supplemented by detailed information for a sample of the parcels. These sample data, usually obtained through two-phase sampling, provide initial values of important design elements for dynamic, often nonlinear, models of ecosystem functioning. However, to represent the contribution of the nonsampled portions of the landscape to ecosystem functioning, it would be convenient to operate as if the detailed design information were available for every parcel in the analysis. Inference procedures for completing the design information of the unsampled parcels have usually followed the techniques of stratified or regression sampling. These procedures were developed for their efficiency in estimating population means and totals, not for their utility in modeling ecosystem functioning and response to intervention. Stratified sampling and regression estimates therefore do not retain the complex relationships among multivariate design attributes. We present a new multivariate inference procedure for use in such circumstances. Rather than estimating the design attributes one at a time for each first-phase observation, as traditional methods do, the procedure simply chooses the most similar parcel from the set of parcels with detailed examinations to act as its stand-in. The stand-in is chosen on the basis of a similarity measure that summarizes the multivariate relationships between the set of low-resolution indicator attributes and the set of detailed design attributes derived from the second-phase sample. Canonical correlation analysis is used to derive the similarity function for this procedure, which we call "Most Similar Neighbor Inference."
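The stand-in selection described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code. The abstract does not give the similarity function, so the sketch assumes the weighted-distance form commonly associated with MSN, D^2 = (x_t - x_r) Gamma Lambda^2 Gamma' (x_t - x_r)', where Gamma holds the canonical coefficients for the standardized indicator attributes and Lambda^2 is the diagonal matrix of squared canonical correlations. The helper names `fit_msn` and `msn_impute` and the synthetic data are hypothetical.

```python
import numpy as np

def _inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_msn(X_ref, Y_ref):
    """Hypothetical helper: derive an MSN-style weight matrix from reference data.

    X_ref: (n, p) indicator attributes; Y_ref: (n, q) design attributes,
    both observed on the second-phase (reference) parcels.
    Assumed form: W = Gamma @ diag(rho**2) @ Gamma.T
    """
    mu, sd = X_ref.mean(axis=0), X_ref.std(axis=0, ddof=1)
    Zx = (X_ref - mu) / sd
    Zy = (Y_ref - Y_ref.mean(axis=0)) / Y_ref.std(axis=0, ddof=1)
    n = len(Zx)
    Sxx, Syy, Sxy = Zx.T @ Zx / (n - 1), Zy.T @ Zy / (n - 1), Zx.T @ Zy / (n - 1)
    # Classical CCA: SVD of Sxx^(-1/2) Sxy Syy^(-1/2). The singular values are
    # the canonical correlations rho; Sxx^(-1/2) @ U gives the X coefficients.
    Kx = _inv_sqrt(Sxx)
    U, rho, _ = np.linalg.svd(Kx @ Sxy @ _inv_sqrt(Syy), full_matrices=False)
    Gamma = Kx @ U
    W = Gamma @ np.diag(rho ** 2) @ Gamma.T
    return mu, sd, W, rho

def msn_impute(X_target, X_ref, Y_ref, mu, sd, W):
    """Hypothetical helper: give each target parcel the design attributes of
    its most similar reference parcel under D^2 = d' W d."""
    Zt, Zr = (X_target - mu) / sd, (X_ref - mu) / sd
    diff = Zt[:, None, :] - Zr[None, :, :]            # (m, n, p) differences
    d2 = np.einsum("mnp,pq,mnq->mn", diff, W, diff)   # squared MSN distances
    nearest = d2.argmin(axis=1)                       # index of each stand-in
    return Y_ref[nearest], nearest

# Synthetic illustration (hypothetical data, not the paper's inventory).
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(40, 3))
Y_ref = X_ref @ rng.normal(size=(3, 3)) + 0.3 * rng.normal(size=(40, 3))
mu, sd, W, rho = fit_msn(X_ref, Y_ref)
X_target = rng.normal(size=(5, 3))
Y_hat, idx = msn_impute(X_target, X_ref, Y_ref, mu, sd, W)
print("canonical correlations:", rho.round(3))
print("stand-in reference parcels:", idx)
```

Each imputed row is a complete, actually observed vector of design attributes, so the relationships among those attributes within a parcel are kept intact. Because every reference parcel is at distance zero from itself, imputing the reference set returns the observed values unchanged, which is the exact-interpolator property.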
We compared most similar neighbor estimates for a multivariate forest inventory to estimates from regression, stratified sampling, and a Swedish National Forest Survey method. The indicator attributes were recorded from stand records, maps, and aerial photographs, while the design attributes were stand yield characteristics derived from on-the-ground inventories. For easy-to-predict design attributes, the most similar neighbor estimates have prediction errors comparable in magnitude to those of the traditional estimates. Thus, most similar neighbor inference should be expected to perform almost as well as regression in sampling contexts requiring estimates of population means or totals. More importantly, the most similar neighbor procedure more closely reproduces the covariance structure of the design attributes. Preserving the relationships among design attributes is a vital feature when the purpose of the modeling is to evaluate management options. Furthermore, because most similar neighbor is an exact interpolator, estimates derived from it are consistent in the finite-population sense.
For. Sci. 41(2):337-359.
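The covariance claim can be illustrated with a small simulation. This is a hypothetical example, not the authors' forest data or comparison: one indicator x and two design attributes whose residuals are negatively correlated given x. The regression is assumed to be ordinary least squares of each attribute on x separately. With a single indicator, the MSN distance assumed earlier reduces to a constant multiple of (x_t - x_r)^2, so the stand-in is simply the reference parcel with the nearest x.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n):
    """Hypothetical population: one indicator x and two design attributes
    whose residuals (given x) are negatively correlated."""
    x = rng.normal(size=n)
    e = rng.multivariate_normal([0.0, 0.0], [[1.0, -0.6], [-0.6, 1.0]], size=n)
    y = np.column_stack([0.8 * x + e[:, 0], 0.5 * x + e[:, 1]])
    return x, y

x_ref, y_ref = simulate(200)    # second-phase (reference) parcels
x_tgt, y_tgt = simulate(2000)   # first-phase parcels; y_tgt kept only for comparison

# Regression: ordinary least squares of each design attribute on x.
A_ref = np.column_stack([np.ones_like(x_ref), x_ref])
beta, *_ = np.linalg.lstsq(A_ref, y_ref, rcond=None)
y_reg = np.column_stack([np.ones_like(x_tgt), x_tgt]) @ beta

# Most similar neighbor with one indicator: D^2 is a constant times
# (x_t - x_r)^2, so the stand-in is the reference with the closest x.
nearest = np.abs(x_tgt[:, None] - x_ref[None, :]).argmin(axis=1)
y_msn = y_ref[nearest]

for name, y in [("true", y_tgt), ("regression", y_reg), ("MSN", y_msn)]:
    var = y.var(axis=0, ddof=1).round(2)
    corr = np.corrcoef(y, rowvar=False)[0, 1]
    print(f"{name:>10}: variances {var}, correlation {corr:+.2f}")
```

Both regression predictions are linear functions of x with positive slopes, so their correlation is forced to +1 and their variances shrink to the share explained by x. The MSN stand-ins are observed attribute vectors, so they should approximately reproduce the true variances and the (negative) correlation between the two attributes.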
Document Type: Journal Article
Affiliations: Principal Mensurationist, Intermountain Research Station, USDA Forest Service, 1221 S. Main, Moscow, ID 83843
Publication date: 01 May 1995