Predicting species distributions: a critical comparison of the most common statistical models using artificial species


Abstract

Aim

To test statistical models used to predict species distributions under different shapes of occurrence–environment relationship. We addressed three questions: (1) Is there a statistical technique that has a consistently higher predictive ability than others for all kinds of relationships? (2) How does species prevalence influence the relative performance of models? (3) When an automated stepwise selection procedure is used, does it improve predictive modelling, and are the relevant variables being selected?

Location

We used environmental data from a real landscape, the state of California, and simulated species distributions within this landscape.

Methods

Eighteen artificial species were generated, which varied in their occurrence response to the environmental gradients considered (random, linear, Gaussian, threshold or mixed), in the interaction of those factors (no interaction vs. multiplicative), and in their prevalence (50% vs. 5%). The landscape was then randomly sampled with a large (n = 2000) or small (n = 150) sample size, and the predictive ability of each statistical approach was assessed by comparing the true and predicted distributions using five different indexes of performance (area under the receiver-operator characteristic curve, Kappa, correlation between true and predicted probability of occurrence, sensitivity and specificity). We compared generalized additive models (GAM) with and without flexible degrees of freedom, logistic regressions (generalized linear models, GLM) with and without variable selection, classification trees, and the genetic algorithm for rule-set production (GARP).

Results

Species with threshold and mixed responses, additive environmental effects, and high prevalence generated better predictions than did other species for all statistical models. In general, GAM outperformed all other strategies, although differences from GLM were usually not significant. The two variable-selection strategies presented here did not discriminate successfully between truly causal factors and correlated environmental variables.

Main conclusions

Based on our analyses, we recommend the use of GAM or GLM over classification trees or GARP, and the specification of any suspected interaction terms between predictors. An expert-based variable selection procedure was preferable to the automated procedures used here. Finally, for low-prevalence species, variability in model performance is both very high and sample-dependent. This suggests that distribution models for species with low prevalence can be improved through targeted sampling.
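
For readers unfamiliar with the five performance indexes named in the Methods, the following is a minimal sketch of how they can be computed from a true and a predicted distribution. It is an illustration only, not the authors' code: the function names, the toy data, and the 0.5 classification threshold are assumptions that do not come from the article, and the abstract does not say how predicted probabilities were converted to presence or absence.

```python
from math import sqrt


def confusion_counts(truth, predicted_prob, threshold=0.5):
    """Count true/false positives and negatives.

    The 0.5 threshold is an assumption made for this sketch.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted_prob):
        predicted_present = p >= threshold
        if t == 1 and predicted_present:
            tp += 1
        elif t == 0 and predicted_present:
            fp += 1
        elif t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Proportion of true presences predicted as presences."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of true absences predicted as absences."""
    return tn / (tn + fp)


def cohen_kappa(tp, fp, tn, fn):
    """Agreement between truth and prediction, corrected for chance."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (observed - expected) / (1 - expected)


def auc(truth, predicted_prob):
    """Area under the ROC curve as the probability that a random presence
    site scores higher than a random absence site (ties count one half).

    Needs at least one presence and one absence; O(n^2), fine for a sketch.
    """
    pos = [p for t, p in zip(truth, predicted_prob) if t == 1]
    neg = [p for t, p in zip(truth, predicted_prob) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation, e.g. between true and predicted probability."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data (hypothetical): 1 = species present, 0 = absent.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

tp, fp, tn, fn = confusion_counts(truth, predicted)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("kappa:", cohen_kappa(tp, fp, tn, fn))
print("AUC:", auc(truth, predicted))
```

Sensitivity, specificity and Kappa depend on the chosen threshold, whereas AUC and the correlation are computed directly from the predicted probabilities. With a low-prevalence species (few presences in the sample), the threshold-dependent indexes rest on very few sites, which is one way to see why performance can vary strongly between samples.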

Keywords: Artificial species; GAM; GARP; GLM; classification trees; conservation biogeography; species-distribution modelling

Document Type: Research Article


Affiliations: Department of Environmental Science and Policy, University of California, 1 Shields Avenue, Davis, CA 95616, USA

Publication date: August 1, 2007
