Logistic regression models for predicting occurrence of terrestrial molluscs in southern Sweden – importance of environmental data quality and model complexity
We compared the performance of four logistic regression models, differing in complexity and in the quality of their environmental data, in predicting the occurrence of 49 terrestrial mollusc species in southern Sweden. Performance of models derived from an explanatory data set was evaluated on a confirmatory data set. The overall predictive success of our models (>80% for the three best model approaches) is as good as in other studies, despite the fact that we had to transform a text database into quantitative habitat variables. Simple models (no variable interactions) with forward selection and detailed habitat data (from field visits) showed the best overall predictive success (mean=84.8%). From comparisons of model approaches, we conclude that data quality (map-derived data vs habitat mapping) had a stronger impact on model performance than model complexity. However, most of these models showed relatively low values (mean=0.29) for Kappa (a statistic for model evaluation), suggesting that the models need to be improved before they can be applied. Predictive success was strongly associated with species incidence, and Kappa was also positively correlated with species incidence in univariate tests. Predictive success for true absences was negatively correlated with predictive success for true presences (R²=0.69), and most models failed to give a good prediction of both categories. Models for species with a high incidence in “Open dry sites” or “Mesic interior forests” performed better than expected, suggesting that occurrences of species with a preference for “narrow” habitats are the easiest to predict. Tree layer variables (openness and species abundance) were included in 48 of the 49 final predictive models, suggesting that these variables were good “indicators” of habitat conditions for ground-living molluscs. Twenty-four species models included distance to coast and altitude, and we interpret these associations as partly being related to differences in climate.
In the final models, true presences (36.9% correctly classified) were much more difficult to predict than true absences (89.7% correct). One possible explanation is that important habitat variables (e.g. chemical variables and site history) were not included. On the other hand, not all suitable sites would be expected to be occupied, owing to the dynamics of local extinctions (metapopulation theory).
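The evaluation measures named above (overall predictive success, predictive success for true presences and true absences, and Kappa) can all be derived from a 2×2 table of observed versus predicted occurrence. The following is a minimal sketch, assuming Kappa here means Cohen's kappa; the function name `evaluate_predictions` and the counts in the usage example are hypothetical and are not taken from the study.

```python
def evaluate_predictions(tp, fn, fp, tn):
    """Summarise a 2x2 presence/absence table (hypothetical helper).

    tp: sites where the species was present and predicted present
    fn: sites where the species was present but predicted absent
    fp: sites where the species was absent but predicted present
    tn: sites where the species was absent and predicted absent
    Assumes both presences and absences occur in the data.
    """
    n = tp + fn + fp + tn

    # Overall predictive success: share of all sites classified correctly.
    overall = (tp + tn) / n

    # Predictive success within each observed category.
    presence_success = tp / (tp + fn)
    absence_success = tn / (tn + fp)

    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2

    # Cohen's kappa: agreement beyond chance, scaled to its maximum.
    kappa = (overall - expected) / (1 - expected)

    return {
        "overall": overall,
        "presence_success": presence_success,
        "absence_success": absence_success,
        "kappa": kappa,
    }


# Hypothetical counts for a species found at 20 of 100 sites.
result = evaluate_predictions(tp=10, fn=10, fp=5, tn=75)
```

With a low-incidence species, as in this made-up table, overall predictive success can be high mainly because absences dominate, while Kappa stays moderate because much of that agreement is expected by chance.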
Document Type: Research Article
Publication date: February 1, 2004