Coefficient sign prediction methods for model selection
We consider a Bayesian model selection strategy based on predicting the signs of the coefficients in a regression model, i.e. we seek to identify the coefficients in a full or encompassing model for which we can confidently predict whether they are positive or negative. This is useful when our main purpose in doing model selection is interpretation, since the sign of a coefficient is often of primary importance for this task. In the case of a linear model with a standard non-informative prior, we connect our coefficient sign prediction approach to the classical Zheng–Loh procedure for model selection. One advantage of our approach is that only specification of a prior on the full model is required, unlike standard Bayesian variable selection approaches, which require specification of prior distributions on parameters in all submodels and of a prior on the model itself. We consider applying our method with proper hierarchical shrinkage priors, which makes the procedure more useful in ‘large p, small n’ regression problems with more predictors than observations and in problems involving multicollinearity. In these problems we may wish to do prediction by using shrinkage methods in the full model, but identifying which variables are important is also of interest. We compare selection by using our coefficient sign prediction approach with the recently proposed elastic net procedure of Zou and Hastie and observe that our method shares some of the features of the elastic net, such as a group selection property. The method can be extended to more complex model selection problems such as selection on variance components in random-effects models.
For selection on variance components, where the parameter of interest is non-negative and hence prediction of the sign of the parameter is not the appropriate way to proceed, we consider instead prediction of the sign of the score component for the parameter at zero, obtaining a method that is related to classical score tests on variance components.
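
To make the idea of confident sign prediction concrete, the following is a minimal sketch for the linear model case only. It assumes the non-informative prior is p(beta, sigma^2) proportional to 1/sigma^2 and that there are more observations than predictors, and it estimates the posterior probability that each coefficient is positive by Monte Carlo. The function names, the number of draws and the 0.95 threshold are illustrative assumptions, not quantities taken from the paper, and the selection rule shown here is a simple stand-in rather than the exact procedure studied.

```python
import numpy as np


def sign_probabilities(X, y, n_draws=4000, seed=0):
    """Monte Carlo estimate of P(beta_j > 0 | y) in the full linear model.

    Assumes the prior p(beta, sigma^2) proportional to 1/sigma^2 and n > p
    (an illustrative choice; shrinkage priors would need a different sampler).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y              # least squares estimate
    resid = y - X @ beta_hat
    rss = resid @ resid                        # residual sum of squares

    # sigma^2 | y is distributed as rss / chi-square with n - p degrees of freedom.
    sigma2 = rss / rng.chisquare(n - p, size=n_draws)

    # beta | sigma^2, y is normal with mean beta_hat and covariance sigma^2 (X'X)^{-1}.
    chol = np.linalg.cholesky(xtx_inv)
    z = rng.standard_normal((n_draws, p))
    draws = beta_hat + np.sqrt(sigma2)[:, None] * (z @ chol.T)

    # Fraction of posterior draws in which each coefficient is positive.
    return (draws > 0).mean(axis=0)


def select_by_sign(prob_positive, threshold=0.95):
    """Indices of coefficients whose sign is predicted with high confidence.

    The threshold is a hypothetical tuning choice for this sketch.
    """
    confidence = np.maximum(prob_positive, 1.0 - prob_positive)
    return np.flatnonzero(confidence >= threshold)


# Small simulated example: two strong effects of opposite sign, two null effects.
rng = np.random.default_rng(1)
X_demo = rng.standard_normal((100, 4))
y_demo = X_demo @ np.array([2.0, -1.5, 0.0, 0.0]) + rng.standard_normal(100)
prob_demo = sign_probabilities(X_demo, y_demo)
selected_demo = select_by_sign(prob_demo)
```

Because the rule uses only posterior draws of the full-model coefficients, the same `select_by_sign` step could in principle be applied to draws obtained under a hierarchical shrinkage prior; only the sampler would change.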