Ordering and selecting components in multivariate or functional data linear prediction
The problem of component choice in regression-based prediction has a long history. The main cases where important choices must be made are functional data analysis, and problems in which the explanatory variables are relatively high dimensional vectors. Indeed, principal component analysis has become the basis for methods for functional linear regression. In this context the number of components can also be interpreted as a smoothing parameter, and so the viewpoint is a little different from that for standard linear regression. However, arguments for and against conventional component choice methods are relevant to both settings and have received significant recent attention. We give a theoretical argument, which is applicable in a wide variety of settings, justifying the conventional approach. Although our result is of minimax type, it is not asymptotic in nature; it holds for each sample size. Motivated by the insight that is gained from this analysis, we give theoretical and numerical justification for cross-validation choice of the number of components that is used for prediction. In particular we show that cross-validation leads to asymptotic minimization of mean summed squared error, in settings which include functional data analysis.
Keywords: Basis choice; Cross-validation; Dimension reduction; Eigenvector; Functional data analysis; High dimensional data; Linear regression; Mean-squared error; Orthogonal series; Principal component analysis
Document Type: Research Article
Publication date: 2010-01-01