Robustness and accuracy of methods for high dimensional data analysis based on Student's t-statistic
Student's t-statistic is finding applications today that were never envisaged when it was introduced more than a century ago. Many of these applications rely on properties, e.g. robustness against heavy-tailed sampling distributions, that were not explicitly considered until relatively recently. We explore these features of the t-statistic in the context of its application to very high dimensional problems, including feature selection and ranking, the simultaneous testing of many different hypotheses and sparse, high dimensional signal detection. Robustness properties of the t-ratio are highlighted, and it is established that those properties are preserved under applications of the bootstrap. In particular, bootstrap methods correct for skewness and therefore lead to second-order accuracy, even in the extreme tails. Indeed, it is shown that the bootstrap and also the more popular but less accurate t-distribution and normal approximations are more effective in the tails than towards the middle of the distribution. These properties motivate new methods, e.g. bootstrap-based techniques for signal detection, that confine attention to the significant tail of a statistic.
Keywords: Bootstrap; Central limit theorem; Classification; Dimension reduction; Higher criticism; Large deviation probability; Moderate deviation probability; Ranking; Second-order accuracy; Skewness; Tail probability; Variable selection
Document Type: Research Article
Publication date: June 1, 2011
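As an illustration of the bootstrap calibration the abstract refers to, the following is a minimal sketch of a generic resampling estimate of the upper-tail probability of Student's t-ratio, set beside the normal approximation. It is not taken from the article: the function names (`t_statistic`, `bootstrap_upper_tail`, `normal_upper_tail`), the sample size, the threshold 2.5, the number of resamples and the exponential test data are all assumptions chosen for illustration.

```python
import math
import numpy as np

def t_statistic(sample, center):
    """Student's t-ratio sqrt(n) * (mean - center) / s along the last axis."""
    n = sample.shape[-1]
    mean = sample.mean(axis=-1)
    sd = sample.std(axis=-1, ddof=1)  # sample standard deviation
    return math.sqrt(n) * (mean - center) / sd

def bootstrap_upper_tail(sample, t, n_boot=5000, rng=None):
    """Bootstrap estimate of P(T > t) for the t-ratio (illustrative sketch).

    Each resample is drawn with replacement from `sample`, and its t-ratio
    is centred at the mean of the original sample.
    """
    rng = np.random.default_rng(rng)
    n = sample.size
    idx = rng.integers(0, n, size=(n_boot, n))  # resampling indices
    t_star = t_statistic(sample[idx], sample.mean())
    return float(np.mean(t_star > t))

def normal_upper_tail(t):
    """Normal approximation 1 - Phi(t) to the same tail probability."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

# Hypothetical usage with a skewed sampling distribution.
rng = np.random.default_rng(0)
x = rng.exponential(size=30)
p_boot = bootstrap_upper_tail(x, t=2.5, rng=rng)
p_norm = normal_upper_tail(2.5)
```

Because the resampled t-ratios inherit the skewness of the data, `p_boot` can differ noticeably from `p_norm` for skewed samples, which is the kind of discrepancy that the skewness correction mentioned in the abstract addresses.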