Efficient Regression Analysis with Ranked-Set Sampling
This article is motivated by a lung cancer study where a regression model is involved and the response variable is too expensive to measure but the predictor variable can be measured easily with relatively negligible cost. This situation occurs quite often in medical studies, quantitative genetics, and ecological and environmental studies. In this article, by using the idea of ranked-set sampling (RSS), we develop sampling strategies that can reduce cost and increase efficiency of the regression analysis for the above-mentioned situation. The developed method is applied retrospectively to a lung cancer study. In the lung cancer study, the interest is to investigate the association between smoking status and three biomarkers: polyphenol DNA adducts, micronuclei, and sister chromatic exchanges. Optimal sampling schemes with different optimality criteria such as A-, D-, and integrated mean square error (IMSE)-optimality are considered in the application. With set size 10 in RSS, the improvement of the optimal schemes over simple random sampling (SRS) is great. For instance, by using the optimal scheme with IMSE-optimality, the IMSEs of the estimated regression functions for the three biomarkers are reduced to about half of those incurred by using SRS.
Document Type: Research Article
Affiliations: Department of Statistics and Applied Probability, National University of Singapore, 6 Science Drive 2, Singapore 117546, Singapore
Publication date: 2004-12-01