Cutpoint Selection for Categorizing a Continuous Predictor
This article presents a new approach for choosing the number of categories and the location of the category cutpoints when a continuous exposure variable must be categorized to obtain tabular summaries of the exposure effect. The optimum categorization is defined as the partition that minimizes a measure of distance between the true expected value of the outcome for each subject and the estimated average outcome among subjects in the same exposure category. To estimate the optimum partition, an efficient nonparametric estimate of the unknown regression function is substituted into a formula for the asymptotically optimum categorization. The approach is easy to implement and outperforms existing cutpoint selection methods.
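As a rough illustration of the criterion described above, the following Python sketch scores a candidate set of cutpoints by the mean squared distance between an estimated regression function and the average outcome in each exposure category. It is not the article's estimator. The abstract does not state the asymptotic formula, so a brute-force search over quantile-based candidate cutpoints stands in for it, and a Gaussian-kernel (Nadaraya-Watson) smoother is assumed as the nonparametric regression estimate. The function names, the squared-error distance, the `bandwidth` parameter, and the simulated data are all hypothetical choices made for this example.

```python
import itertools

import numpy as np


def kernel_smooth(x, y, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each observed x (Gaussian kernel).

    Assumption: this smoother stands in for the article's nonparametric estimate.
    """
    diffs = (x[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    return weights @ y / weights.sum(axis=1)


def categorization_loss(x, y, m_hat, cutpoints):
    """Mean squared distance between m_hat(x_i) and the mean outcome in x_i's category."""
    # Category k holds the subjects with cutpoints[k-1] <= x < cutpoints[k].
    labels = np.searchsorted(np.sort(cutpoints), x, side="right")
    loss = 0.0
    for k in np.unique(labels):
        in_cat = labels == k
        loss += np.sum((m_hat[in_cat] - y[in_cat].mean()) ** 2)
    return loss / len(x)


def best_cutpoints(x, y, n_categories, bandwidth, n_candidates=15):
    """Exhaustive search over quantile candidates (a stand-in for the asymptotic formula)."""
    m_hat = kernel_smooth(x, y, bandwidth)
    probs = np.linspace(0, 1, n_candidates + 2)[1:-1]  # interior quantiles only
    candidates = np.unique(np.quantile(x, probs))
    best = min(
        itertools.combinations(candidates, n_categories - 1),
        key=lambda cuts: categorization_loss(x, y, m_hat, cuts),
    )
    return np.array(best)


# Usage on simulated data (hypothetical): a smooth, nonlinear exposure effect.
rng_demo = np.random.default_rng(1)
x_demo = rng_demo.uniform(0, 1, 200)
y_demo = np.sin(3 * x_demo) + rng_demo.normal(0, 0.2, 200)
print(best_cutpoints(x_demo, y_demo, n_categories=3, bandwidth=0.1))
```

The exhaustive search grows combinatorially with the number of categories and candidates, so it is only practical for small problems. Choosing the number of categories would additionally require comparing the losses across values of `n_categories`, which this sketch leaves out.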