Robust Clustering Using Exponential Power Mixtures
Clustering is a widely used method in extracting useful information from gene expression data, where unknown correlation structures in genes are believed to persist even after normalization. Such correlation structures pose a great challenge on the conventional clustering methods, such as the Gaussian mixture (GM) model, k-means (KM), and partitioning around medoids (PAM), which are not robust against general dependence within data. Here we use the exponential power mixture model to increase the robustness of clustering against general dependence and nonnormality of the data. An expectation–conditional maximization algorithm is developed to calculate the maximum likelihood estimators (MLEs) of the unknown parameters in these mixtures. The Bayesian information criterion is then employed to determine the numbers of components of the mixture. The MLEs are shown to be consistent under sparse dependence. Our numerical results indicate that the proposed procedure outperforms GM, KM, and PAM when there are strong correlations or non-Gaussian components in the data.