A latent Gaussian model for compositional data with zeros
Compositional data record the relative proportions of the components of a mixture and arise in many fields. Standard statistical techniques for analysing such data assume that no proportion is genuinely zero, yet real data can contain a substantial number of zeros. We present a latent Gaussian model for compositional data containing zeros, based on the assumption that the data arise from a (deterministic) Euclidean projection of a multivariate Gaussian random variable onto the unit simplex. We propose an iterative algorithm to simulate values from this model and apply the model to data on the proportions of fat, protein and carbohydrate in different groups of food products. Evaluating the likelihood involves difficult integrals when the number of components exceeds 3, so we also present a hybrid Gibbs rejection sampling scheme that can be used to draw inferences about the parameters of the model when the number of components is arbitrarily large.
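To make the projection idea concrete, the following is a minimal sketch of simulating from a model of this kind. It is not the iterative algorithm proposed in the paper: it substitutes the standard sort-based Euclidean projection onto the unit simplex. The function names (project_to_simplex, simulate_compositions), the mean and covariance values, and the choice of a latent Gaussian with the same dimension as the composition are all assumptions made for illustration.

```python
import numpy as np

def project_to_simplex(y):
    """Euclidean projection of y onto {x : x >= 0, sum(x) = 1}.

    Standard sort-based method (an assumption; not the paper's algorithm).
    """
    y = np.asarray(y, dtype=float)
    u = np.sort(y)[::-1]                 # components in decreasing order
    css = np.cumsum(u)                   # running sums of the sorted values
    k = np.arange(1, y.size + 1)
    # Largest k for which the k-th sorted component stays positive after shifting
    cond = u + (1.0 - css) / k > 0
    rho = k[cond][-1]
    tau = (css[rho - 1] - 1.0) / rho     # common shift applied to every component
    return np.maximum(y - tau, 0.0)      # components below the shift become exact zeros

def simulate_compositions(mean, cov, n, seed=0):
    """Draw n latent Gaussian vectors and project each onto the unit simplex."""
    rng = np.random.default_rng(seed)
    latent = rng.multivariate_normal(mean, cov, size=n)
    return np.apply_along_axis(project_to_simplex, 1, latent)

# Hypothetical three-component example (e.g. fat, protein, carbohydrate)
mean = np.array([0.1, 0.2, 0.7])
cov = 0.05 * np.eye(3)
samples = simulate_compositions(mean, cov, n=1000)
zero_fraction = np.mean(samples == 0.0, axis=0)  # share of exact zeros per component
```

Because the projection maps latent points outside the simplex onto its boundary, a positive fraction of simulated compositions contain components that are exactly zero, which is the feature of real data the model is intended to capture.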
Document Type: Research Article
Affiliations: Biomathematics and Statistics Scotland, Edinburgh, UK
Publication date: December 1, 2008