A Bayesian hidden Markov mixture model to detect overexpressed chromosome regions
We propose a hidden Markov mixture model for the analysis of gene expression measurements mapped to chromosome locations. These expression values represent preprocessed light intensities observed in each probe of Affymetrix oligonucleotide arrays. Here, the algorithm BLAT is used to align thousands of probe sequences to each chromosome. The main goal is to identify genome regions associated with high expression values which define clusters composed of consecutive observations. The model proposed assumes a mixture distribution in which one of the components (the one with the highest expected value) is supposed to accommodate the overexpressed clusters. The model takes advantage of the serial structure of the data and uses the distance information between neighbours to infer about the existence of a Markov dependence. This dependence is crucially important in the detection of overexpressed regions. We propose and discuss a Markov chain Monte Carlo algorithm to fit the model. Finally, the methodology proposed is used to analyse five data sets representing three types of cancer (breast, ovarian and brain).
No Supplementary Data
No Article Media