Mixture Modeling for Genome-Wide Localization of Transcription Factors
Chromatin immunoprecipitation followed by DNA microarray analysis (ChIP-chip methodology) is an efficient way of mapping genome-wide protein–DNA interactions. Data from tiling arrays encompass DNA–protein interaction measurements on thousands or millions of short oligonucleotides (probes) tiling a whole chromosome or genome. We propose a new model-based method for analyzing ChIP-chip data. The proposed model is motivated by the widely used two-component multinomial mixture model of de novo motif finding. It utilizes a hierarchical gamma mixture model of binding intensities while incorporating inherent spatial structure of the data. In this model, genomic regions belong to either one of the following two general groups: regions with a local protein–DNA interaction (peak) and regions lacking this interaction. Individual probes within a genomic region are allowed to have different localization rates accommodating different binding affinities. A novel feature of this model is the incorporation of a distribution for the peak size derived from the experimental design and parameters. This leads to the relaxation of the fixed peak size assumption that is commonly employed when computing a test statistic for these types of spatial data. Simulation studies and a real data application demonstrate good operating characteristics of the method including high sensitivity with small sample sizes when compared to available alternative methods.
Document Type: Research Article
Affiliations: Department of Statistics and Department of Biostatistics and Medical Informatics, 1300 University Avenue, 1245B Medical Sciences Center, Madison, Wisconsin 53706, U.S.A., Email: [email protected]
Publication date: March 1, 2007