BM‐Map: Bayesian Mapping of Multireads for Next‐Generation Sequencing Data
Summary
Next‐generation sequencing (NGS) technology generates millions of short reads, which provide valuable information about many aspects of cellular activity and biological function. A key step in NGS applications (e.g., RNA‐Seq) is to map short reads to their correct locations within the source genome. While most reads map to a unique location, a significant proportion align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping multireads may bias downstream analyses. Currently, most practitioners discard multireads from their analyses, losing valuable information, especially for genes with similar sequences. To refine read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. These probabilities are then used in downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA‐Seq analysis of real‐life data that the Bayesian method yields better mapping than the current leading methods. We provide a downloadable C++ program, which is being packaged into user‐friendly software.
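The summary does not state the form of the model, so the following is only a rough illustration of the general idea: a posterior over competing locations for one multiread, followed by fractional allocation of that read to genes. It is not the BM‐Map model. Everything here is an assumption made for illustration: the prior is taken to be proportional to each location's unique‐read coverage plus a pseudocount, the likelihood treats base errors as independent with a fixed rate, and the function names `multiread_posterior` and `allocate_counts` are hypothetical.

```python
import math


def multiread_posterior(mismatches, unique_coverage, read_length,
                        error_rate=0.01, pseudocount=1.0):
    """Posterior P(location | read) for one multiread (illustrative sketch).

    Assumptions (hypothetical, not taken from BM-Map): the prior for each
    location is proportional to its unique-read coverage plus a pseudocount,
    and the likelihood treats base errors as independent with a fixed rate.
    """
    if len(mismatches) != len(unique_coverage):
        raise ValueError("need one mismatch count and one coverage value per location")
    log_scores = []
    for m, cov in zip(mismatches, unique_coverage):
        log_prior = math.log(cov + pseudocount)
        # m mismatched bases and (read_length - m) matched bases
        log_lik = m * math.log(error_rate) + (read_length - m) * math.log(1.0 - error_rate)
        log_scores.append(log_prior + log_lik)
    # Normalize in log space (log-sum-exp) to avoid numerical underflow.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def allocate_counts(multireads, genes):
    """Add each multiread's posterior probabilities to per-gene counts.

    `multireads` is a list of (gene_ids, posteriors) pairs. Fractional
    allocation is one simple, assumed way to use the posteriors downstream.
    """
    counts = {g: 0.0 for g in genes}
    for gene_ids, post in multireads:
        for g, p in zip(gene_ids, post):
            counts[g] += p
    return counts


if __name__ == "__main__":
    # A 36-base read with zero mismatches at two locations whose
    # unique-read coverage is 30 and 10.
    post = multiread_posterior([0, 0], [30, 10], read_length=36)
    print(post)
    print(allocate_counts([(["geneA", "geneB"], post)], ["geneA", "geneB"]))
```

When the mismatch counts are equal, the likelihoods cancel and the read is split in proportion to the assumed prior, here 31:11. A location with more mismatches is down‐weighted by a factor of roughly (error_rate / (1 ‐ error_rate)) per extra mismatch.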
Document Type: Research Article
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, U.S.A.
Department of Statistics, Rice University, Houston, Texas 77005, U.S.A.
Department of Statistics, University of Wisconsin–Madison, Madison, Wisconsin 53706, U.S.A.
Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, U.S.A.
Publication date: 2011-12-01