Biomarker detection and categorization in ribonucleic acid sequencing meta‐analysis using Bayesian hierarchical models
Meta‐analysis that combines multiple transcriptomic studies increases statistical power and accuracy in detecting differentially expressed genes. As next‐generation sequencing technology matures and becomes more affordable, increasing numbers of ribonucleic acid sequencing (‘RNA‐seq’) data sets are becoming available in the public domain. This count‐based technology offers better experimental accuracy and reproducibility, and a better ability to detect lowly expressed genes. A naive approach to combining multiple RNA‐seq studies is to apply differential analysis tools such as edgeR and DESeq to each study and then to combine the resulting summary statistics (p‐values or effect sizes) by conventional meta‐analysis methods. Such a two‐stage approach loses statistical power, especially for genes with short length or low expression abundance. We propose a full Bayesian hierarchical model (namely, BayesMetaSeq) for RNA‐seq meta‐analysis that models the count data directly, integrates information across genes and across studies, and models potentially heterogeneous differential signals across studies via latent variables. A Dirichlet process mixture prior is further placed on the latent variables to categorize the detected biomarkers according to their differential expression patterns across studies, facilitating improved interpretation and biological hypothesis generation. Simulations and a real application to data from multiple brain regions of human immunodeficiency virus type 1 transgenic rats demonstrate the improved sensitivity and accuracy of the method and the biological findings that it yields.
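For illustration only, the second stage of the naive two‐stage approach can be sketched as follows. The abstract does not name a specific combination rule, so Fisher's method is assumed here as one conventional choice. The function name `fisher_combined_pvalue` and the example p‐values are hypothetical. This sketches the baseline that the abstract argues against, not BayesMetaSeq.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent per-study p-values for one gene (Fisher's method).

    Assumption: the p-values come from separate per-study differential
    analyses and are independent under the null hypothesis.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(log p_i); we work with X / 2.
    half_stat = -sum(math.log(p) for p in pvalues)

    # Under the null, X follows a chi-square distribution with 2k degrees
    # of freedom. For even degrees of freedom the upper tail probability
    # has a closed form: exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one gene across three studies.
combined = fisher_combined_pvalue([0.04, 0.20, 0.01])
```

Because this rule sees only one p‐value per study, any information lost when the counts were reduced to a summary statistic cannot be recovered at the combination stage, which is the limitation that motivates modelling the count data jointly.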