Nucleosome Positioning Based on the Sequence Word Composition
Abstract:The DNA of all eukaryotic organisms is packaged into nucleosomes (a basic repeating unit of chromatin). A nucleosome consists of histone octamer wrapped by core DNA and linker histone H1 associated with linker DNA. It has profound effects on all DNA-dependent processes by affecting sequence accessibility. Understanding the factors that influence nucleosome positioning has great help to the study of genomic control mechanism. Among many determinants, the inherent DNA sequence has been suggested to have a dominant role in nucleosome positioning in vivo. Here, we used the method of minimum redundancy maximum relevance (mRMR) feature selection and the nearest neighbor algorithm (NNA) combined with the incremental feature selection (IFS) method to identify the most important sequence features that either favor or inhibit nucleosome positioning. We analyzed the words of 53,021 nucleosome DNA sequences and 50,299 linker DNA sequences of Saccharomyces cerevisiae. 32 important features were abstracted from 5,460 features, and the overall prediction accuracy through jackknife cross-validation test was 76.5%. Our results support that sequencedependent DNA flexibility plays an important role in positioning nucleosome core particles and that genome sequence facilitates the rapid nucleosome reassembly instead of nucleosome depletion. Besides, our results suggest that there exist some additional features playing a considerable role in discriminating nucleosome forming and inhibiting sequences. These results confirmed that the underlying DNA sequence plays a major role in nucleosome positioning.
Keywords: DNA flexibility; DNA helix; DNA-dependent processes; IFS; MMPF; MRMR; NNA; Pearson's Chi-squared test; Statistical Analysis Methods; TF; TFBS; chromatin; feature selection; nucleosome positioning; sequence word composition
Document Type: Research Article
Publication date: 2012-01-01
- Protein & Peptide Letters publishes short papers in all important aspects of protein and peptide research, including structural studies, recombinant expression, function, synthesis, enzymology, immunology, molecular modeling, drug design etc. Manuscripts must have a significant element of novelty, timeliness and urgency that merit rapid publication. Reports of crystallisation, and preliminary structure determinations of biologically important proteins are acceptable. Purely theoretical papers are also acceptable provided they provide new insight into the principles of protein/peptide structure and function.