Predicting Protein Structural Class by Incorporating Patterns of Over-Represented k-mers into the General Form of Chou's PseAAC
Abstract: Computational prediction of protein structural class from sequence data remains a challenging problem in protein science. In this paper, a new feature extraction approach based on relative polypeptide composition is introduced. The approach accounts for the background distribution of a given k-mer under a Markov model of order k-2, and it avoids the curse of dimensionality as k increases by using a T-statistic feature selection strategy. The selected features are then fed to a support vector machine to perform the prediction. To verify the performance of the method, jackknife cross-validation tests are performed on four widely used benchmark datasets. Comparison with existing methods shows that the method provides satisfactory performance for structural class prediction.
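The idea of scoring a k-mer against a Markov background of order k-2 can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below assumes the commonly used maximal-order estimate, in which the expected count of a k-mer w is N(w[1..k-1]) · N(w[2..k]) / N(w[2..k-1]), and it reports the observed/expected ratio as the feature value. The function names `kmer_counts` and `relative_composition` are hypothetical and are not taken from the paper; the T-statistic selection and support vector machine steps are not shown.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def relative_composition(seq, k):
    """Observed/expected ratio for every k-mer that occurs in seq.

    Assumption (not stated in the abstract): the expected count under a
    Markov model of order k-2 is estimated from the counts of the two
    overlapping (k-1)-mers and the shared (k-2)-mer core.
    """
    if k < 3:
        raise ValueError("this sketch requires k >= 3")
    observed = kmer_counts(seq, k)
    sub = kmer_counts(seq, k - 1)   # (k-1)-mers: prefix and suffix of each k-mer
    core = kmer_counts(seq, k - 2)  # (k-2)-mers: the shared middle part
    ratios = {}
    for w, n in observed.items():
        # Every k-mer seen in seq has its prefix, suffix and core in seq too,
        # so none of these counts is zero.
        expected = sub[w[:-1]] * sub[w[1:]] / core[w[1:-1]]
        ratios[w] = n / expected
    return ratios


ratios = relative_composition("ABABAB", 3)
```

A ratio above 1 marks a k-mer that occurs more often than its shorter constituents would predict, which is the sense in which a k-mer is "over-represented" here. In a full pipeline, such ratios would be computed for each protein and then filtered by a feature selection step before classification.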
Keywords: GPCRs; Markov model; Structural Classification of Proteins (SCOP) database; T-statistic; domain; homology; protein structural class; regulatory pathways; relative polypeptide composition; support vector machine
Document Type: Research Article
Publication date: February 1, 2012
- Protein & Peptide Letters publishes short papers in all important aspects of protein and peptide research, including structural studies, recombinant expression, function, synthesis, enzymology, immunology, molecular modeling, and drug design. Manuscripts must have a significant element of novelty, timeliness and urgency that merits rapid publication. Reports of crystallisation and preliminary structure determinations of biologically important proteins are acceptable. Purely theoretical papers are also acceptable provided they offer new insight into the principles of protein/peptide structure and function.