Using Random Forest Algorithm to Predict β-Hairpin Motifs
Abstract: A novel method is presented for predicting β-hairpin motifs in protein sequences. The method applies the Random Forest algorithm to multiple characteristic parameters: the position-specific amino acid composition, the position-specific hydropathy composition, predicted secondary structure information, and the value of the auto-correlation function. First, the method is trained and tested on a set of 8,291 β-hairpin motifs and 6,865 non-β-hairpin motifs. The overall accuracy and Matthews correlation coefficient reach 82.2% and 0.64 with 5-fold cross-validation, and 81.7% and 0.63 on the independent test. Second, the method is tested on a set of 4,884 β-hairpin motifs and 4,310 non-β-hairpin motifs that was used in previous studies. The overall accuracy and Matthews correlation coefficient reach 80.9% and 0.61 with 5-fold cross-validation, and 80.6% and 0.60 on the independent test. These results are better than those reported previously. Third, when the 4,884 β-hairpin motifs and 4,310 non-β-hairpin motifs are used as the training set and the 8,291 β-hairpin motifs and 6,865 non-β-hairpin motifs as the independent test set, the overall accuracy and Matthews correlation coefficient reach 81.5% and 0.63.
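For readers unfamiliar with the two evaluation measures quoted in the abstract, the following minimal sketch shows how overall accuracy and the Matthews correlation coefficient are computed from the counts of a binary confusion matrix using their standard definitions. The function names `confusion_counts` and `accuracy_and_mcc` and the toy labels are hypothetical and chosen only for illustration; they are not taken from the paper, and the numbers are not the paper's data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = hairpin, 0 = non-hairpin)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) from confusion-matrix counts.

    accuracy = (TP + TN) / (TP + TN + FP + FN)
    MCC      = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Toy labels for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
acc, mcc = accuracy_and_mcc(*confusion_counts(y_true, y_pred))
print(f"accuracy={acc:.2f}, MCC={mcc:.2f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix and ranges from -1 to 1, which makes it informative when the two classes differ in size, as the β-hairpin and non-β-hairpin sets do here.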
Document Type: Research Article
Publication date: June 1, 2011
More about this publication?
- Protein & Peptide Letters publishes short papers in all important aspects of protein and peptide research, including structural studies, recombinant expression, function, synthesis, enzymology, immunology, molecular modeling, and drug design. Manuscripts must have a significant element of novelty, timeliness and urgency that merits rapid publication. Reports of crystallisation and preliminary structure determinations of biologically important proteins are acceptable. Purely theoretical papers are also acceptable provided they offer new insight into the principles of protein/peptide structure and function.