Improving the Classification of Nuclear Receptors with Feature Selection
Abstract: Nuclear receptors are involved in multiple cellular signaling pathways that affect and regulate cellular processes. Because of their physiological and pathophysiological significance, classifying nuclear receptors is essential for a proper understanding of their functions. Bhasin and Raghava have shown that the subfamilies of nuclear receptors are closely correlated with their amino acid composition and dipeptide composition. They characterized each protein by a 400-dimensional feature vector. However, using high-dimensional feature vectors to characterize protein sequences increases the computational cost as well as the risk of overfitting. Using only those features that are most relevant to the task might therefore improve the prediction system, and might also provide some biologically useful knowledge. In this paper, a feature selection approach is proposed to identify relevant features, and a prediction engine based on support vector machines is developed to estimate the classification accuracy obtained with the selected features. A reduced subset of 30 features, of which 18 are amino acid compositions and 12 are dipeptide compositions, was accepted to characterize the protein sequences in view of its good discriminative power between the classes. This reduced feature subset gave an overall accuracy of 98.9% in a 5-fold cross-validation test, higher than the 88.7% of the amino acid composition-based method and almost as high as the 99.3% of the dipeptide composition-based method. Moreover, an overall accuracy of 93.7% was reached when the subset was evaluated on a blind data set of 63 nuclear receptors. Using only the 12 selected dipeptide compositions, overall accuracies of 96.1% and 95.2% were observed in the 5-fold cross-validation test and the blind data set test, respectively. These results demonstrate the effectiveness of the present method.
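As an illustration of the two feature types named in the abstract, the following minimal Python sketch computes the amino acid composition (20 values) and the dipeptide composition (400 values) of a sequence and concatenates them into one candidate vector. The function names, the residue ordering, and the choice to skip non-standard symbols are assumptions made for this example; the abstract does not specify these details, nor the feature selection criterion or the support vector machine settings, so those steps are not shown.

```python
from itertools import product

# The 20 standard amino acids, in an arbitrary (alphabetical) order.
# This ordering is an assumption of the sketch, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    seq = seq.upper()
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq:
        if residue in counts:  # non-standard symbols such as X are skipped
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the fraction of each adjacent residue pair in seq (400 values)."""
    seq = seq.upper()
    counts = {dp: 0 for dp in DIPEPTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard symbols are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]


def feature_vector(seq):
    """Concatenate both compositions into one 420-value candidate vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

A feature selection step such as the one described above would then keep only a subset of these 420 candidate values (30 in the reported result) before training the classifier.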
Document Type: Research Article
Publication date: 2009-07-01
About this publication:
- Protein & Peptide Letters publishes short papers in all important aspects of protein and peptide research, including structural studies, recombinant expression, function, synthesis, enzymology, immunology, molecular modeling, drug design etc. Manuscripts must have a significant element of novelty, timeliness and urgency that merit rapid publication. Reports of crystallisation, and preliminary structure determinations of biologically important proteins are acceptable. Purely theoretical papers are also acceptable provided they provide new insight into the principles of protein/peptide structure and function.