Discrimination of Thermophilic and Mesophilic Proteins Using Reduced Amino Acid Alphabets with n-Grams
Protein thermostabilization has been the focus of recent research because of growing interest in producing enzymes that can operate at industrially beneficial temperatures. Understanding the determinants of thermostabilization at the level of sequence and structure is important for designing such enzymes. A bioinformatics approach was used to determine the extent to which reduced amino acid alphabets (RAAA) combined with n-grams (subsequences of length n), subjected to a t-test-based feature selection procedure, can be used to discriminate proteins from thermophiles and mesophiles. The classification performance of 65 different protein alphabets with 3 different n-gram sizes was systematically evaluated using support vector machines on a test set of 707 proteins from the mesophilic Xylella fastidiosa and the thermophilic Aquifex aeolicus. A classification accuracy of 91.796% was achieved with the Hsdm16 RAAA using 13 features: EK-ILV-ST-A-G-F-H-Q-N-R-M-W-Y. The t-test-based feature selection procedure reduced the classification time without significantly affecting classification accuracy. The overall combination of methods in this paper is useful and computationally fast for classifying protein sequences from thermophiles and mesophiles using sequence information alone.
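The feature pipeline described in the abstract (alphabet reduction, n-gram counting, t-test-based ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The groups EK, ILV, ST and the ten single-letter groups are taken from the abstract; treating C, D and P as single-letter groups is an assumption made here to reach 16 groups. The function names are hypothetical, and Welch's t statistic is used as one plausible form of the t-test, since the abstract does not say which variant was applied.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance

# EK, ILV, ST and ten singletons are listed in the abstract; C, D and P as
# singletons are an ASSUMPTION made here to complete a 16-group alphabet.
HSDM16_GROUPS = ["EK", "ILV", "ST", "A", "G", "F", "H", "Q",
                 "N", "R", "M", "W", "Y", "C", "D", "P"]

# Each residue is represented by the first letter of its group.
REDUCE = {aa: group[0] for group in HSDM16_GROUPS for aa in group}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet, skipping unknown symbols."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def ngram_frequencies(seq, n):
    """Relative frequency of each overlapping n-gram in the reduced sequence."""
    reduced = reduce_sequence(seq)
    total = len(reduced) - n + 1
    if total <= 0:
        return {}
    counts = Counter(reduced[i:i + n] for i in range(total))
    return {gram: count / total for gram, count in counts.items()}


def welch_t(a, b):
    """Welch's t statistic for two samples; each sample needs at least two values."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0:
        return 0.0
    return (mean(a) - mean(b)) / denom


def rank_features(thermo_seqs, meso_seqs, n):
    """Rank n-grams by |t| between the two classes, largest first."""
    thermo = [ngram_frequencies(s, n) for s in thermo_seqs]
    meso = [ngram_frequencies(s, n) for s in meso_seqs]
    grams = set().union(*thermo, *meso)
    scores = {
        g: welch_t([f.get(g, 0.0) for f in thermo],
                   [f.get(g, 0.0) for f in meso])
        for g in grams
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

In a full workflow, the top-ranked n-gram frequencies would be passed as feature vectors to a classifier such as a support vector machine; that step is omitted here to keep the sketch dependency-free.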
Document Type: Research Article
Affiliations: Biological Sciences and Bioengineering, Sabanci University, Orhanli, Tuzla, Istanbul, Turkey.
Publication date: 01 June 2012
Current Bioinformatics aims to publish the latest and most outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth reviews written by leaders in the field, covering a wide range of topics in the integration of biology with computer and information science.
The journal focuses on reviews of advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.
Current Bioinformatics is an essential journal for all academic and industrial researchers who want expert knowledge on all major advances in bioinformatics.