Sequence-Based Prediction of Enzyme Thermostability Through Bioinformatics Algorithms
Predicting the thermostability of a biomolecule from its sequence is one of the major challenges of protein engineering, and tools for screening thermostable mutants are of great interest. Here we applied various screening, clustering, decision tree and generalized rule induction models to search for patterns of thermostability. Arg occurred as the N-terminal amino acid only in proteins at temperatures higher than 70°C. Feature selection identified 54 protein features as important, and with the selected features the number of peer groups (anomaly index 2.12) declined from 7 to 2; K-Means and TwoStep clusters were unchanged with or without feature selection filtering. The depth of the decision trees ranged from 14 branches (C5.0 with 10-fold cross-validation and feature selection) to 4 (CHAID); C5.0 was the best model and QUEST the worst. Feature selection made no significant difference to the performance of the decision tree models, but it significantly reduced the number of peer groups in the clustering models (p<0.05). The frequency of Gln was the most important feature in the decision tree rules and, as the antecedent, supported all association rules. The importance of Gln in protein thermostability is discussed in this paper.
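As an illustration of the kind of sequence-derived features the abstract refers to (amino acid frequencies such as Gln, and the identity of the N-terminal residue), the following is a minimal Python sketch. It is not the authors' pipeline: the function name `sequence_features`, the feature names and the example peptide are all hypothetical, and the abstract does not list the 54 selected features.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (Q = Gln, R = Arg).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_features(seq):
    """Return simple composition features for one protein sequence.

    Hypothetical helper for illustration only; the feature set used
    in the paper is not specified in the abstract.
    """
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)
    # Relative frequency of each standard amino acid.
    features = {f"freq_{aa}": counts.get(aa, 0) / n for aa in AMINO_ACIDS}
    # Identity of the N-terminal residue (first character of the sequence).
    features["n_terminal"] = seq[0]
    features["length"] = n
    return features


# Made-up peptide, used only to show the output shape.
example = sequence_features("RQQAQGLK")
print(example["n_terminal"], example["freq_Q"])
```

Feature dictionaries of this form could then be passed to a clustering or decision tree tool of the reader's choice; which tool and settings to use is left open here.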
Document Type: Research Article
Publication date: September 1, 2010
About this publication: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth reviews written by leaders in the field, covering a wide range of the integration of biology with computer and information science.
The journal focuses on reviews of advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.
Current Bioinformatics is an essential journal for all academic and industrial researchers who want expert knowledge on all major advances in bioinformatics.