Predicting Protein Solubility with a Hybrid Approach by Pseudo Amino Acid Composition
Abstract: Protein solubility plays a major role in understanding crystal growth and the crystallization process of proteins. Predicting whether a protein will be soluble or will form inclusion bodies is a long-standing problem that has not been satisfactorily resolved. Starting from almost 10,000 protein sequences chosen from the NCBI database, we used CD-HIT to eliminate sequences at the 90% similarity level, leaving 5692 sequences. Using Chou's pseudo amino acid composition as features, we predicted protein solubility with three methods: support vector machine (SVM), back propagation neural network (BP Neural Network), and a hybrid method based on SVM and BP Neural Network. Each method was evaluated by the re-substitution test and the 10-fold cross-validation test. In the re-substitution test, the BP Neural Network performed best, with an accuracy of 92.88% and a Matthews Correlation Coefficient (MCC) of 0.8513. In the 10-fold cross-validation test, however, the other two methods outperformed the BP Neural Network, and the hybrid method performed best, with an average accuracy of 86.78% and an average MCC of 0.7233. Although all three methods achieve good results, the hybrid method is judged the best according to the performance comparison.
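The two figures of merit quoted in the abstract, accuracy and MCC, both follow from the counts of a binary confusion matrix. The sketch below is only an illustration of the standard definitions, not the authors' code: the function names, the label encoding (1 = soluble, 0 = inclusion body), and the choice to return 0.0 when the MCC denominator is zero are assumptions made here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = soluble, 0 = insoluble)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 if the denominator is zero (a convention assumed here)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(*counts))
    print("MCC:", mcc(*counts))
```

In a 10-fold cross-validation setting, these functions would be applied to each held-out fold and the results averaged, which is one plausible reading of the "average accuracy" and "average MCC" reported above.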
Keywords: Alanine; Amino acid composition; Arg residues; Arginine; Artificial Neural Network; Asparagine; Aspartic acid; CD-HIT; Chou's pseudo amino acid; Cysteine; DNA-binding proteins; Escherichia coli; GalNAc-transferase; Glutamic acid; Glutamine; Histidine; Glycine; Isoleucine; Leucine; Lysine; Matthews Correlation Coefficient; NCBI database; Phenylalanine; Proline; Serine; Threonine; Valine; back propagation neural network; cross validation test; cysteine fraction; human papillomaviruses; hybrid approach; hybrid method; jackknife test; methionine; neural network; prediction; proline fraction; protein solubility; serine hydrolases; support vector machine
Document Type: Research Article
Publication date: 2010-12-01
- Protein & Peptide Letters publishes short papers in all important aspects of protein and peptide research, including structural studies, recombinant expression, function, synthesis, enzymology, immunology, molecular modeling, drug design etc. Manuscripts must have a significant element of novelty, timeliness and urgency that merit rapid publication. Reports of crystallisation, and preliminary structure determinations of biologically important proteins are acceptable. Purely theoretical papers are also acceptable provided they provide new insight into the principles of protein/peptide structure and function.