
Predicting Protein Solubility with a Hybrid Approach by Pseudo Amino Acid Composition



Protein solubility plays a major role in understanding the crystal growth and crystallization of proteins. Predicting whether a protein will be soluble or will form inclusion bodies is a long-standing problem that has not been satisfactorily resolved. From almost 10,000 protein sequences selected from the NCBI database, we removed sequences with 90% homologous similarity using CD-HIT, leaving 5692 sequences. Using Chou's pseudo amino acid composition features, we predict soluble proteins with three methods: a support vector machine (SVM), a back propagation neural network (BP Neural Network), and a hybrid method based on SVM and BP Neural Network. Each method is evaluated by the re-substitution test and the 10-fold cross-validation test. In the re-substitution test, the BP Neural Network performs best, with an accuracy of 92.88% and a Matthews Correlation Coefficient (MCC) of 0.8513. In the 10-fold cross-validation test, however, the other two methods outperform the BP Neural Network, and the hybrid method is the best, with an average accuracy of 86.78% and an average MCC of 0.7233. Although all three methods perform well, the hybrid method is deemed the best according to the performance comparison.
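The two reported metrics and the 10-fold protocol can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`accuracy`, `mcc`, `k_fold_indices`) are hypothetical, the formulas are the standard confusion-matrix definitions, and the interleaved fold assignment is an assumption made for brevity (in practice the data would normally be shuffled or stratified first).

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here; the abstract does not say how this case is handled).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Every sample appears in exactly one test fold. Samples are assigned
    to folds by interleaving (hypothetical choice, no shuffling).
    """
    indices = list(range(n_samples))
    for fold in range(k):
        test_idx = indices[fold::k]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    print(accuracy(45, 40, 10, 5), mcc(45, 40, 10, 5))
    # 5692 is the dataset size stated in the abstract.
    for train_idx, test_idx in k_fold_indices(5692, k=10):
        print(len(train_idx), len(test_idx))
```

In a re-substitution test the model is scored on the same samples it was trained on, which tends to give optimistic figures; cross-validation scores each sample only when it is held out. That difference is consistent with the BP Neural Network leading in re-substitution but not in the 10-fold test.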

Keywords: Alanine; Amino acid composition; Arg residues; Arginine; Artificial Neural Network; Asparagine; Aspartic acid; CD-HIT; Chou's pseudo amino acid; Cysteine; DNA-binding proteins; Escherichia coli; GalNAc-transferase; Glutamic acid; Glutamine; Histidine; Glycine; Isoleucine; Leucine; Lysine; Matthews Correlation Coefficient; NCBI database; Phenylalanine; Proline; Serine; Threonine; Valine; back propagation neural network; cross validation test; cysteine fraction; human papillomaviruses; hybrid approach; hybrid method; jackknife test; methionine; neural network; prediction; protein solubility; proline fraction; serine hydrolases; support vector machine

Document Type: Research Article

Publication date: 2010-12-01

More about this publication?
  • Protein & Peptide Letters publishes short papers in all important aspects of protein and peptide research, including structural studies, recombinant expression, function, synthesis, enzymology, immunology, molecular modeling, drug design etc. Manuscripts must have a significant element of novelty, timeliness and urgency that merit rapid publication. Reports of crystallisation, and preliminary structure determinations of biologically important proteins are acceptable. Purely theoretical papers are also acceptable provided they provide new insight into the principles of protein/peptide structure and function.