Interconnection Between the Protein Solubility and Amino Acid and Dipeptide Compositions
Obtaining soluble proteins in sufficient concentrations helps increase the overall success rate in various experimental studies. Protein solubility is an individual trait ultimately determined by its primary protein sequence. Exploring the interconnection between the protein solubility and the compositions of protein sequence is instrumental for setting priorities on targets in large scale proteomics projects. In this paper, amino acid composition (20 dimensions) and the dipeptide composition (400 dimensions) were extracted to form the total candidate feature pool (420 dimensions), and each feature was selected into the feature vectors one by one, which were sorted by the absolute value of the correlation coefficient. Finally, we evaluated and recorded the 420 results of Support Vector Machine (SVM) as the prediction engine. According to the results of SVM, the first 208 features were chosen from the 420 dimensions, which were considered as the efficient ones. By analyzing the composition of the former 208 features, we found that the protein solubility was significantly influenced by the occurrence frequencies of the acidic amino acids, basic amino acids, non-polar hydrophobic amino acids and the two polar neutral amino acids(C, Q) in the protein sequences. Additionally, we detected that the dipeptides composed by the acidic amino acids (D, E) and basic amino acids (K, R and H), especially the dipeptide composed by the acidic amino acids (D, E), had strong interconnection with the protein solubility.
No Supplementary Data
Document Type: Research Article
Publication date: 2013-01-01
More about this publication?
- Protein & Peptide Letters publishes short papers in all important aspects of protein and peptide research, including structural studies, recombinant expression, function, synthesis, enzymology, immunology, molecular modeling, drug design etc. Manuscripts must have a significant element of novelty, timeliness and urgency that merit rapid publication. Reports of crystallisation, and preliminary structure determinations of biologically important proteins are acceptable. Purely theoretical papers are also acceptable provided they provide new insight into the principles of protein/peptide structure and function.