CRYSpred: Accurate Sequence-Based Protein Crystallization Propensity Prediction Using Sequence-Derived Structural Characteristics
Abstract: The relatively low success rate of X-ray crystallography, the most popular method for solving protein structures, motivates the development of novel methods that support the selection of tractable protein targets. This is particularly important in the context of current structural genomics efforts, which allow a certain degree of flexibility in target selection. We propose CRYSpred, a novel in-silico crystallization propensity predictor that uses a set of 15 novel features drawing on a broad range of inputs, including charge, hydrophobicity, and amino acid composition derived from the protein chain, and solvent accessibility and disorder predicted from the protein sequence. Our method outperforms seven modern crystallization propensity predictors on three benchmark test datasets that are independent of the training dataset. The strong predictive performance of CRYSpred is attributed to the careful design of the features, the use of a comprehensive set of inputs, and the use of a Support Vector Machine classifier. The inputs utilized by CRYSpred are well aligned with the existing rules-of-thumb used in structural genomics studies.
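As a rough illustration of the kind of chain-derived inputs the abstract names (charge, hydrophobicity, and amino acid composition), the following minimal Python sketch computes three simple descriptors from a sequence. It does not reproduce the 15 CRYSpred features, which are not specified here. The function name `sequence_features`, the crude net-charge estimate (Lys and Arg counted as +1, Asp and Glu as -1, His ignored), and the choice of the Kyte-Doolittle hydropathy scale are all assumptions made for illustration only.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (an assumed choice; the abstract does not
# say which hydrophobicity scale CRYSpred uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq):
    """Return illustrative sequence descriptors (hypothetical helper)."""
    # Keep only the 20 standard residues; anything else is skipped.
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard residues")

    n = len(residues)
    counts = Counter(residues)

    # Fraction of each of the 20 standard amino acids in the chain.
    composition = {aa: counts[aa] / n for aa in AMINO_ACIDS}

    # Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu (His ignored).
    net_charge = counts["K"] + counts["R"] - counts["D"] - counts["E"]

    # Average hydropathy over the whole chain.
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n

    return {
        "length": n,
        "composition": composition,
        "net_charge": net_charge,
        "mean_hydropathy": mean_hydropathy,
    }
```

In a pipeline of the kind described, descriptors like these would be combined with predicted solvent accessibility and disorder and passed to a classifier such as a Support Vector Machine; that step is omitted here because it depends on external predictors and training data.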
Keywords: CRYSpred; MCC; PDB; PSI-BLAST; ROC; SG; TargetDB; X-ray crystallography; benchmark test datasets; crystallization; crystallization propensity prediction; protein structure; rules-of-thumb; structural genomics; target selection
Document Type: Research Article
Publication date: January 1, 2012
- Protein & Peptide Letters publishes short papers in all important aspects of protein and peptide research, including structural studies, recombinant expression, function, synthesis, enzymology, immunology, molecular modeling, and drug design. Manuscripts must have a significant element of novelty, timeliness, and urgency that merits rapid publication. Reports of crystallisation and preliminary structure determinations of biologically important proteins are acceptable. Purely theoretical papers are also acceptable provided they offer new insight into the principles of protein/peptide structure and function.