An Alignment-Free Method for Classification of Protein Sequences

Authors: Sandeep Deshmukh; Sanjeet Khaitan; Debasish Das; Manish Gupta; Pramod P. Wangikar

Source: Protein and Peptide Letters, Volume 14, Number 7, July 2007, pp. 647-657(11)

Publisher: Bentham Science Publishers


Abstract:

Protein sequences vary in length and are not readily amenable to conventional data mining techniques that require mapping into a fixed-dimensional space. Thus, the majority of current methods for protein sequence classification are based on alignment of the query sequence with either a sequence or a profile of the sequence family. We present a method for mapping protein sequences into a fixed-dimensional descriptor space. Descriptors such as amino acid content and amino acid pair association rules were used along with routinely available classification methods. An experiment on one hundred Pfam families showed a classification accuracy of 98% with a support vector machine classifier. Information gain-based feature selection helped simplify the model and improve accuracy. Interestingly, a large number of the selected features were based on the association rules of glycine or aspartic acid residues, suggesting their role in the conserved loops among evolutionarily related proteins. In a further experiment, the approach was tested on the classification of proteins from 39 Pfam families of protein kinases, where a support vector machine classifier provided an accuracy of approximately 96%. The method provides an alternative to conventional profile-based methods for protein sequence classification.
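
The idea of mapping variable-length sequences into a fixed-dimensional descriptor space can be illustrated with a minimal sketch. The abstract does not define how the amino acid pair association rules are computed, so the code below substitutes a simpler, clearly different descriptor: the relative frequency of ordered residue pairs separated by a fixed gap. The names `composition`, `pair_frequencies`, `describe`, and the `gap` parameter are hypothetical and are not taken from the paper.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, also in a fixed order.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    seq = seq.upper()
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq:
        if residue in counts:  # skip non-standard symbols such as X or B
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def pair_frequencies(seq, gap=1):
    """Return the fraction of each ordered residue pair at distance gap.

    Assumption: this is a stand-in for the paper's pair association
    rules, whose exact definition is not given in the abstract.
    """
    seq = seq.upper()
    counts = {pair: 0 for pair in PAIRS}
    for i in range(len(seq) - gap):
        pair = seq[i] + seq[i + gap]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[pair] / total if total else 0.0 for pair in PAIRS]


def describe(seq, gap=1):
    """Map a sequence of any length to a 420-dimensional descriptor."""
    return composition(seq) + pair_frequencies(seq, gap)


# Sequences of different lengths yield vectors of the same length.
short_vec = describe("MGDGA")
long_vec = describe("MKTAYIAKQRGDGSGLLDVA")
print(len(short_vec), len(long_vec))
```

Because every sequence yields a vector of the same length, the vectors can be passed to any standard classifier or feature selection routine, which is the property the abstract relies on.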
