Prediction of Enzyme Subfamily Class via Pseudo Amino Acid Composition by Incorporating the Conjoint Triad Feature


Predicting enzyme subfamily class is an imbalanced multi-class classification problem because the number of proteins differs greatly from one subfamily to another. In this paper, we develop computational methods designed specifically for imbalanced multi-class classification and apply them to enzyme subfamily class prediction. We compare two support vector machine (SVM)-based methods for the imbalance problem: the AdaBoost algorithm with RBFSVM (SVM with an RBF kernel) as the component classifier, and SVM with an arithmetic mean (AM) offset (AM-SVM). As input features for the predictive models, we use the conjoint triad feature (CTF). We validate both methods on an enzyme benchmark dataset that contains six main enzyme families with a total of thirty-four subfamily classes, in which each protein has less than 40% sequence identity to any other protein in the same functional class. In predicting oxidoreductase subfamilies, AM-SVM achieves a Matthews correlation coefficient (MCC) above 0.92 and an accuracy above 93%; in predicting lyase, isomerase and ligase subfamilies, it achieves an MCC above 0.73 and an accuracy above 82%. This improvement in predictive performance suggests that AM-SVM might play a complementary role to existing function annotation methods.
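
The abstract names the conjoint triad feature as the model input but does not spell out how it is computed. The following is a minimal Python sketch of one common formulation, not the authors' implementation. It assumes the seven-class amino acid grouping usually associated with the conjoint triad method, which yields 7^3 = 343 triad counts per sequence, and a (f - min) / max scaling of those counts. The grouping, the scaling, and the names `conjoint_triad_counts` and `conjoint_triad_feature` are illustrative assumptions that do not come from this page.

```python
from itertools import product

# Assumed seven-class grouping of the 20 standard amino acids (the grouping
# commonly used with the conjoint triad method; not stated in the abstract).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}

# All 7**3 = 343 possible triads of class indices, in a fixed order.
TRIADS = list(product(range(len(GROUPS)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad_counts(sequence):
    """Count class triads over every window of three consecutive residues."""
    counts = [0] * len(TRIADS)
    classes = [CLASS_OF.get(aa) for aa in sequence.upper()]
    for window in zip(classes, classes[1:], classes[2:]):
        if None in window:  # skip windows containing non-standard residues
            continue
        counts[TRIAD_INDEX[window]] += 1
    return counts


def conjoint_triad_feature(sequence):
    """Scale the counts as (f - min) / max; an assumed normalization."""
    counts = conjoint_triad_counts(sequence)
    lo, hi = min(counts), max(counts)
    if hi == 0:  # sequence too short or no valid triads
        return [0.0] * len(counts)
    return [(c - lo) / hi for c in counts]
```

Calling `conjoint_triad_feature` on a protein sequence string returns a fixed-length vector of 343 values regardless of sequence length, which is what allows sequences of different lengths to be passed to an SVM.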

Keywords: AAC; EC; MCC; NPPC; PPI; SVM; AdaBoostSVM; Enzyme subfamily class prediction; Enzymes; GalNAc-transferase; HIV protease; PseAAC; RBF; RBFSVM; conjoint triad; conjoint triad feature; grey theory; imbalance problem; jackknife test; subfamily; support vector machine

Document Type: Research Article

Publication date: 2010-11-01

More about this publication?
  • Protein & Peptide Letters publishes short papers in all important aspects of protein and peptide research, including structural studies, recombinant expression, function, synthesis, enzymology, immunology, molecular modeling, drug design etc. Manuscripts must have a significant element of novelty, timeliness and urgency that merit rapid publication. Reports of crystallisation, and preliminary structure determinations of biologically important proteins are acceptable. Purely theoretical papers are also acceptable provided they provide new insight into the principles of protein/peptide structure and function.