Large Scale Protein Sequence Clustering - Not Solved But Solvable

Author: Krause, Antje

Source: Current Bioinformatics, Volume 1, Number 2, May 2006, pp. 247-254 (8)

Publisher: Bentham Science Publishers

Abstract:

Protein sequence clustering is one of the oldest problems in computational biology. In the 1960s, when the first protein sequence database was published in printed form, Margaret Dayhoff defined the basic principles of the discipline with only a small number of sequences at hand. Today, with up to a million sequences available in public databases and several well-known methods for automatically grouping proteins into more or less biologically meaningful families, subfamilies, and superfamilies, the problem seems to be satisfactorily solved. Nevertheless, apart from the difficulty of handling such a large amount of data, several pitfalls have emerged since Dayhoff's time: databases fill up as fast as genomes are sequenced, and a great many of these sequences are fragmentary or disappear again once they are identified as transcripts of wrongly predicted genes or hypothetical products of pseudogenes. This article first reviews the different approaches developed over the last decades. These insights are then used to point out possible challenges that lie ahead.

Keywords: Protein sequence; protein domain; protein family; protein sequence clustering; protein sequence database

Document Type: Research article

DOI: http://dx.doi.org/10.2174/157489306777011987

Affiliation: Department of Bioinformatics, Technical University of Applied Sciences Wildau, Bahnhofstr., 15745 Wildau, Germany.

Publication date: 2006-05-01
