Loose and Strict Repeats in Weighted Sequences of Proteins
Abstract:A weighted sequence is a string in which a set of characters may appear at each position with respective probabilities of occurrence. Weighted sequences are able to summarize poorly defined short sequences, as well as the profiles of protein families and complete chromosome sequences. Thus it is of biological and theoretical significance to design powerful algorithms on weighted sequences. A common task is to identify repetitive motifs in weighted sequences, with presence probability not less than a given threshold. We define two types of repeats in weighted sequences, called the loose repeats and the strict repeats, respectively, and then attempt to locate these repeats. Using an iterative partitioning technique, we present algorithms for computing all the loose repeats and strict repeats of every length, respectively. Each solution costs O(n2)time.
Document Type: Research Article
Publication date: September 1, 2010
More about this publication?
- Protein & Peptide Letters publishes short papers in all important aspects of protein and peptide research, including structural studies, recombinant expression, function, synthesis, enzymology, immunology, molecular modeling, drug design etc. Manuscripts must have a significant element of novelty, timeliness and urgency that merit rapid publication. Reports of crystallisation, and preliminary structure determinations of biologically important proteins are acceptable. Purely theoretical papers are also acceptable provided they provide new insight into the principles of protein/peptide structure and function.