Skip to main content
Log in

Efficient methods for multiple sequence alignment with guaranteed error bounds

  • Published:
Bulletin of Mathematical Biology Aims and scope Submit manuscript

Abstract

Multiple string (sequence) alignment is a difficult and important problem in computational biology, where it is central in two related tasks: finding highly conserved subregions or embedded patterns of a set of biological sequences (strings of DNA, RNA or amino acids), and inferring the evolutionary history of a set of taxa from their associated biological sequences. Several precise measures have been proposed for evaluating the goodness of a multiple alignment, but no efficient methods are known which compute the optimal alignment for any of these measures in any but small cases. In this paper, we consider two previously proposed measures, and given two computationaly efficient multiple alignment methods (one for each measure) whose deviation from the optimal value isguaranteed to be less than a factor of two. This is the novel feature of these methods, but the methods have additional virtues as well. For both methods, the guaranteed bounds are much smaller than two when the number of strings is small (1.33 for three strings of any length); for one of the methods we give a related randomized method which is much faster and which gives, with high probability, multiple alignments with fairly small error bounds; and for the other measure, the method given yields a non-obviouslower bound on the value of the optimal alignment.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

Literature

  • Altschul, S. 1989. Gap costs for multiple alignment.J. theor. Biol. 138, 297–309.

    MathSciNet  Google Scholar 

  • Altschul, S. and D. Lipman. 1989. Trees, stars, and multiple sequence alignment.SIAM J. appl. Math 49, 197–209.

    Article  MATH  MathSciNet  Google Scholar 

  • Argos, P. and M. Vingren. 1990. Sensitivity comparisons of protein amino acid sequences. InMethods in Enzymology. Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, R. Doolittle (Ed.), Vol. 183, pp. 352–365. New York: Academic Press.

    Google Scholar 

  • Bacon, D. and W. Anderson. 1986. Multiple sequence alignment.J. molec. Biol. 191, 153–161.

    Article  Google Scholar 

  • Carrillo, H. and D. Lipman. 1988. The multiple sequence alignment problem in biology.SIAM J. appl. Math 48, 1073–1082.

    Article  MATH  MathSciNet  Google Scholar 

  • Doolittle, R. 1986.Of Urfs and Orfs: A Primer on How to Analyze Derived Amino Acid Sequences. Mill Valley, CA: University Science Books.

    Google Scholar 

  • Feng, D. and R. Doolittle. 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees.J. molec. Evol. 25, 351–360.

    Google Scholar 

  • Gusfield, D. 1984.The Steiner Tree Problem in Phylogeny. Technical report No. 334. Yale University Computer Science Department, September 1984.

  • Johnson, M. and R. Doolittle. 1986. A method for the simultaneous alignment of three or more amino acid sequences.J. molec. Evol. 23, 267–278.

    Article  Google Scholar 

  • Jukes, T. H. and C. R. Cantor. 1969. Evolution of protein molecules. InMammalian Protein Metabolism, H. N. Munro (Ed.), pp. 21–132. New York: Academic Press.

    Google Scholar 

  • Kou, L., G. Markowsky and L. Berman. 1981. A fast algorithm for steiner trees.ACTA Informatica 15.

  • Lipman, D., S. Altshul and J. Kececioglu. 1980. A tool for multiple sequence alignment.Proc. natn. Acad. Sci., U.S.A. 86, 4412–4415.

    Article  Google Scholar 

  • Murata, M., J. Richardson and Joel Sussman. 1985. Simultaneous comparison of three protein sequences.Proc. natn. Acad. Sci., U.S.A.,82, 3073–3077.

    Article  Google Scholar 

  • Sankoff, D. and R. Cedergren. 1983. Simultaneous comparisons of three or more sequences related by a tree. InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, D. Sankoff and J. Kruskal (Eds), pp. 253–264. Reading, MA: Addison Wesley.

    Google Scholar 

  • Sankoff, D. and J. Kruskal (Eds). 1983.Time Warps, String Edits, and Macromolecules: the Theory and Pactice of Sequence Comparison. Reading, MA: Addison Wesley.

    Google Scholar 

  • Schuler, G. D., S. F. Altschul and D. J. Lipman. 1992. A workbench for multiple alignment construction and analysis. InProteins: Structure, Function and Genetics, in press.

  • Schwarz, R. and M. Dayhoff. 1979. Matrices for detecting distant relationships. InAtlas of Protein Sequences, M. Dayhoff (Ed.), pp. 353–358. National Biomedical Research Foundation.

  • Tarjan, R. E. 1983.Data Structures and Network Algorithms. Philadelphia, PA: SIAM.

    Google Scholar 

  • Waterman, M. 1986. Multiple sequence alignment by consensus.Nucleic Acids Res. 14, 9095–9102.

    MathSciNet  Google Scholar 

  • Wong, R. 1980. Worst-case analysis of network design problem heuristics.SIAM J. Algebraic Discrete Methods 1, 51–63.

    MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Gusfield, D. Efficient methods for multiple sequence alignment with guaranteed error bounds. Bltn Mathcal Biology 55, 141–154 (1993). https://doi.org/10.1007/BF02460299

Download citation

  • Received:

  • Revised:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02460299

Keywords

Navigation