Science 8 October 1993:
Vol. 262. no. 5131, pp. 208 - 214
DOI: 10.1126/science.8211139

Articles

Copyright © 1993 by American Association for the Advancement of Science


Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment

CE Lawrence, SF Altschul, MS Boguski, JS Liu, AF Neuwald, and JC Wootton

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894.

A wealth of protein and DNA sequence data is being generated by genome projects and other sequencing efforts. A crucial barrier to deciphering these sequences and understanding the relations among them is the difficulty of detecting subtle local residue patterns common to multiple sequences. Such patterns frequently reflect similar molecular structures and biological properties. A mathematical definition of this "local multiple alignment" problem suitable for full computer automation has been used to develop a new and sensitive algorithm, based on the statistical method of iterative sampling. This algorithm finds an optimized local alignment model for N sequences in N-linear time, requiring only seconds on current workstations, and allows the simultaneous detection and optimization of multiple patterns and pattern repeats. The method is illustrated as applied to helix-turn-helix proteins, lipocalins, and prenyltransferases.
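The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of a generic Gibbs site sampler of the kind it describes, not the authors' implementation. It assumes exactly one motif occurrence of fixed width per sequence and a DNA alphabet; the function name `gibbs_site_sampler` and the parameters `width`, `sweeps`, `pseudo`, and `seed` are hypothetical names chosen for illustration. Column counts are updated incrementally, so each sweep over the N sequences costs time proportional to N times the sequence length times the motif width, in keeping with the linear scaling in N that the abstract mentions.

```python
import random


def gibbs_site_sampler(seqs, width, sweeps=100, pseudo=0.5,
                       alphabet="ACGT", seed=0):
    """Return one sampled motif start per sequence (illustrative sketch only).

    Assumes every residue in every sequence belongs to `alphabet` and that
    each sequence is at least `width` residues long.
    """
    rng = random.Random(seed)
    n = len(seqs)

    # Start from a random site in each sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]

    # Background residue frequencies over all sequences, with pseudocounts.
    text = "".join(seqs)
    bg_total = len(text) + pseudo * len(alphabet)
    background = {a: (text.count(a) + pseudo) / bg_total for a in alphabet}

    # Per-column residue counts for the currently selected sites.
    counts = [{a: 0 for a in alphabet} for _ in range(width)]

    def update(i, delta):
        # Add (delta=+1) or remove (delta=-1) sequence i's site from the counts.
        for k in range(width):
            counts[k][seqs[i][starts[i] + k]] += delta

    for i in range(n):
        update(i, +1)

    # Each column is estimated from the other n - 1 sites plus pseudocounts.
    denom = (n - 1) + pseudo * len(alphabet)

    for _ in range(sweeps):
        for i in range(n):
            # Leave sequence i out of the pattern model.
            update(i, -1)
            # Weight each candidate start by its model/background likelihood ratio.
            weights = []
            for pos in range(len(seqs[i]) - width + 1):
                w = 1.0
                for k in range(width):
                    a = seqs[i][pos + k]
                    w *= ((counts[k][a] + pseudo) / denom) / background[a]
                weights.append(w)
            # Sample a new start in proportion to the weights (not the argmax).
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
            update(i, +1)
    return starts


if __name__ == "__main__":
    demo = ["ACGTACGTAC", "TTACGTTTGA", "GGGACGTCCA"]
    print(gibbs_site_sampler(demo, width=4, sweeps=50, seed=1))
```

Sampling a start position in proportion to its weight, rather than always taking the best-scoring one, is what distinguishes this from a greedy search and lets the procedure move away from poor initial alignments. The multiple patterns and pattern repeats mentioned in the abstract are outside the scope of this sketch.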







Science. ISSN 0036-8075 (print), 1095-9203 (online)