Science 20 March 1998: Vol. 279. no. 5358, p. 1827 DOI: 10.1126/science.279.5358.1827a
Technical Comments
Patterns of Genome Organization in Bacteria
Frederick R. Blattner et al. (1),
in describing the complete sequence of the Escherichia
coli chromosome, correlated an overall DNA property, "GC skew"
[the quantity (G − C)/(G + C) computed over a sliding window of
arbitrary length (here 10 kb)], with the direction of DNA replication.
GC skew for replichore 1 (rightward from the origin on the presented
strand) oscillates considerably yet remains almost entirely positive
along its length, whereas replichore 2 shows the opposite behavior.
Kunst et al. (2) did not present such an analysis for
the sequence of the Bacillus subtilis chromosome but did
note that the GC skew changes sign at the origin, an observation made
earlier by Lobry (3), who documented it for the replication
origins of E. coli, Haemophilus influenzae,
B. subtilis, and Mycoplasma genitalium and for
the terminus of H. influenzae.
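A minimal Python sketch of a sliding-window GC skew may make the definition concrete. It is an illustration only, not the procedure of Blattner et al.: the helper name `gc_skew` is hypothetical, the 10-kb default mirrors the window quoted above, the 1-kb step is an assumption, and the sequence is treated as linear (no wraparound for circular chromosomes).

```python
from typing import List, Tuple


def gc_skew(seq: str, window: int = 10_000, step: int = 1_000) -> List[Tuple[int, float]]:
    """Return (window_start, skew) pairs with skew = (G - C) / (G + C).

    Hypothetical helper: windows with no G or C get a skew of 0.0 so that
    we never divide by zero. A sequence shorter than the window yields a
    single window covering the whole sequence.
    """
    seq = seq.upper()
    results = []
    # Slide the window along the (linear) sequence in increments of `step`.
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if g + c else 0.0
        results.append((start, skew))
    return results
```

Because each value depends only on the bases inside one window, the skew is a local (derivative-like) measure: it reports the sign of the G/C imbalance near a position, not the accumulated imbalance up to it.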
In contrast to GC skew, which is a derivative function of the
base composition along a DNA sequence, we have computed three integral
functions of the sequences of nine complete prokaryotic genomes (Table
1). Composite graphs for three of these genomes are
presented (Fig. 1), and the remainder are available on
a linked website. We define "purine excess" as the number of
purines minus the number of pyrimidines encountered in a walk along
the sequence up to the point plotted (4). "Keto excess"
is the same function calculated for the keto bases (G and T) minus the
amino bases (A and C), and "coding-strand excess" is the number of
nucleotides encountered along the sequence that lie in coding
sequences on the presented strand, minus the number whose complements
(on the opposite strand) lie in coding sequences; bases in noncoding
regions add zero to this sum. Graphs of these functions reveal
nonrandom patterns, the most striking of which is the clear correlation
between purine excess and the origins and termini of DNA replication
(Fig. 1). In every case where independent information is
available, the minimum in the purine-excess curve corresponds to the
origin (Table 1). We suggest that this regularity may hold for most
prokaryotic genomes. Conversely, the maxima of the purine-excess curves
(Fig. 1) correlate strongly with known or suspected replication termini
(5). Keto-excess curves reflect the same correlation,
although for most genomes the minima and maxima (thus, predicted
origins and termini) are not as sharply defined as for the
purine-excess functions. Haemophilus influenzae represents a
notable exception to this rule (compare the keto-excess curve in Fig.
1B).
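The three integral functions can be sketched in Python as running sums of per-base steps. This is a hedged illustration, not the authors' code: the helper names are hypothetical, symbols other than A, C, G, and T (for example N) contribute zero, and genes are assumed to be supplied as hypothetical (start, end, strand) tuples with 1-based inclusive coordinates on the presented strand.

```python
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple

# Purines (A, G) step +1; pyrimidines (C, T) step -1.
PURINE_STEP: Dict[str, int] = {"A": 1, "G": 1, "C": -1, "T": -1}
# Keto bases (G, T) step +1; amino bases (A, C) step -1.
KETO_STEP: Dict[str, int] = {"G": 1, "T": 1, "A": -1, "C": -1}


def excess_curve(seq: str, step: Dict[str, int]) -> List[int]:
    """Cumulative excess after each position; unknown symbols add 0."""
    return list(accumulate(step.get(base, 0) for base in seq.upper()))


def coding_strand_excess(length: int, genes: Sequence[Tuple[int, int, str]]) -> List[int]:
    """Cumulative coding-strand excess for a sequence of `length` bases.

    Each position steps +1 if it lies in a '+' (presented-strand) gene and
    -1 if its complement lies in a '-' gene; a position covered on both
    strands nets 0, and noncoding positions add 0.
    """
    plus = [0] * length
    minus = [0] * length
    for start, end, strand in genes:
        target = plus if strand == "+" else minus
        for i in range(start - 1, end):  # convert 1-based inclusive to 0-based
            target[i] = 1
    return list(accumulate(p - m for p, m in zip(plus, minus)))
```

For example, `excess_curve("GAATTTC", PURINE_STEP)` rises to 3 after the two A's and falls back as the T's and the final C are read, ending at -1.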
Table 1.
Completely sequenced bacterial genomes analyzed for
base and coding asymmetries, with their origins and termini of
replication.
| Species | Length (Mbp) | Origin (bp) | Ref. | Terminus (bp) | Ref. |
| Escherichia coli | 4.64 | 3,923,500 | (1) | 1,588,800 | (18) |
| Bacillus subtilis | 4.21 | 1 | (2) | 2,017,000 | (2) |
| Mycoplasma pneumoniae | 0.82 | 205,000 | (19) | n.a.* | |
| Mycoplasma genitalium | 0.58 | 1 | (3, 20) | n.a. | |
| Helicobacter pylori | 1.67 | 1 | (21) | n.a. | |
| Haemophilus influenzae | 1.83 | 603,000 | (3, 16) | 1,518,000 | (3, 16) |
| Synechocystis PCC6803 | 3.57 | 1,351,000? | (22) | n.a. | |
| Methanococcus jannaschii | 1.66 | n.a. | | n.a. | |
| Methanobacterium thermoautotrophicum | 1.75 | n.a. | | n.a. | |
*Data not available.
Fig. 1.
Purine excess (blue curves), keto excess (red
curves), and coding-strand excess (green curves) for the complete
genomes of (A) E. coli (1),
(B) H. influenzae (16), and
(C) M. jannaschii (17).
Known origins and termini of replication are marked. Abscissa
represents the genomic sequence position from the beginning to the end
of the genome; left ordinate represents the count of purine and keto
excesses; right ordinate represents the Watson coding-strand excess
count at a given position. Green histograms across the bottom of each
graph display the correlation coefficients between purine excess and
coding-strand excess for 25-kb windows.
Graphs of six additional genomes (Table 1) can be viewed on the Web at
http://bmerc-www.bu.edu/genomeplot/.
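The windowed correlations plotted as green histograms can be approximated as follows. The caption does not say whether the 25-kb windows overlap or which coefficient was used, so this sketch assumes non-overlapping windows and Pearson's r; the helper names are hypothetical, and windows where either curve is constant return None.

```python
import math
from typing import List, Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:  # a flat curve has no defined correlation
        return None
    return sxy / math.sqrt(sxx * syy)


def windowed_correlation(
    a: Sequence[float], b: Sequence[float], window: int = 25_000
) -> List[Optional[float]]:
    """Correlate two per-position curves in consecutive, non-overlapping windows.

    The last window may be shorter than `window` if the length is not a multiple.
    """
    if len(a) != len(b):
        raise ValueError("curves must have the same length")
    return [pearson(a[i:i + window], b[i:i + window]) for i in range(0, len(a), window)]
```

In use, `a` would be a purine-excess curve and `b` a coding-strand-excess curve of the same genome, one value per base.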
Other genome features stand out in these graphs. The relatively
smooth, featureless curve for E. coli contrasts with the
much rougher patterns displayed by H. influenzae and
Synechocystis PCC6803 (see linked website for data). This
likely reflects a greater tendency of the latter organisms to take up
foreign DNA and integrate it into the chromosome (6, 7), a
point supported by the correlation of the density of DNA-uptake
sequences in H. influenzae (6) with many of the
inflection points of the purine-excess curve (8). Likewise,
the sites of µ prophage integration in H. influenzae
cluster most densely around the pronounced minimum in the
purine-excess curve adjacent to the terminus (Fig.
1B). The larger megaplasmid (pNGR234a) of
Rhizobium sp. NGR234 also displays similar behavior
(8), in keeping with its recognized characteristics as a
"transposon trap" (9).
Examination of the relationship between base-composition and coding
asymmetries at the whole-genome level shows close parallels between
coding-strand and purine excess for seven out of nine genomes. E. coli shows typical behavior (Fig. 1A). Haemophilus influenzae and Synechocystis display much weaker
correlations on this scale. At a finer level of detail, all the genomes
we studied show substantial correlations between these functions, and
the two archaebacteria, M. jannaschii and M. thermoautotrophicum, show
a particularly striking correspondence between coding-strand and purine
excess (Fig. 1C).
What forces might give rise to the long-range patterns of strand
asymmetry in bacterial genomes? The prominent correlation between
purine excess and replication direction suggests asymmetrical errors in
DNA synthesis as an explanation: in the absence of transpositions and
insertions, the bias favors accumulation of purines in the leading
strand. However, this contradicts the expectation that lagging-strand
synthesis should be more error-prone (10), and thus that most purine
substitutions (the principal cause of transversions) should occur
there. Francino and Ochman (11) have argued, on the other hand, that
transcriptional effects can account for DNA strand asymmetry:
transcription-coupled repair removes the most frequent types of DNA
damage (deaminated cytosines and pyrimidine dimers), thereby reducing
harmful mutations. Because this repair occurs only on the transcribed
(that is, template) strand, that strand will become pyrimidine-rich.
In addition, the template strand is substantially protected against DNA
damage during transcription, whereas the coding strand is exposed.
Under this model, selection should increase the purine content of the
coding strand, because purines are less vulnerable to mutation.
Mycoplasma genitalium conforms to the predictions of the
transcription-coupled repair model particularly well: in replichore 1,
85% of the open reading frames (ORFs) lie on the presented
(purine-rich) strand up to the putative terminus (the maximum in the
purine-excess curve), and in replichore 2, 77% of the ORFs lie on the
complementary strand. In E. coli, strand preference is less
pronounced: only 55% of the genes are oriented in the direction of
replication (1). However, Francino has analyzed the codon adaptation
index (CAI), a measure strongly associated with the level of gene
expression in E. coli, and finds that 74% of the genes with CAI > 0.5
and 84% of those with CAI > 0.6 are situated on the leading strand
(11), that is, with the direction of transcription the same as that of
replication (12). In
addition to favoring transcriptional repair, a major advantage to this
arrangement is that head-on collisions between replication and
transcription complexes will be reduced (13).
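Strand tallies like those quoted for M. genitalium can be computed from a gene annotation along these lines. The sketch is hypothetical and not the authors' procedure: it assigns each gene to a replichore by its midpoint, treats replichore 1 as running rightward from the origin to the terminus on a circular chromosome, counts presented-strand ('+') genes in replichore 1 and complementary-strand ('-') genes in replichore 2 as co-oriented with replication, and assumes no gene wraps around the end of the sequence.

```python
from typing import Optional, Sequence, Tuple


def replichore_strand_bias(
    genome_length: int,
    origin: int,
    terminus: int,
    genes: Sequence[Tuple[int, int, str]],
) -> Tuple[Optional[float], Optional[float]]:
    """Fraction of genes on the leading strand in replichores 1 and 2.

    Coordinates are 1-based; genes are (start, end, strand) with start <= end.
    A replichore with no genes yields None.
    """
    arc = (terminus - origin) % genome_length  # length of replichore 1
    counts = {1: [0, 0], 2: [0, 0]}  # replichore -> [co-oriented, total]
    for start, end, strand in genes:
        mid = (start + end) // 2
        offset = (mid - origin) % genome_length  # distance rightward from origin
        rep = 1 if offset < arc else 2
        leading = "+" if rep == 1 else "-"
        counts[rep][1] += 1
        if strand == leading:
            counts[rep][0] += 1
    return tuple(co / total if total else None for co, total in (counts[1], counts[2]))
```

With a real annotation, the origin and terminus would come from Table 1 or, failing that, from the extrema of the purine-excess curve.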
Functions like those described here promise to be revealing tools for
whole-genome analysis (4). For example, in the absence of
any other information, the global minimum of the purine excess marks
the probable origin of replication of a prokaryotic genome, and its
global maximum the likely terminus. Similar regularities may
emerge from the impending deluge of eukaryotic DNA sequences. We have
already shown that the patterns of purine-excess plots correlate well
with phylogenetic position for mitochondrial DNAs (14), and
graphs of coding-strand excess in the Saccharomyces cerevisiae genome tend to match the purine-excess curves
(15).
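The origin-and-terminus rule stated above amounts to locating the global extrema of a running sum. A minimal sketch, assuming the sequence is read from position 1 as presented: the helper name is hypothetical, ties resolve to the first occurrence, an empty sequence returns (0, 0), and for a circular genome the choice of starting point (together with any nonzero net purine excess) can shift the extrema, which this sketch does not correct for.

```python
from typing import Optional, Tuple

PURINE_STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def predict_origin_terminus(seq: str) -> Tuple[int, int]:
    """Return 1-based positions of the purine-excess global minimum and maximum.

    The minimum is read as the probable replication origin and the maximum
    as the likely terminus; unknown symbols contribute 0 to the running sum.
    """
    total = 0
    min_val: Optional[int] = None
    max_val: Optional[int] = None
    min_pos = max_pos = 0
    for pos, base in enumerate(seq.upper(), start=1):
        total += PURINE_STEP.get(base, 0)
        if min_val is None or total < min_val:
            min_val, min_pos = total, pos
        if max_val is None or total > max_val:
            max_val, max_pos = total, pos
    return min_pos, max_pos
```

A single pass keeps memory constant, which matters little for a 5-Mbp genome but avoids storing the whole curve when only its extrema are needed.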
James M. Freeman
Biomolecular Engineering Research Center, Boston
University, Boston, MA 02215, USA
Thomas N. Plasterer
Department of Pharmacology, Boston University
Temple F. Smith
Biomolecular Engineering Research Center, Boston University
Scott C. Mohr
Department of Chemistry, Boston University
REFERENCES AND NOTES
-
F. R. Blattner et al., Science 277, 1453 (1997).
-
F. Kunst et al., Nature 390, 249 (1997).
-
J. R. Lobry, Mol. Biol. Evol. 13, 650 (1996); Science 272, 745 (1996);
Biochimie 78, 323 (1996).
-
Purine excess:

$$E_{\mathrm{pur}}(l) = \sum_{i=1}^{l} \left[ \delta_{A,S_i} + \delta_{G,S_i} - \delta_{T,S_i} - \delta_{C,S_i} \right],$$

where $S_i$ is the base present at sequence position $i$, the sum runs
over the range 1 to the current position $l$, and $\delta_{X,Y} = 1$ if
$X = Y$ and $0$ if $X \neq Y$. Interchanging the A and T subscripts in
this equation defines the keto excess. A DNA sequence can be uniquely
described as a walk through a three-dimensional vector space, defined
by two orthogonal axes for the two types of base pair and a third
perpendicular axis that represents the sequence position (3) (scheme).
An A in the sequence corresponds to movement in the positive x
direction and a T to the opposite. G and C are mapped by analogous
steps along the y axis, and sequence position increases along z.
For example, starting at the origin of such a coordinate system, if the
first base encountered is G, then the vector trace generates the point
(0, +1, +1), where the indices are the usual Cartesian coordinates. If
the second base is A, the trace extends to (+1, +1, +2), and so forth.
The trace corresponding to GAATTTC continues on through (+2, +1, +3),
(+1, +1, +4), (0, +1, +5), and (−1, +1, +6) to (−1, 0, +7). Negative
values of sequence position can also be used, which allows the origin
to correspond to any convenient point in the sequence. As indicated by
the scheme, the purine-excess and keto-excess functions that we have
graphed for the nine prokaryotic genomes consist of steps along one or
the other of two diagonal axes in this sequence space. Alternatively,
the functions can be visualized as projections of the vector sequence
trace onto one or the other of two vertical planes that cut the
base-composition plane along the designated axes.
-
The precise locations of the three known termini (Table 1)
actually fall slightly beyond the maximum of the purine-excess curve,
and in every known case they coincide with the end of a segment that
has a sharply negative slope in the coding-strand excess curve.
-
H. O. Smith, J.-F. Tomb, B. A. Dougherty, R. D. Fleischmann, J. C.
Venter, Science 269, 538 (1995).
-
V. A. Dzelzkalns and L. Bogorad, EMBO J. 7, 333 (1988).
-
J. M. Freeman et al., data not shown.
-
C. Freiberg et al., Nature 387, 394 (1997).
-
If DNA damage to the lagging-strand template
dominates over synthesis errors, however, this conclusion would
be reversed because a purine-rich strand is less vulnerable to damage.
-
M. P. Francino and H. Ochman, Trends Genet. 13, 240 (1997).
-
In the case of E. coli phage
, the purine-excess plot has a minimum at the replication origin and a
major dip just before it. From the origin there is a rise containing a
continuous run of ORFs coded on the presented strand (61% of the total
ORFs in the genome), which are thus transcribed in the direction of the
phage's one-way replication. The dip region is coded exclusively on the
complementary strand (31% of total ORFs). The remaining 8% alternate
between strands at the start of the dip.
-
B. Liu et al., Nature 366, 33 (1993); Proc. Natl. Acad. Sci. U.S.A.
91, 10660 (1994); A. M. Deshpande and C. S. Newlon, Science 272, 1030
(1996).
-
S. C. Mohr et al., Biol. Bull., in
press (1998).
-
J. Graber et al., unpublished
experiments.
-
R. D. Fleischmann et al., Science 269, 496 (1995).
-
C. J. Bult et al., ibid. 273, 1058 (1996).
-
G. Plunkett, personal communication.
-
R. Himmelreich et al., Nucleic Acids Res. 24, 4420 (1996).
-
C. M. Fraser et al., Science 270, 397 (1995).
-
J.-F. Tomb et al., Nature 388, 539 (1997).
-
T. Kaneko et al., DNA Res. 3, 109 (1996); the origin is tentatively
assigned to the position of dnaA.
-
We thank B. Rogers for helpful discussions regarding
visualizations, the Boston University Office of Information Technology
and the Scientific Computing and Visualization Group for supercomputing
resources, and an anonymous reviewer for helpful suggestions. S.C.M. is
partially supported by a training grant from the U.S. National Human
Genome Research Institute (T32 HG00041-03). Grant
DE-FG02-98ER62558 from the U.S. Department of Energy supported this
research.
9 December 1997; revised 27 February 1998; accepted 9
March 1998
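The three-dimensional vector walk described in note 4 can be checked numerically. In this sketch (helper names hypothetical), each base moves one unit along x (A positive, T negative) or y (G positive, C negative) and one unit along z. The purine excess then equals x + y and the keto excess y − x, which is the sense in which both functions are steps along the two diagonal axes of the base-composition plane. Symbols other than A, C, G, and T raise a KeyError.

```python
from typing import List, Tuple

# Mapping from note 4: A -> +x, T -> -x, G -> +y, C -> -y; z counts positions.
STEPS = {"A": (1, 0), "T": (-1, 0), "G": (0, 1), "C": (0, -1)}


def vector_trace(seq: str) -> List[Tuple[int, int, int]]:
    """Points (x, y, z) visited by the walk, starting from the origin."""
    x = y = 0
    points = []
    for z, base in enumerate(seq.upper(), start=1):
        dx, dy = STEPS[base]
        x += dx
        y += dy
        points.append((x, y, z))
    return points


def diagonal_projections(points: List[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """(purine excess, keto excess) at each point: (x + y, y - x)."""
    return [(x + y, y - x) for x, y, _ in points]
```

Running `vector_trace("GAATTTC")` reproduces the trace given in note 4, from (0, +1, +1) to (−1, 0, +7).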