Technical Comments
Although we obtain the same correlations for r² with physical distance as Awadalla et al. (1), we find that, in contrast to their report, only two of their data sets yield significance levels (P) of <0.05, and none of the four additional data sets that we have analyzed shows a significant relationship between r² and physical distance (Table 1). Furthermore, neither the data sets analyzed by Awadalla et al. nor the four additional sets analyzed here reach statistical significance when D' is used (one set does yield a D' value for which P < 0.05, but the correlation is positive rather than negative). Most of the D' values are equal to 1.0, and D' does not show a decline with distance in any of the eight data sets (Fig. 1). Most pairs of sites are at the maximum level of disequilibrium allowed by the allele frequencies. Thus, most of the r² values reported by Awadalla et al. (1) are as high as they can be given the allele frequencies. The apparent decline of r² with distance would seem to be primarily an artifact of allele frequency dependency.
Fig. 1.
The relationship between linkage disequilibrium, measured by D', and the physical distance between pairs of polymorphisms, measured in number of base pairs (bp), using the data from Awadalla et al. [figure 1A of (1)].
Awadalla et al. (1) concluded that their results imply that mtDNA-based conclusions about human evolution should be "reconsidered." Yet comparisons of worldwide mtDNA variation and autosomal variation have generally yielded consistent results (7-10), with the observed differences readily explained by factors such as sex-specific differences in gene flow or the lower effective size of mtDNA. This level of consistency would not be expected if mtDNA variation were affected seriously by recombination.
The possibility of recombination in mtDNA is intriguing and deserves further evaluation. However, six of the eight mtDNA data sets examined here fail to show a significant decline of LD with physical distance using the r² statistic, and none show a decline using the more appropriate D' statistic. Thus, LD patterns provide little support for the hypothesis of mtDNA recombination.
L. B. Jorde
Eccles Institute of Human Genetics
University of Utah School of Medicine
10 North 2030 East
Salt Lake City, UT 84112, USA
E-mail: lbj{at}genetics.utah.edu
M. Bamshad
Department of Pediatrics
University of Utah School of Medicine
E-mail: mike{at}genetics.utah.edu
Awadalla et al. (1) concluded that a negative association between LD and distance between site pairs constitutes evidence for recombination in mtDNA. Our reanalysis of their data reveals major problems with their analysis and indicates that these data are consistent with mutation and linkage rather than with recombination.
First, the r² measure of LD between two loci, used by Awadalla et al., depends not only on recombination but also on allele frequencies at the two loci (2, 3). Values of r² can span the full range from 0 to 1 only when allele frequencies are equal at the two loci (4) and the most frequent alleles at the two loci are positively associated. For example, for sites 4985 and 11251, in the most extensive data set they used (45 complete mtDNA sequences), there are 34 AA, 6 AG, 5 GA, and 0 GG sequences (haplotypes), and the value of r² is only 0.02. However, these sites actually have the maximum LD possible for the frequencies of the constituent nucleotides.
It is therefore preferable to use the absolute value of the standardized LD measure D', which has a range of 0 to 1 for any set of allele frequencies (5). For sites 4985 and 11251, |D'| = 1, which occurs whenever at least one of the four possible haplotypes is not present. In the mtDNA sequence data set, 58 of the 91 pairs of the 14 polymorphic sites (63.7%) show the maximum possible LD (|D'| = 1). In contrast to the result for r² (1), no relationship exists between |D'| and physical distance, in base pairs (bp), between sites (Fig. 1A). Indeed, the average |D'| for the 42 site pairs less than 3000 bp apart is 0.79 ± 0.04, smaller than the 0.86 ± 0.04 for the 49 site pairs more than 3000 bp apart. These results and those from the RFLP data (Fig. 1B through 1D) that were also analyzed by Awadalla et al. (1) provide no support for the hypothesis of recombination in human mtDNA. Furthermore, there is no distance-dependent recombination in their mtDNA sequence data set, because site pairs showing |D'| < 1 are distributed randomly with distance (Pearson's correlation coefficient = -0.13; P = 0.27).
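The contrast between the two statistics for sites 4985 and 11251 can be checked directly from the haplotype counts quoted above. The following is a minimal sketch, assuming the standard definitions of D, r², and Lewontin's D'; the function name `ld_stats` and its argument order are illustrative choices, not taken from any of the cited analyses.

```python
def ld_stats(n11, n12, n21, n22):
    """Return (D, r_squared, abs_d_prime) from two-site haplotype counts.

    n11..n22 are counts of the four haplotypes; the first index is the
    allele at site 1 and the second index is the allele at site 2.
    """
    n = n11 + n12 + n21 + n22
    p1 = (n11 + n12) / n          # frequency of allele 1 at the first site
    p2 = (n11 + n21) / n          # frequency of allele 1 at the second site
    d = n11 / n - p1 * p2         # raw disequilibrium coefficient D
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    r_squared = d * d / denom
    # The largest |D| compatible with the allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    abs_d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, r_squared, abs_d_prime


# Counts quoted in the text for sites 4985 and 11251: 34 AA, 6 AG, 5 GA, 0 GG.
d, r_squared, abs_d_prime = ld_stats(34, 6, 5, 0)
print(round(r_squared, 2), round(abs_d_prime, 2))  # prints: 0.02 1.0
```

With one haplotype class empty, |D| equals its ceiling and |D'| is 1, while r² stays near zero because the allele frequencies at the two sites are both skewed.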
Fig. 1.
Relation between distance between site pairs and LD as measured by |D'| for (A) synonymous variants from 45 complete mtDNA sequences (correlation coefficient = 0.06), and for three RFLP data sets obtained from (1): (B) Swedish and Finnish (correlation coefficient = 0.25), (C) Native Siberians (correlation coefficient = -0.09), and (D) Native Americans (correlation coefficient = 0.04). (E) Fisher's exact-test probability for the data set in (A) (correlation coefficient = 0.20). Removal of site pairs showing |D'| = 1 results in a correlation coefficient of -0.13 for (A) and 0.03 for (E); for other panels, the number of observations was insufficient to compute the correlation coefficient. None of the correlation coefficients are significant at the 5% level.
Another approach to measuring LD is to calculate the probability of observing an equal or more extreme two-locus association by chance alone [Fisher's exact test (6)]. Exact probability values can be used to determine the association of LD with physical distance, because the probability value should be higher for more distant sites than for closer sites if recombination is occurring, and thus a positive slope between physical distance and the exact-test probability is expected. No such significant association is observed for the sequence data (Fig. 1E) or the three RFLP data sets (results not shown).
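As an illustration of the exact-test approach, the sketch below computes a two-sided Fisher's exact probability for a 2 × 2 table of haplotype counts using only the Python standard library. The function name and the two-sided convention (summing all tables no more probable than the observed one) are assumptions made here; the comment does not state which variant of the test was used.

```python
from math import comb


def fisher_exact_two_sided(n11, n12, n21, n22):
    """Two-sided Fisher's exact probability for a 2x2 table of counts."""
    row1 = n11 + n12
    col1 = n11 + n21
    n = n11 + n12 + n21 + n22
    total = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    p_obs = prob(n11)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Haplotype counts quoted earlier for sites 4985 and 11251.
print(round(fisher_exact_two_sided(34, 6, 5, 0), 6))
```

Plotting such probabilities (or their logarithms) against the distance between sites gives the relationship whose slope is discussed above.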
The relationship of r² and mutation can be examined by constructing a phylogenetic tree (Fig. 2) using the 14 variable sites analyzed from the 45 sequences. (Only 22 unique sequences actually exist, because six of the haplotypes occur multiple times.) In this tree, internal branches are almost as long as the external branches, an appearance unlike those of other human mtDNA trees (7, 8). This probably reflects the removal of sites at which the nucleotide frequencies were below 0.10, which leaves out any variants represented in fewer than five sequences. This tree clearly indicates that the nucleotide differences among haplotypes are correlated. Mapping nucleotide substitutions on the phylogenetic tree of 22 unique haplotypes reveals unique transitional changes at four sites (11251, 12372, 14783, and 15043), parallel transitional changes at eight sites (4985, 6455, 7028, 9540, 10873, 12705, 13617, and 15301), and backward changes at the other two sites (11467 and 11299). Pairwise comparisons of sites with unique mutations should not provide any information about recombination; however, r² is spuriously close to zero (0.02 to 0.05) for five of six such pairs, even though each pair shows maximum |D'|. A similar problem exists for r² estimates computed for 28 pairs of parallel mutation sites (15 pairs show r² < 0.05). Therefore, unique and parallel mutations by themselves in independent lineages may produce unusually low r² values. Furthermore, pairs of sites with either only three of the four possible haplotypes observed, or the fourth haplotype observed only once, occur with a frequency of 84.6% (Fig. 1A), 93.3% (Fig. 1B), 90.5% (Fig. 1C), and 70.0% (Fig. 1D) in the four data sets. These observations support the lack of recombination in human mtDNA, rather than providing evidence for it.
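The tally of site pairs showing fewer than four haplotypes can be sketched as follows. The sequences in `toy` are hypothetical and serve only to show the bookkeeping; the function names are likewise illustrative, and the sketch does not handle missing data or the "fourth haplotype observed only once" case.

```python
from itertools import combinations


def haplotype_counts_per_pair(sequences):
    """Map each pair of site indices to the number of distinct two-site haplotypes."""
    n_sites = len(sequences[0])
    result = {}
    for i, j in combinations(range(n_sites), 2):
        observed = {(seq[i], seq[j]) for seq in sequences}
        result[(i, j)] = len(observed)
    return result


def fraction_with_three_or_fewer(sequences):
    """Fraction of site pairs in which at most three haplotypes are observed."""
    counts = haplotype_counts_per_pair(sequences)
    return sum(1 for c in counts.values() if c <= 3) / len(counts)


# Hypothetical sequences restricted to three biallelic sites (not real data).
toy = ["AAG", "AGG", "GAG", "GAA", "GGA"]
print(haplotype_counts_per_pair(toy))
print(fraction_with_three_or_fewer(toy))
```

In this toy set only the pair of sites 0 and 2 lacks a fourth haplotype; in real data, a pair that does show all four requires either recombination or recurrent mutation to explain.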
Fig. 2.
Phylogenetic tree of human mtDNA haplotypes based on the number of differences observed in the 14 sites analyzed in (1). The neighbor-joining tree is shown with branch lengths denoting the actual number of differences per sequence (13). The Wallace sequence is connected to the tree with a dashed line because the sequence has missing data in 3 of 14 sites.
Our reanalysis thus contradicts the contention by Awadalla et al. (1) that recombination is occurring in human mtDNA. Extensive family studies have likewise failed to find any exceptions to strict maternal clonal inheritance of human mtDNA (9-12). There is no need to reconsider inferences about human or mtDNA evolution that have assumed that recombination does not occur in human mtDNA.
Sudhir Kumar
Philip Hedrick
Thomas Dowling
Department of Biology
Arizona State University
Tempe, AZ 85287, USA
Mark Stoneking
Max Planck Institute for Evolutionary Anthropology
Inselstrasse 22
D-04103 Leipzig, Germany
E-mail: stoneking{at}eva.mpg.de
Recent studies arguing for significant recombination between maternal and paternal mtDNA in humans (1, 2) have generated considerable debate (3-7). Awadalla et al. (8) have added to that debate a new study presenting a significant decline in LD between mtDNA sites with the distance between sites, for both humans and chimpanzees. Arguing that this effect is difficult to explain in any way other than recombination, Awadalla et al. (8) concluded that inferences about human and mtDNA evolution based on presumed clonal inheritance "will now have to be reconsidered."
In reference to that conclusion, Awadalla et al. cited several important papers that analyzed sequence variation in the mtDNA control region (CR), and that found extreme mutation rate heterogeneity among sites (9-12). Other proponents of mtDNA recombination have also suggested that such apparent mutation rate heterogeneity in the CR may be due instead to patterns of recombination (2). However, the data analyzed by Awadalla et al. come from either the entire mtDNA genome (RFLP data), the entire mtDNA protein-coding region (sequence data, humans), or two widely separated regions (ND2 and the CR, sequence data, chimpanzees). Moreover, their study provided no indication of how frequently recombination would have to occur to produce the negative correlation between linkage disequilibrium and distance they reported. If recombination is causing this effect, it raises an important question: Could that recombination be sufficiently frequent to shape the observed patterns of CR variation, and thereby invalidate the vast body of human evolutionary work that has been based on CR sequences?
To address that question, we repeated the analysis of Awadalla et al. (8), using a database of hypervariable region 1 and hypervariable region 2 sequences from the CR of 1278 individuals representing a range of ethnic groups (103 African-American, 110 Afro-Caribbean, 98 English Caucasian, 536 U.S. Caucasian, 97 Hispanic, 115 African, 57 U.S. Asian, and 162 Japanese). We analyzed sites that were twofold degenerate within this database, and whose minority variant occurred in at least 5% of the individuals in the database. The positions analyzed were 16069, 16126, 16172, 16187, 16189, 16223, 16224, 16278, 16294, 16304, 16311, 16319, 16362, 73, 146, 150, 152, 153, 182, 195, 198, 204, and 295 [relative to the Cambridge Reference Sequence (13)]. For all pairs of sites, we calculated the Hill and Robertson measure of LD (14), and analyzed this against the distance between sites. The calculated Pearson's correlation coefficient was a nonsignificantly positive value of 0.062 [significance determined as in Awadalla et al. (8), with 4129 of 5000 random replicates giving a correlation of 0.062 or lower].
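A randomization test of this general kind can be sketched as below. The comment says only that significance was "determined as in Awadalla et al." with 5000 random replicates, so the specific scheme used here (shuffling site positions among sites and recomputing the correlation) is an assumption, as are the function names and the use of linear rather than circular distance.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ld_distance_test(positions, ld, n_reps=5000, seed=0):
    """Correlate pairwise LD with pairwise distance and randomize positions.

    positions: site coordinates in bp.
    ld: dict mapping (i, j) with i < j to an LD value for that site pair.
    Returns (observed correlation, fraction of replicates at or below it).
    """
    pairs = list(combinations(range(len(positions)), 2))
    ld_values = [ld[pair] for pair in pairs]

    def corr(pos):
        distances = [abs(pos[i] - pos[j]) for i, j in pairs]
        return pearson(distances, ld_values)

    observed = corr(positions)
    rng = random.Random(seed)
    shuffled = list(positions)
    n_at_or_below = 0
    for _ in range(n_reps):
        rng.shuffle(shuffled)  # assumed scheme: reassign positions to sites at random
        if corr(shuffled) <= observed:
            n_at_or_below += 1
    return observed, n_at_or_below / n_reps
```

Under this scheme, a fraction such as 4129/5000 means that most random replicates gave a correlation at or below the observed value, so the observed value is not unusually negative.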
Our analysis shows that in the human mtDNA CR, there is no indication of a negative correlation between LD and distance that would signal the action of recombination. The closer proximity of even the most distant sites in our analysis, 795 bp, may account for why Awadalla et al. detected recombination and we did not. Nonetheless, our results suggest that recombination, if it occurs, is not of a level to leave a trace in a CR data set for which mutation rate heterogeneity among sites is glaringly evident (9-12).
In light of that finding, it seems unlikely that our understanding of the pattern and relative rates of sequence evolution within the mtDNA CR will require substantial revision based on the Awadalla et al. report. Our analysis also suggests that mtDNA forensic testing will be negligibly impacted by recombination; forensic applications already deal successfully with intergenerational mutation (15, 16), clearly a far more significant effect.
Thomas J. Parsons
Jodi A. Irwin
Armed Forces DNA Identification Laboratory
Armed Forces Institute of Pathology
1413 Research Blvd.
Rockville, MD 20886, USA
Response: We recently showed that LD declines with increasing distance between sites in human and chimpanzee mtDNA (1), an observation consistent with genetic recombination in hominid mitochondria. Four groups have questioned our findings for a number of different reasons. Before addressing these arguments in detail, however, we take this opportunity to provide corrected probability values (Table 1); the values reported in (1) included slight errors. The new figures do not qualitatively affect our conclusions.
Kivisild and Villems argue that our results may stem from errors in the data or a bias in the restriction sites surveyed. This seems unlikely, for several reasons. First, we observe a negative correlation between LD and distance in 8 of the 10 data sets considered by us and by Jorde and Bamshad in their comment, with the probability being <10% in five cases [table 1 of (1) and table 1 of Jorde and Bamshad]. Second, random sequencing errors would not be included in our analysis because they would generally appear as singletons, and systematic sequencing errors, though they might increase or decrease LD, would do so without respect to distance. Although there was an error at 4985 in the original Cambridge sequence (2), which was not included in our sample, that does not mean that our sequence data are incorrect, and we see no reason to believe that 6455 is in error. If we remove either of these sites from our data, the correlation between LD and distance remains negative, though the significance level is reduced (excluding 4985, correlation coefficient = -0.176 and P = 0.151; excluding 6455, correlation coefficient = -0.234 and P = 0.089; excluding both, correlation coefficient = -0.119 and P = 0.275). The assertion that sequence 12 from (3) is incorrect and is "likely a mosaic of haplogroup T- and H-type mtDNA genomes" simply illustrates a dogmatic belief in the clonality of mtDNA. Might it not be a recombinant?
Jorde and Bamshad and Kumar et al. suggest that our results may be an artifact of the measure of LD we used, r², because r² is more dependent upon allele frequencies than other LD statistics such as D'. We believe that this criticism of r² is misleading. First, neither group provides a model of allele frequency variation that would generate our results, and such a model is difficult to envisage for a circular chromosome. The decline in r² is as expected under a population genetic model with recombination (4, 5). Second, both r² and D' are affected by the frequency of the alleles (6, 7), because the denominators of both are simple functions of allele frequency. In five of the six data sets we analyzed, the reciprocal of the denominator of r², 1/[pA(1 - pA)pB(1 - pB)], is positively correlated with distance (Table 1), which suggests that the negative correlation between r² and distance is not caused by an unusual pattern in allele frequencies.
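For reference, the standard definitions of the two statistics, which the exchange uses without restating, make the allele frequency dependence of both denominators explicit. Here p_A and p_B denote the allele frequencies at the two sites and p_AB the frequency of the AB haplotype; this notation is an assumption chosen to match the expression above.

```latex
D = p_{AB} - p_A p_B, \qquad
r^2 = \frac{D^2}{p_A (1 - p_A)\, p_B (1 - p_B)}, \qquad
D' = \frac{D}{D_{\max}},
\qquad
D_{\max} =
\begin{cases}
\min\{\, p_A (1 - p_B),\; (1 - p_A)\, p_B \,\}, & D > 0, \\
\min\{\, p_A p_B,\; (1 - p_A)(1 - p_B) \,\}, & D < 0.
\end{cases}
```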
We do not find it surprising that the correlation between D' and distance is not significant, because r² should have more power to detect recombination than D'. D' has an extremely skewed distribution at low recombination rates and allele frequencies (7, 8), and r² provides a more informative estimate of LD. Consider two populations in which the frequencies of three haplotypes, AB, aB, and Ab, are (0.33, 0.33, and 0.33) in the first population and (0.49, 0.50, and 0.01) in the second. The evidence for LD is stronger in the first than in the second because the absence of the fourth haplotype is surprising only in the first sample. This is reflected in the r² values, which are 0.25 and 0.01, respectively, but not in D', which is one in each case.
The statistic r² also reflects two other aspects of recombination that D' does not. The value of r² is greatest when the two sites have similar allele frequencies, and the two rare (common) alleles are in coupling. Sites are more likely to have the same allele frequency, and be in coupling, if they have not recombined, since they then share the same genealogy (9, 10); thus, sites close together are expected to have higher r² values if recombination is occurring. D' will also tend to have higher values if the alleles are in coupling if all four haplotypes are present. Hence, r² can potentially detect recombination, even when the fourth haplotype is not present and D' = 1.
Simulations and data have shown that a high proportion of D' values are expected to equal one in recombining sequences, especially when sites exhibit low allele frequencies (7, 8, 11-13). The striking observation in both the data we used and the data used by Kivisild and Villems in their "phylogenetic argument" is not that many of the pairwise comparisons involve only three haplotypes and D' = 1, but that a substantial proportion involve four haplotypes. The fourth haplotype can be produced only by recombination or multiple mutation.
Kumar et al. show that the Fisher's exact test (FET) value is not visibly correlated with distance for our sequence data set, although the correlation is marginally significant (P = 0.085). However, the logarithm of the FET value is positively correlated with distance, a relationship that is significant in three of our data sets and marginally significant in two others (Table 1). This is not surprising: r² is proportional to the value obtained from a chi-square test for heterogeneity, so the logarithm of the FET value and r² are very similar statistics.
Kumar et al. present a phylogenetic analysis that they suggest is more consistent with multiple mutation than recombination. If there is recombination, however, a phylogenetic tree represents nothing physical; it is simply an estimate of the "average" genealogy of the sites being considered. In essence, we find a logical error here: Kumar et al. implicitly assume that recombination does not occur in constructing their tree, and then use the tree to argue that there is no recombination. The most obvious property of the tree is the number of homoplasies it contains, which, again, can be produced only by recombination or multiple mutation.
Recombination has not been observed in pedigree analyses of mtDNA, but the sample sizes are relatively small. Partial control region sequences have been obtained from about 1500 generational events (14-16). Given that there are only 100 mitochondria in the sperm, compared with 100,000 in the egg, this sample size represents the lower limit that would be needed to detect recombination if there were no selection against paternal mitochondria, as there appears to be. Paternal inheritance is rarely tested for, or considered, in these analyses; single nucleotide changes that occur in the pedigree are assumed to reflect new mutations and multiple nucleotide changes are assumed to be due to laboratory error or a problem with the genealogy. Both could be due to paternal inheritance, but that is rarely considered as the cause.
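The scale of the sampling argument can be illustrated with a back-of-the-envelope calculation. This is a sketch under strong assumptions that are not stated in the response: each mitochondrion in the zygote is taken to be equally likely to be the one sampled, there is no selection against paternal mitochondria, and generational events are independent. The function name is illustrative.

```python
def paternal_detection_odds(sperm_mito, egg_mito, events):
    """Return (paternal fraction, expected paternal events, P(no paternal event)).

    Assumes every mitochondrion is equally likely to be sampled and that
    generational events are independent (illustrative assumptions only).
    """
    paternal_fraction = sperm_mito / (sperm_mito + egg_mito)
    expected_paternal = events * paternal_fraction
    prob_none = (1 - paternal_fraction) ** events
    return paternal_fraction, expected_paternal, prob_none


# Figures quoted in the text: 100 sperm and 100,000 egg mitochondria,
# and about 1500 generational events.
fraction, expected, prob_none = paternal_detection_odds(100, 100_000, 1500)
print(round(fraction, 4), round(expected, 2), round(prob_none, 2))
```

Under these assumptions the paternal fraction is about 1 in 1000, so roughly 1000 generational events are needed before a single paternal transmission is expected, and about 1500 events would yield only one or two.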
Finally, Parsons and Irwin argue that although there may be evidence of recombination in the mitochondrial genome as a whole, there is no evidence of recombination in the control region, because there is no correlation between LD and distance. We think that this is a dangerous argument to make. First, "an absence of evidence is not evidence of an absence"; many sequences that have undergone recombination do not show a significant decline in LD with distance, and this is likely to be particularly true for short sequences. This does not mean that they are not affected by recombination. Second, they analyzed sequences from a variety of ethnic groups, which may introduce LD due to population subdivision and may thus obscure any patterns in LD due to recombination. Third, as we have argued elsewhere (17, 18), there may be independent evidence of recombination in the control region; phylogenetic trees constructed with control region sequences typically show high levels of homoplasy. Although it is commonly assumed that the homoplasies are caused by hypervariability (19), they may be caused by recombination (17, 18).
In short, we believe that the high level of homoplasy in many mitochondrial data sets in both humans and chimpanzees, and the decline in LD with distance in some, provide good evidence that recombination does occur in mtDNA. The next challenge will be to estimate the rate at which recombination occurs, and test whether hominids are unique in allowing it to happen.
Philip Awadalla
Institute of Cell, Animal and Population Biology
University of Edinburgh
Edinburgh EH9 1JT, UK
E-mail: p.awadalla{at}ed.ac.uk
Adam Eyre-Walker
John Maynard Smith
Centre for the Study of Evolution and School of Biological Sciences
University of Sussex
Brighton BN1 9QG, UK
E-mail: a.c.eyre-walker{at}sussex.ac.uk
Science. ISSN 0036-8075 (print), 1095-9203 (online)