Science 20 April 2007:
Vol. 316. no. 5823, p. 370
DOI: 10.1126/science.1137568

Technical Comments

Comment on "Ongoing Adaptive Evolution of ASPM, a Brain Size Determinant in Homo sapiens"

Fuli Yu,1,2 R. Sean Hill,2,3,4 Stephen F. Schaffner,2 Pardis C. Sabeti,2 Eric T. Wang,5,6 Andre A. Mignault,1 Russell J. Ferland,3,4 Robert K. Moyzis,5,6 Christopher A. Walsh,2,3,4 David Reich1,2*

Mekel-Bobrov et al. (Reports, 9 September 2005, p. 1720) suggested that ASPM, a gene associated with microcephaly, underwent natural selection within the last 500 to 14,100 years. Their analyses based on comparison with computer simulations indicated that ASPM had an unusual pattern of variation. However, when we compare ASPM empirically to a large number of other loci, its variation is not unusual and does not support selection.

1 Department of Genetics, Harvard Medical School, New Research Building, 77 Avenue Louis Pasteur, Boston, MA 02115, USA.
2 Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA.
3 Division of Neurogenetics and Howard Hughes Medical Institute, Beth Israel Deaconess Medical Center, Boston, MA 02115, USA.
4 Division of Genetics, Children's Hospital Boston, 300 Longwood Avenue, Boston, MA 02115, USA.
5 Department of Biological Chemistry, College of Medicine, University of California, Irvine, CA 92697, USA.
6 Institute of Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA.

* To whom correspondence should be addressed. E-mail: reich{at}genetics.med.harvard.edu

Mekel-Bobrov et al. (1) presented evidence that the ASPM (abnormal spindle-like microcephaly associated) gene has been subject to positive natural selection in European populations in the past ~6000 years. The authors noted a haplotype of ~40% frequency, which they argued had arisen too recently to be explained by genetic drift alone. They simulated a range of demographic histories and identified none that could produce a haplotype with such a high homozygote frequency. Because the detection of selection solely by comparison with simulated data has had a mixed record (2), we decided to assess the evidence empirically.

We sequenced ~19 kb of ASPM in 16 European Americans (CEU) and 16 West Africans (YRI). We identified single-nucleotide polymorphisms (SNPs) using automated software (3) and assayed all the SNPs we discovered in 30 CEU and 30 YRI trios from the International Haplotype Map (HapMap) project (4). This was identical to the SNP discovery strategy (the same sequencing technique, SNP-identification software, and genotyping protocol) that had been used to study 2.5 megabases (Mb) for the Encyclopedia of DNA Elements (ENCODE) project (4). Thus, we could use ENCODE as a near-perfect empirical comparison data set. We could also integrate the data with HapMap to determine whether the pattern of long-range variation around ASPM was unusual.

To test for selection, we first carried out standard tests of the allele frequency spectrum (5-7), comparing with the ENCODE data to determine statistical significance. Comparison regions were matched in genetic distance and number of segregating sites, using a range of possible recombination rates. No test showed significant evidence for selection (Table 1 and table S1). The single test that gave the strongest signal across ASPM as a whole was Tajima's D, with a nominal significance of P = 0.07 when we used the recombination rate of 1.9 cM/Mb from (1). The signal was even weaker (P = 0.18 to 0.22) when we reestimated recombination rates based on more recent data sets (table S2). We also calculated the test statistics individually for each of the three regions within ASPM (table S1). After correction for multiple hypothesis testing, again no test was statistically significant.
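As an illustration of the frequency-spectrum tests described above, the following minimal Python sketch computes Tajima's D (5) from per-site allele counts and then summarizes an observed value against a set of comparison values as σ and an empirical P value. The function names, the one-sided lower-tail P value, and the use of the population standard deviation are assumptions made for illustration. The matching of comparison regions by genetic distance and number of segregating sites is not shown.

```python
import math
from statistics import mean, pstdev


def tajimas_d(allele_counts, n):
    """Tajima's D from counts of one allele at each site among n chromosomes."""
    # Only segregating sites (0 < count < n) contribute.
    counts = [k for k in allele_counts if 0 < k < n]
    s = len(counts)
    if s == 0:
        raise ValueError("no segregating sites")
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def empirical_summary(observed, comparison_values):
    """Return (sigma, P): distance from the comparison mean in standard
    deviations, and the fraction of comparison values at least as low
    (assumed one-sided, lower tail)."""
    mu = mean(comparison_values)
    sd = pstdev(comparison_values)
    sigma = (observed - mu) / sd
    p = sum(v <= observed for v in comparison_values) / len(comparison_values)
    return sigma, p
```

An excess of rare alleles gives a negative D, whereas an excess of intermediate-frequency alleles gives a positive D. In an empirical comparison, the P value comes from the distribution of D across the comparison regions, not from a simulated null model.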


Table 1. Empirical tests for selection based on comparing ~19 kb of ASPM with ~2.5 Mb of the ENCODE regions and testing for unusual skews in SNP allele frequencies. For each summary statistic, we report σ, defined as the number of standard deviations from the empirical mean in the ENCODE regions, and an empirical P value. To provide a single assessment of statistical significance corrected for having carried out six tests, we recorded the minimum P value for Tajima's D test, the four Fu and Li's tests, and the Fay and Wu's H test (min-P) and then compared empirically with the proportion of matched ENCODE regions that had such an extreme minimum P value.

Frequency-spectrum-based tests: Tajima's D through Fay and Wu's H (entries in the matched rows are σ, P value).
Rate: recombination rate used for matching (cM/Mb)*. Corrected P: P value corrected for multiple hypothesis testing. FST P: test for an FST value as extreme as the largest in the region (P value). Homozygote P: test for excess of homozygotes (P value).

Rate (cM/Mb)*          Tajima's D      Fu and Li's D   Fu and Li's D*  Fu and Li's F   Fu and Li's F*  Fay and Wu's H  Corrected P  FST P   Homozygote P
ASPM observed value    -0.26           0.90            0.53            0.55            0.27            -8.66                        0.41    7/60
0.12                   -1.35, 0.20     -0.75, 0.30     -1.26, 0.18     -1.29, 0.21     -1.61, 0.14     -1.33, 0.26     0.41         0.31    0.25
0.5                    -1.35, 0.22     -0.78, 0.30     -1.25, 0.17     -1.29, 0.22     -1.60, 0.14     -1.44, 0.20     0.38         0.37    0.21
1                      -1.40, 0.18     -0.72, 0.30     -1.25, 0.17     -1.27, 0.22     -1.60, 0.13     -1.49, 0.19     0.31         0.43    0.16
1.9                    -1.79, 0.071    -0.71, 0.31     -1.22, 0.17     -1.38, 0.20     -1.71, 0.12     -1.74, 0.14     0.21         0.47    0.12

* We compared ~19 kb of ASPM with ~2.5 Mb from the ENCODE regions, dividing the ENCODE regions into sections that were matched to the ASPM data with regard to the number of segregating sites and genetic distance span. When more sites were available in the matched ENCODE region, we averaged the statistic over 10 random subselections. We matched the genetic distance span, using information from the Oxford linkage disequilibrium-based genetic map (9). To test for robustness to errors in the recombination rate estimate for the ASPM region, we considered a range of estimates for the recombination rate, from 0.12 cM/Mb (the upper bound from table S2) to 1.9 cM/Mb, the value used by Mekel-Bobrov et al. (1). The windows used for comparison were defined by nucleating at each ENCODE SNP and then assessing whether there was a window of matched genetic distance with enough segregating sites for comparison, extending in the 3' direction. This procedure for empirical matching induces some correlation among the windows (due to overlapping spans and linkage disequilibrium), but there is no expected bias in the P values, which are nonsignificant for all comparisons.
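The min-P correction described in the Table 1 legend can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function names are hypothetical, each test is assumed to be one-sided in the lower tail, and each comparison region is scored against the full set of comparison regions (including itself).

```python
def lower_tail_p(value, reference):
    """Fraction of reference values that are at least as low as value."""
    return sum(r <= value for r in reference) / len(reference)


def min_p_corrected(target_stats, region_stats):
    """Empirical min-P correction across several test statistics.

    target_stats: one value per test for the region of interest.
    region_stats: one row per comparison region, same test order.
    Returns (minimum nominal P for the target, corrected P).
    """
    k = len(target_stats)
    # One reference distribution per test, taken over comparison regions.
    columns = [[row[j] for row in region_stats] for j in range(k)]
    target_min = min(lower_tail_p(target_stats[j], columns[j]) for j in range(k))
    # Minimum P for each comparison region, scored the same way.
    region_mins = [
        min(lower_tail_p(row[j], columns[j]) for j in range(k))
        for row in region_stats
    ]
    # Corrected P: share of regions whose minimum P is at least as extreme.
    corrected = sum(m <= target_min for m in region_mins) / len(region_mins)
    return target_min, corrected
```

Because the minimum over several tests is itself compared with the distribution of minima in the comparison regions, the corrected P value can never be smaller than what a single pre-chosen test would give under the same scoring, which is the pattern seen in the "corrected" column of Table 1.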

We next assessed whether allele frequency differentiation between CEU and YRI at ASPM supported selection. The SNP A44871G showed an FST = 0.41 between Europeans and West Africans, putting it in the 95th percentile of ENCODE SNPs. After correcting for the number of SNPs in the region, >31% of matched ENCODE regions had at least one SNP with an FST as large, so this observation is not surprising (Table 1). Moreover, the worldwide frequency distribution seems to be in direct conflict with the suggestion that the G allele arose ~6000 years ago. The allele exists at >50% frequency in Papua New Guinea Highlanders (1), who are thought to have diverged from Europeans ~40,000 years ago (8).
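To make the region-level correction concrete, the sketch below computes a per-SNP FST for two populations and then asks what fraction of comparison regions contain at least one SNP with an FST as large as the observed value. The FST estimator used in the analysis is not stated here, so the simple heterozygosity-based form (HT - HS)/HT with equal population weights is an assumption, as are the function names.

```python
def fst_two_pops(p1, p2):
    """Heterozygosity-based FST for one biallelic SNP, given the frequency
    of the same allele in two populations (equal weights assumed)."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # Monomorphic in the pooled sample: no differentiation to measure.
        return 0.0
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def region_max_fst_p(observed_fst, regions):
    """Fraction of comparison regions whose largest per-SNP FST is at
    least as large as observed_fst.

    regions: list of non-empty lists of per-SNP FST values, one list
    per comparison region.
    """
    hits = sum(max(region) >= observed_fst for region in regions)
    return hits / len(regions)
```

A single SNP in the 95th percentile can therefore be unremarkable at the level of a region: the more SNPs a region contains, the more likely it is that at least one of them reaches a high FST by chance.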

Next, we repeated the primary analysis of (1), testing for an excess of individuals with two identical copies of any haplotype (3) across the region. Confirming the original report, the haplotype marked by the G allele was the most common in CEU (Fig. 1A). However, this haplotype did not stand out strikingly from the rest as in (1) (compare with fig. S2). The significance of a homozygote excess depends on the regional recombination rate, because unbroken haplotypes are more surprising if the recombination rate is high. Even when applying the high recombination rate used by Mekel-Bobrov et al. (1.9 cM/Mb), the homozygote excess is not significant compared with empirical data (P = 0.12). The evidence becomes even weaker (P = 0.25) when we instead use updated and much lower recombination rate estimates for the region (table S2). The fact that there are well-supported recombination rates that decrease the strength of the signal greatly weakens the evidence for selection.


Fig. 1. Linkage disequilibrium decay around A44871G in European Americans. (A) Haplotype frequency in European Americans (CEU). Blue bars are the derived haplotypes marked by the G allele. (B) Decay of extended haplotype homozygosity (EHH) around A44871G. (C) The significance of the LRH test at each marker is evaluated empirically by comparing with the genome-wide data from HapMap, matched with regard to breakdown of homozygosity. The most extreme P value of 0.03 is not striking when compared against the lowest P value seen in 1000 comparison regions, 90% of which show stronger evidence for selection at some distance. (D) The extent of the haplotype around the G allele (red dot, defined as the span for which EHH > 0.35), in comparison with alleles of matched frequency in CEU from HapMap on chromosome 1. This is well within the 95% central range of HapMap, whether plotted by physical distance (this figure) or genetic distance (fig. S3). To match the marker density of HapMap Phase I, we randomly dropped SNPs from ASPM until we had 1 SNP every 5 kb. With this lower density, the span of the G haplotype is 285 kb.

We also assessed evidence for selection at ASPM by carrying out the long-range haplotype (LRH) test (9). This test assesses whether a haplotype is too young to have risen to its frequency without selection. The LRH test is not affected by uncertainty in recombination rate estimates. We compared LRH results for the A44871G polymorphism to SNPs of matched frequency in HapMap CEU (3, 10) (Fig. 1C). We observed at least as strong a signal for selection at 90% of the regions examined (3, 11). Several genome-wide surveys using similar methods also failed to find evidence for selection at ASPM in European-derived populations (4, 12, 13). The one survey that did find a signal near ASPM did so only in individuals of Chinese ancestry (13), failing to support the contention of (1) of recent selection in European history. Based on linkage disequilibrium (LD) breaking down within ~100 kb on either side (Fig. 1B), we estimate that the G allele arose in European history at least tens of thousands of years ago and possibly more than 100,000 years ago (14) (table S3 and SOM Text). These dates are difficult to reconcile with selection ~6000 years ago, as suggested in (1).
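The dating argument above, spelled out in note 14, is simple arithmetic and can be reproduced with the short sketch below. Two assumptions are inferred from the numbers in note 14 and are not stated as formulas in the text: a generation time of 20 to 30 years (so that ~6000 years corresponds to 200 to 300 generations), and an expected LD span of roughly 100/g cM after g generations (so that 200 to 300 generations corresponds to 1/2 to 1/3 cM). The function names are hypothetical.

```python
def generations(years, years_per_generation):
    """Number of generations spanned by a period of time."""
    return years / years_per_generation


def expected_ld_span_cm(n_generations):
    """Approximate genetic length (cM) over which LD is expected to
    persist after n_generations of recombination (assumed 100 / g)."""
    return 100.0 / n_generations


def span_kb(span_cm, rate_cm_per_mb):
    """Convert a genetic length (cM) to a physical length (kb)."""
    return span_cm / rate_cm_per_mb * 1000.0


def age_in_generations(observed_kb, rate_cm_per_mb):
    """Invert the relation: age implied by an observed LD span."""
    observed_cm = observed_kb / 1000.0 * rate_cm_per_mb
    return 100.0 / observed_cm


if __name__ == "__main__":
    # Predicted LD span if the haplotype rose within the past ~6000 years.
    for yrs_per_gen in (20, 30):
        g = generations(6000, yrs_per_gen)
        cm = expected_ld_span_cm(g)
        print(g, "generations:", round(cm, 2), "cM,",
              round(span_kb(cm, 1.9)), "kb at 1.9 cM/Mb")
    # Age implied by LD breaking down within ~100 kb.
    for rate in (0.12, 1.9):
        print(rate, "cM/Mb:", round(age_in_generations(100, rate)), "generations")
```

Under these assumptions, selection ~6000 years ago predicts LD over roughly 175 to 265 kb even at 1.9 cM/Mb, whereas the observed breakdown within ~100 kb implies roughly 500 generations at 1.9 cM/Mb and more than 8000 generations at 0.12 cM/Mb, that is, on the order of tens to hundreds of thousands of years.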

One explanation for the differences between our results and (1) is that we assessed significance through comparison with empirical data. Empirical comparisons are robust to difficult-to-model features of real data, such as failure to detect real polymorphisms in a sample, or to fully understand the complexity of population history. Methodologically, these results are also important, demonstrating that one should not only compare with computer simulations but also show that a region stands out empirically compared with data collected in the same way, to build a compelling case for natural selection (2, 15).


References and Notes

  • 1. N. Mekel-Bobrov et al., Science 309, 1720 (2005).
  • 2. P. C. Sabeti et al., Science 312, 1614 (2006).
  • 3. Materials and methods are available as supporting material on Science Online.
  • 4. The International HapMap Consortium, Nature 437, 1299 (2005).
  • 5. F. Tajima, Genetics 123, 585 (1989).
  • 6. Y. X. Fu, W. H. Li, Genetics 133, 693 (1993).
  • 7. J. C. Fay, C. I. Wu, Genetics 155, 1405 (2000).
  • 8. L. L. Cavalli-Sforza, M. W. Feldman, Nat. Genet. 33, 266 (2003).
  • 9. P. C. Sabeti et al., Nature 419, 832 (2002).
  • 10. We carried out the LRH analysis using A44871G as the core SNP. We calculated the proportion of pairs of CEU chromosomes carrying the G allele at A44871G that have identical alleles up to a given SNP, and divided by the same quantity for the A allele. A high proportion of identical haplotypes spanning a large recombination distance, compared with the internal control of the other allele, indicates a recent age. For empirical comparison, we carried out the LRH test and obtained the same statistic for each HapMap SNP with matched allele frequency (35% ± 1%). Distant markers were matched for genetic distance (±1%) to allow empirical comparisons to the analogous markers in the ASPM gene.
  • 11. A problem with assessing statistical significance in the LRH test is that there are many distances from the core SNP at which one can measure statistical significance. To obtain a single statistic, we calculate P values at each distance and record the lowest value at any distance up to the point where the probability that two randomly chosen haplotypes are identical is 10% of that at the central SNP. To empirically assess statistical significance, we randomly sampled 1000 SNPs that were frequency-matched to A44871G, assessed P values at each distance around them, and recorded the minimum P value; 90% of randomly chosen regions show a minimum P value more extreme than ASPM (P = 0.90).
  • 12. B. F. Voight, S. Kudaravalli, X. Wen, J. K. Pritchard, PLoS Biol. 4, e72 (2006).
  • 13. E. T. Wang, G. Kodama, P. Baldi, R. K. Moyzis, Proc. Natl. Acad. Sci. U.S.A. 103, 135 (2006).
  • 14. The prediction of a long haplotype around the derived G allele at A44871G is equivalent to a prediction about the presence of long-range LD. If the haplotype associated with the G allele at A44871G was driven to high frequency by selection within the past ~6000 years (1), then the haplotype will have had no more than 200 to 300 generations to break down by recombination; thus, the predicted LD length would be 1/3 to 1/2 centimorgans (cM). Even if the estimate of 1.9 cM/Mb is assumed to be correct, LD should extend hundreds of kilobases in either direction. Strong LD breaks down within 100 kb in our data (Fig. 1B), corresponding to 1/100 to 1/5 cM, depending on the recombination rate estimate. This corresponds to several tens or several hundreds of thousands of years.
  • 15. K. M. Teshima, G. Coop, M. Przeworski, Genome Res. 16, 702 (2006).
  • 16. D.R. was supported by a Burroughs Wellcome Career Development Award in the Biomedical Sciences. C.A.W. is an Investigator of the Howard Hughes Medical Institute. We thank G. McDonald, C. Montague, S. Myers, N. Patterson, D. Richter, and L. Ziaugra for experimental and bioinformatic support. We thank M. Bernstein, G. Coop, A. Keinan, S. Myers, and A. Price for critical comments about the work.
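
The statistic described in note 10 can be sketched in a few lines of Python: for chromosomes carrying one core allele, compute the proportion of pairs that are identical from the core SNP out to a given marker, then divide by the same quantity for the other core allele. This is an illustrative sketch only. The representation of haplotypes as strings with one character per SNP, the function names, and the handling of a zero denominator are assumptions, and the frequency and genetic-distance matching against HapMap is not shown.

```python
from itertools import combinations


def pair_identity(haplotypes, core_index, core_allele, upto_index):
    """Proportion of pairs of chromosomes carrying core_allele that are
    identical at every SNP from the core SNP to upto_index (inclusive)."""
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    lo, hi = sorted((core_index, upto_index))
    pairs = list(combinations(carriers, 2))
    if not pairs:
        raise ValueError("need at least two chromosomes carrying the allele")
    same = sum(a[lo:hi + 1] == b[lo:hi + 1] for a, b in pairs)
    return same / len(pairs)


def relative_identity(haplotypes, core_index, allele, other_allele, upto_index):
    """Pair identity for one core allele divided by that for the other
    core allele, which serves as an internal control."""
    numerator = pair_identity(haplotypes, core_index, allele, upto_index)
    denominator = pair_identity(haplotypes, core_index, other_allele, upto_index)
    if denominator == 0.0:
        # No identical pairs on the control background at this distance.
        return float("inf")
    return numerator / denominator
```

A ratio well above 1 at a large genetic distance means that chromosomes carrying the first allele remain identical over a longer span than those carrying the other allele, which is the signature of a recent rise in frequency. Whether a given ratio is unusual is then judged against the same ratio computed at frequency-matched SNPs elsewhere in the genome.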

Supporting Online Material

www.sciencemag.org/cgi/content/full/316/5823/370b/DC1

Materials and Methods

Figs. S1 to S3

Tables S1 to S3

References

Data Files S1 and S2


Received for publication 14 November 2006. Accepted for publication 12 March 2007.







Science. ISSN 0036-8075 (print), 1095-9203 (online)