Comment on "Ongoing Adaptive Evolution of ASPM, a Brain Size Determinant in Homo sapiens"
Fuli Yu,1,2 R. Sean Hill,2,3,4 Stephen F. Schaffner,2 Pardis C. Sabeti,2 Eric T. Wang,5,6 Andre A. Mignault,1 Russell J. Ferland,3,4 Robert K. Moyzis,5,6 Christopher A. Walsh,2,3,4 David Reich1,2*
Mekel-Bobrov et al. (Reports, 9 September 2005, p. 1720) suggested that ASPM, a gene associated with microcephaly, underwent natural selection within the last 500 to 14,100 years. Their analyses, based on comparison with computer simulations, indicated that ASPM had an unusual pattern of variation. However, when we compare ASPM empirically with a large number of other loci, its variation is not unusual and does not support selection.
1 Department of Genetics, Harvard Medical School, New Research Building, 77 Avenue Louis Pasteur, Boston, MA 02115, USA. 2 Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA. 3 Division of Neurogenetics and Howard Hughes Medical Institute, Beth Israel Deaconess Medical Center, Boston, MA 02115, USA. 4 Division of Genetics, Children's Hospital Boston, 300 Longwood Avenue, Boston, MA 02115, USA. 5 Department of Biological Chemistry, College of Medicine, University of California, Irvine, CA 92697, USA. 6 Institute of Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA.
* To whom correspondence should be addressed. E-mail: reich{at}genetics.med.harvard.edu
Mekel-Bobrov et al. (1) presented evidence that the ASPM (abnormal spindle-like microcephaly associated) gene has been subject to positive natural selection in European populations in the past 6000 years. The authors noted a haplotype at 40% frequency, which they argued had arisen too recently to be explained by genetic drift alone. They simulated a range of demographic histories and identified none that could produce a haplotype with such a high homozygote frequency. Because detecting selection solely by comparison with simulated data has had a mixed record (2), we decided to assess the evidence empirically.
We sequenced 19 kb of ASPM in 16 European Americans (CEU) and 16 West Africans (YRI). We identified single-nucleotide polymorphisms (SNPs) using automated software (3) and assayed all the SNPs we discovered in 30 CEU and 30 YRI trios from the International Haplotype Map (HapMap) project (4). This SNP discovery strategy, with the same sequencing technique, SNP-identification software, and genotyping protocol, was identical to the one used to study 2.5 megabases (Mb) for the Encyclopedia of DNA Elements (ENCODE) project (4). Thus, we could use ENCODE as a near-perfect empirical comparison data set. We could also integrate the data with HapMap to determine whether the pattern of long-range variation around ASPM was unusual.
To test for selection, we first carried out standard tests of the allele frequency spectrum (5–7), assessing statistical significance by comparison with the ENCODE data. Comparison regions were matched for genetic distance and number of segregating sites, using a range of possible recombination rates. No test showed significant evidence for selection (Table 1 and table S1). The test that gave the strongest signal across ASPM as a whole was Tajima's D, with a nominal significance of P = 0.07 when we used the recombination rate of 1.9 cM/Mb from (1). The signal was weaker still (P = 0.18 to 0.22) when we reestimated recombination rates from more recent data sets (table S2). We also calculated the test statistics individually for each of the three regions within ASPM (table S1). After correction for multiple hypothesis testing, again no test was statistically significant.
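Tajima's D, the test with the strongest signal here, contrasts two estimates of diversity: the mean number of pairwise differences (π) and Watterson's estimate S/a1, where S is the number of segregating sites. A minimal Python sketch of the standard formula follows, assuming haplotypes coded as strings of 0/1 alleles; `diversity_stats` and `tajimas_d` are hypothetical names, not the authors' code. In the analysis above, significance came from ranking the observed value against matched ENCODE windows rather than from a theoretical null distribution.

```python
import math
from typing import Sequence, Tuple


def diversity_stats(haplotypes: Sequence[str]) -> Tuple[int, int, float]:
    """Return (n, S, pi) from equal-length 0/1 haplotype strings."""
    n = len(haplotypes)
    s = 0
    total_diff = 0
    for j in range(len(haplotypes[0])):
        k = sum(h[j] == "1" for h in haplotypes)  # count of allele "1" at site j
        if 0 < k < n:
            s += 1  # polymorphic (segregating) site
        total_diff += k * (n - k)  # number of pairs that differ at site j
    pi = total_diff / (n * (n - 1) / 2)  # average over all pairs
    return n, s, pi


def tajimas_d(n: int, s: int, pi: float) -> float:
    """Tajima's D for n sequences, s segregating sites, mean pairwise difference pi."""
    if n < 4 or s == 0:
        # For n = 2 or 3 the variance term is zero, so D is undefined.
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

Negative values indicate an excess of rare variants relative to neutral expectations, the pattern expected after a recent selective sweep, although population expansion produces it too.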
Table 1. Empirical tests for selection, based on comparing 19 kb of ASPM with 2.5 Mb of the ENCODE regions and testing for unusual skews in SNP allele frequencies. For each summary statistic, we report a standardized deviation, defined as the number of standard deviations from the empirical mean in the ENCODE regions, and an empirical P value. To provide a single assessment of statistical significance corrected for having carried out six tests, we recorded the minimum P value across Tajima's D test, the four Fu and Li's tests, and Fay and Wu's H test (min-P), and then assessed it empirically as the proportion of matched ENCODE regions with a minimum P value at least as extreme.
Frequency-spectrum-based tests
Test for an FST value as extreme as the largest in the region (P value)
* We compared 19 kb of ASPM with 2.5 Mb from the ENCODE regions, dividing the ENCODE regions into sections matched to the ASPM data for number of segregating sites and genetic distance span. When the matched ENCODE region contained more sites, we averaged the statistic over 10 random subselections. We matched the genetic distance span using the Oxford linkage disequilibrium-based genetic map (9). To test robustness to errors in the recombination rate estimate for the ASPM region, we considered a range of estimates, from 0.12 cM/Mb (the upper bound from table S2) to 1.9 cM/Mb, the value used by Mekel-Bobrov et al. (1). Comparison windows were defined by nucleating at each ENCODE SNP and then assessing whether a window of matched genetic distance, extending in the 3' direction, contained enough segregating sites for comparison. This empirical matching procedure induces some correlation among the windows (because of overlapping spans and linkage disequilibrium), but it introduces no expected bias in the P values, which are nonsignificant for all comparisons.
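The window matching and min-P correction described in the Table 1 caption and footnote can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, positions are assumed to be sorted genetic-map coordinates in cM, the lower tail is taken as the "extreme" direction for every test (for Tajima's D and Fay and Wu's H, negative values are the direction expected under a sweep), and a window is assumed to count only if the map extends a full span beyond its nucleating SNP.

```python
import random
import statistics
from typing import Callable, Iterator, List, Sequence


def matched_windows(positions_cm: Sequence[float], span_cm: float,
                    n_sites: int) -> Iterator[List[int]]:
    """Nucleate a window at each SNP, extend it 3' over span_cm, and keep it
    only if it holds at least n_sites segregating sites."""
    end = 0
    for start in range(len(positions_cm)):
        # Assumption: the map must extend a full span past the nucleating SNP.
        if positions_cm[-1] - positions_cm[start] < span_cm:
            break  # positions are sorted, so later starts cannot fit either
        end = max(end, start)
        # The window end only moves forward as the start advances.
        while end < len(positions_cm) and positions_cm[end] - positions_cm[start] <= span_cm:
            end += 1
        if end - start >= n_sites:
            yield list(range(start, end))


def window_statistic(window: List[int], stat: Callable[[List[int]], float],
                     n_sites: int, reps: int = 10, seed: int = 0) -> float:
    """Statistic on exactly n_sites SNPs; if the window holds more,
    average over `reps` random subselections."""
    if len(window) == n_sites:
        return stat(window)
    rng = random.Random(seed)
    return statistics.mean(stat(sorted(rng.sample(window, n_sites))) for _ in range(reps))


def z_score(observed: float, null: Sequence[float]) -> float:
    """Standard deviations between `observed` and the comparison mean."""
    return (observed - statistics.mean(null)) / statistics.stdev(null)


def empirical_p(observed: float, null: Sequence[float]) -> float:
    """Fraction of comparison values at least as low as `observed`."""
    return sum(v <= observed for v in null) / len(null)


def min_p_corrected(observed: Sequence[float], null: Sequence[Sequence[float]]) -> float:
    """observed[t]: test t in the focal region; null[r][t]: test t in comparison
    region r. Returns the empirical P value of the min-P statistic."""
    columns = [[region[t] for region in null] for t in range(len(observed))]

    def min_p(values: Sequence[float]) -> float:
        return min(empirical_p(v, col) for v, col in zip(values, columns))

    observed_min = min_p(observed)
    return sum(min_p(region) <= observed_min for region in null) / len(null)
```

Because each comparison region's minimum P value is computed against the same comparison distributions, this calibration accounts for running six correlated tests without assuming that they are independent.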
We next assessed whether allele frequency differentiation between CEU and YRI at ASPM supported selection. The SNP A44871G showed FST = 0.41 between Europeans and West Africans, putting it in the 95th percentile of ENCODE SNPs. After correcting for the number of SNPs in the region, however, >31% of matched ENCODE regions had at least one SNP with an FST as large, so this observation is not surprising (Table 1). Moreover, the worldwide frequency distribution appears to conflict directly with the suggestion that the G allele arose 6000 years ago: the allele exists at >50% frequency in Papua New Guinea Highlanders (1), who are thought to have diverged from Europeans 40,000 years ago (8).
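The text does not specify which FST estimator was used. As an illustration only, the sketch below uses the simple two-population form FST = (HT - HS)/HT computed from allele frequencies, without sample-size correction, and then applies the region-level correction described above: the share of matched comparison regions whose most differentiated SNP is at least as extreme. The estimator choice, function names, and input layout are assumptions.

```python
from typing import List, Sequence, Tuple


def fst_two_pops(p1: float, p2: float) -> float:
    """(HT - HS) / HT for one biallelic SNP; populations equally weighted,
    no sample-size correction (illustrative estimator, an assumption)."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def region_max_fst(freqs: Sequence[Tuple[float, float]]) -> float:
    """Largest per-SNP FST in a region; freqs holds (p_pop1, p_pop2) per SNP."""
    return max(fst_two_pops(p1, p2) for p1, p2 in freqs)


def max_fst_empirical_p(observed_max: float,
                        comparison_regions: List[Sequence[Tuple[float, float]]]) -> float:
    """Share of matched regions with at least one SNP whose FST >= observed_max."""
    maxima = [region_max_fst(r) for r in comparison_regions]
    return sum(m >= observed_max for m in maxima) / len(maxima)
```

Scanning many SNPs makes a single highly differentiated SNP likely somewhere, which is why the comparison is made on each region's maximum rather than on the SNP's own percentile.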
Next, we repeated the primary analysis of (1), testing for an excess of individuals with two identical copies of any haplotype (3) across the region. Confirming the original report, the haplotype marked by the G allele was the most common in CEU (Fig. 1A). However, this haplotype did not stand out from the rest as strikingly as in (1) (compare with fig. S2). The significance of a homozygote excess depends on the regional recombination rate, because unbroken haplotypes are more surprising when the recombination rate is high. Even with the high recombination rate used by Mekel-Bobrov et al. (1.9 cM/Mb), the homozygote excess is not significant compared with empirical data (P = 0.12). The evidence becomes weaker still (P = 0.25) when we instead use updated and much lower recombination rate estimates for the region (table S2). Because well-supported recombination rate estimates reduce the strength of the signal, the evidence for selection is greatly weakened.
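A minimal sketch of the bookkeeping behind a homozygote-excess test, assuming phased diplotypes (one pair of haplotype strings per individual). As a point of reference it reports, for each haplotype, the observed number of homozygotes next to the Hardy-Weinberg expectation N·p²; that reference point and all function names are assumptions for illustration. The significance itself would come from ranking the observed value against matched regions, as in the empirical comparison described above.

```python
from collections import Counter
from typing import Dict, List, Tuple

Diplotype = Tuple[str, str]


def homozygote_table(diplotypes: List[Diplotype]) -> Dict[str, Tuple[int, float]]:
    """Map each haplotype to (observed homozygotes, Hardy-Weinberg expectation N * p**2)."""
    n = len(diplotypes)
    copies = Counter(h for pair in diplotypes for h in pair)
    homozygotes = Counter(a for a, b in diplotypes if a == b)
    return {h: (homozygotes[h], n * (c / (2 * n)) ** 2) for h, c in copies.items()}


def max_homozygote_frequency(diplotypes: List[Diplotype]) -> float:
    """Largest fraction of individuals homozygous for any single haplotype."""
    homozygotes = Counter(a for a, b in diplotypes if a == b)
    return max(homozygotes.values(), default=0) / len(diplotypes)


def empirical_upper_p(observed: float, comparison: List[float]) -> float:
    """Share of matched comparison regions with a value at least as high."""
    return sum(v >= observed for v in comparison) / len(comparison)
```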
Fig. 1. Linkage disequilibrium decay around A44871G in European Americans. (A) Haplotype frequency in European Americans (CEU). Blue bars are the derived haplotypes marked by the G allele. (B) Decay of extended haplotype homozygosity (EHH) around A44871G. (C) The significance of the LRH test at each marker is evaluated empirically by comparing with the genome-wide data from HapMap, matched with regard to breakdown of homozygosity. The most extreme P value of 0.03 is not striking when compared against the lowest P value seen in 1000 comparison regions, 90% of which show stronger evidence for selection at some distance. (D) The extent of the haplotype around the G allele (red dot, defined as the span for which EHH > 0.35), in comparison with alleles of matched frequency in CEU from HapMap on chromosome 1. This is well within the 95% central range of HapMap, whether plotted by physical distance (this figure) or genetic distance (fig. S3). To match the marker density of HapMap Phase I, we randomly dropped SNPs from ASPM until we had 1 SNP every 5 kb. With this lower density, the span of the G haplotype is 285 kb.
We also assessed evidence for selection at ASPM by carrying out the long-range haplotype (LRH) test (9). This test assesses whether a haplotype is too young to have risen to its frequency without selection, and it is not affected by uncertainty in recombination rate estimates. We compared LRH results for the A44871G polymorphism with those for SNPs of matched frequency in HapMap CEU (3, 10) (Fig. 1C). At 90% of the regions examined, we observed a signal for selection at least as strong (3, 11). Several genome-wide surveys using similar methods also failed to find evidence for selection at ASPM in European-derived populations (4, 12, 13). The one survey that did find a signal near ASPM found it only in individuals of Chinese ancestry (13), which does not support the contention of (1) that selection occurred recently in European history. On the basis of linkage disequilibrium (LD) breaking down within 100 kb on either side (Fig. 1B), we estimate that the G allele arose in European history at least tens of thousands of years ago and possibly more than 100,000 years ago (14) (table S3 and SOM Text). These dates are difficult to reconcile with selection 6000 years ago, as suggested in (1).
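Notes 10 and 11 describe the LRH statistic and how its many distances were reduced to a single P value. A minimal sketch of both steps, assuming phased haplotypes stored as strings of 0/1 alleles with the ancestral A allele of A44871G coded as "0" and the derived G allele as "1" (a coding convention chosen for illustration). `ehh`, `relative_ehh`, and `min_p_within_cutoff` are hypothetical names, and the per-distance P values, which the authors obtained by comparison with frequency-matched HapMap SNPs, are taken as given inputs.

```python
from itertools import combinations
from typing import Sequence


def ehh(haplotypes: Sequence[str], core: int, allele: str, marker: int) -> float:
    """Fraction of pairs of chromosomes carrying `allele` at the core SNP that are
    identical at every SNP from the core out to `marker` (inclusive)."""
    carriers = [h for h in haplotypes if h[core] == allele]
    pairs = list(combinations(carriers, 2))
    if not pairs:
        raise ValueError("need at least two carriers of the allele")
    lo, hi = sorted((core, marker))
    return sum(a[lo:hi + 1] == b[lo:hi + 1] for a, b in pairs) / len(pairs)


def relative_ehh(haplotypes: Sequence[str], core: int, marker: int,
                 derived: str = "1", ancestral: str = "0") -> float:
    """EHH of the derived allele divided by EHH of the ancestral allele (note 10)."""
    control = ehh(haplotypes, core, ancestral, marker)
    focal = ehh(haplotypes, core, derived, marker)
    return float("inf") if control == 0 else focal / control


def min_p_within_cutoff(p_values: Sequence[float], homozygosity: Sequence[float],
                        fraction: float = 0.10) -> float:
    """Minimum P over distances, scanning outward from the core (index 0) and
    stopping at the first distance where the probability that two random
    haplotypes are identical has fallen to `fraction` of its core value.
    Including that last distance is an assumption."""
    threshold = fraction * homozygosity[0]
    best = 1.0
    for p, h in zip(p_values, homozygosity):
        best = min(best, p)
        if h <= threshold:
            break
    return best
```

Using the other allele at the same core SNP as the denominator is intended to make local recombination affect both haplotype classes alike, which is the sense in which the text describes the LRH test as insensitive to recombination rate estimates. Per note 11, the resulting minimum P would then be ranked against the same quantity for 1000 frequency-matched HapMap SNPs.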
One explanation for the differences between our results and those of (1) is that we assessed significance by comparison with empirical data. Empirical comparisons are robust to features of real data that are difficult to model, such as failure to detect real polymorphisms in a sample or incomplete understanding of the complexity of population history. These results also carry a methodological lesson: to build a compelling case for natural selection, one should not only compare a region with computer simulations but also show that it stands out empirically against data collected in the same way (2, 15).
10. We carried out the LRH analysis using A44871G as the core SNP. We calculated the proportion of pairs of CEU chromosomes carrying the G allele at A44871G that have identical alleles up to a given SNP, and divided it by the same quantity for the A allele. A high proportion of identical haplotypes spanning a large recombination distance, relative to the internal control of the other allele, indicates a recent origin. For empirical comparison, we carried out the LRH test and obtained the same statistic for each HapMap SNP with matched allele frequency (35% ± 1%). Distant markers were matched for genetic distance (±1%) to allow empirical comparison with the analogous markers in the ASPM gene.
11. A problem with assessing statistical significance in the LRH test is that there are many distances from the core SNP at which significance can be measured. To obtain a single statistic, we calculated P values at each distance and recorded the lowest value at any distance up to the point where the probability that two randomly chosen haplotypes are identical falls to 10% of its value at the central SNP. To assess statistical significance empirically, we randomly sampled 1000 SNPs frequency-matched to A44871G, computed P values at each distance around them, and recorded the minimum P value; 90% of these randomly chosen regions show a minimum P value more extreme than that at ASPM (P = 0.90).
12. B. F. Voight, S. Kudaravalli, X. Wen, J. K. Pritchard, PLoS Biol. 4, e72 (2006).
13. E. T. Wang, G. Kodama, P. Baldi, R. K. Moyzis, Proc. Natl. Acad. Sci. U.S.A. 103, 135 (2006).
14. The prediction of a long haplotype around the derived G allele at A44871G is equivalent to a prediction of long-range LD. If the haplotype carrying the G allele at A44871G was driven to high frequency by selection within the past 6000 years (1), it will have had no more than 200 to 300 generations to break down by recombination; thus, the predicted LD length would be 1/3 to 1/2 centimorgan (cM). Even if the estimate of 1.9 cM/Mb is assumed to be correct, LD should then extend hundreds of kilobases in either direction. In our data, strong LD breaks down within 100 kb (Fig. 1B), corresponding to 1/100 to 1/5 cM, depending on the recombination rate estimate. This corresponds to several tens to several hundreds of thousands of years.
16. D.R. was supported by a Burroughs Wellcome Career Development Award in the Biomedical Sciences. C.A.W. is an Investigator of the Howard Hughes Medical Institute. We thank G. McDonald, C. Montague, S. Myers, N. Patterson, D. Richter, and L. Ziaugra for experimental and bioinformatic support. We thank M. Bernstein, G. Coop, A. Keinan, S. Myers, and A. Price for critical comments about the work.
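The arithmetic in note 14 can be reproduced in a few lines, assuming the standard approximation that after g generations the first recombination on each side of a core allele lies, on average, 1/g Morgans (100/g cM) away, and a generation time of 20 to 30 years (implied by the note's 200 to 300 generations for 6000 years). The function names are hypothetical, and the sketch stops at genetic distance rather than converting the observed LD breakdown into years, which depends on modeling details given in the SOM.

```python
def expected_ld_length_cm(generations: float) -> float:
    """Mean distance (cM) to the first recombination on one side of a core
    allele inherited for `generations` generations: about 1/g Morgans."""
    return 100.0 / generations


def cm_to_kb(cm: float, rate_cm_per_mb: float) -> float:
    """Convert genetic distance to physical distance at a uniform rate."""
    return cm / rate_cm_per_mb * 1000.0


def kb_to_cm(kb: float, rate_cm_per_mb: float) -> float:
    """Convert physical distance to genetic distance at a uniform rate."""
    return kb / 1000.0 * rate_cm_per_mb


# Prediction under selection 6000 years ago, for 30- and 20-year generations.
for gen_time in (30, 20):
    g = 6000 / gen_time
    cm = expected_ld_length_cm(g)
    print(f"{g:.0f} generations -> {cm:.2f} cM -> {cm_to_kb(cm, 1.9):.0f} kb at 1.9 cM/Mb")

# Observed: strong LD breaks down within 100 kb, under both rate estimates.
for rate in (0.12, 1.9):
    print(f"100 kb at {rate} cM/Mb -> {kb_to_cm(100, rate):.3f} cM")
```

Under these assumptions, 200 to 300 generations give 0.50 to 0.33 cM, or roughly 263 to 175 kb at 1.9 cM/Mb, while 100 kb corresponds to 0.012 cM at 0.12 cM/Mb and 0.19 cM at 1.9 cM/Mb, consistent with the ranges quoted in note 14.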
Received for publication 14 November 2006. Accepted for publication 12 March 2007.
Related report in Science: Nitzan Mekel-Bobrov, Sandra L. Gilbert, Patrick D. Evans, Eric J. Vallender, Jeffrey R. Anderson, Richard R. Hudson, Sarah A. Tishkoff, and Bruce T. Lahn (9 September 2005) Science 309 (5741), 1720. DOI: 10.1126/science.1116815