Response to Comment on "Genetic Structure of Human Populations"
Estimates of genetic variance components depend on the type of marker used, the definitions of geographic regions, the populations sampled within these regions, the relative sample sizes from the populations, and the way in which information is combined across loci. For microsatellite markers, estimates also depend on whether the quantity whose variance is partitioned is an allele-size variable or an indicator variable for allelic presence or absence. A main purpose of our variance component estimation was to provide insight into the fine-scale population structure analysis in (1). Because the structure algorithm uses only identity and nonidentity of alleles, descriptive statistics that employ allelic indicator variables are more appropriate for understanding the dependence of structure-based inference on the "level of difference" among groups than are statistics that use allele size.
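To make the contrast between the two kinds of variable concrete, the sketch below partitions variation at a single hypothetical locus into within- and among-population components twice: once with squared allele-size differences and once with allelic indicator variables. It is an illustrative two-level, AMOVA-style calculation built from pairwise squared distances, not the estimator described in (17). The toy allele sizes and all function names are invented for this example, and gene copies are treated as independent, as in (17).

```python
from itertools import chain
from typing import Callable, Sequence

# A squared distance between two alleles, given as sizes in repeat units.
Distance = Callable[[int, int], float]


def size_distance(a: int, b: int) -> float:
    """Allele-size variable: squared difference in repeat number."""
    return float((a - b) ** 2)


def indicator_distance(a: int, b: int) -> float:
    """Indicator variables: squared Euclidean distance between the 0/1
    allele-indicator vectors, i.e. 2 if the alleles differ and 0 if not.
    Only identity vs. nonidentity matters; size is ignored."""
    return 0.0 if a == b else 2.0


def ssd(alleles: Sequence[int], dist: Distance) -> float:
    """Sum of squared deviations from pairwise distances:
    SSD = (1 / (2n)) * sum_i sum_j d2(i, j)."""
    n = len(alleles)
    return sum(dist(a, b) for a in alleles for b in alleles) / (2 * n)


def within_fraction(pops: Sequence[Sequence[int]], dist: Distance) -> float:
    """Within-population share of the total variance at one locus, from a
    two-level (among/within populations) decomposition. Needs >= 2 populations."""
    sizes = [len(p) for p in pops]
    n_total, n_pops = sum(sizes), len(pops)
    ss_within = sum(ssd(p, dist) for p in pops)
    ss_among = ssd(list(chain.from_iterable(pops)), dist) - ss_within
    ms_within = ss_within / (n_total - n_pops)
    ms_among = ss_among / (n_pops - 1)
    # Effective sample size per population for unequal sample sizes.
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (n_pops - 1)
    sigma2_among = (ms_among - ms_within) / n0
    sigma2_within = ms_within
    return sigma2_within / (sigma2_within + sigma2_among)


if __name__ == "__main__":
    # Two toy populations that share no alleles but differ by ~2 repeats.
    pops = [[10, 11, 10, 11], [12, 13, 12, 13]]
    print(round(within_fraction(pops, size_distance), 3))       # 0.148
    print(round(within_fraction(pops, indicator_distance), 3))  # 0.667
```

In this toy case the populations share no alleles yet differ only modestly in size, so the size-based partition assigns a much smaller share of the variance to differences within populations than the indicator-based partition does.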
Excoffier and Hamilton (2) performed a complementary variance component analysis, demonstrating that when a subset of our data corresponding to (3) is studied using allele sizes, as was done in (3), estimates similar to those of (3) are obtained. Their smaller within-population variance component compared with that in (1) is consistent with the smaller estimate of (3) relative to microsatellite studies that used indicator variables (4–7). However, because previous indicator-based studies of microsatellites and other markers have not all been in full agreement (1, 4–11), a difference in the nature of the variable cannot be the sole source of differing estimates. First, the homogenizing effect of the higher mutation rates of microsatellites, compared with those of other markers, probably explains part of the difference between our results and those of nonmicrosatellite indicator-based studies (12). Second, consistent with past observations (13), the high fraction of tetranucleotide loci in our data contributes to higher within-population variance component estimates (Table 1) than are seen in dinucleotide studies (3, 4, 7). Third, the estimates vary considerably across sampling schemes within regions, and in several cases (3, 6, 7), past microsatellite samples that included multiple groups per region used populations that are among the most differentiated of the 52 groups in our data (Fig. 1). Any estimate computed with the well-separated populations that contribute to the 83.4% within-population variance component obtained by Excoffier and Hamilton (2) should be regarded as a lower bound.
Fig. 1. Effect of sampling scheme on variance component estimates. Using the five-region design, variance components were estimated (17) as in (1) for each of 100,000 subsets of populations, sampled randomly from among the 3 × 10^15 subsets of the 52 populations in (1) for which all five regions were represented. Variance component estimates for a 14-population subsample corresponding to Barbujani et al. (3), a 9-population subsample corresponding to Calafell et al. (7), and the full 52-population data of Rosenberg et al. (1) are marked (B), (C), and (R), respectively. Subsample (B) is the same as subsample (B97) in (1). Subsample (C) includes Biaka, Druze, Han, Japanese, Maya, Mbuti, Melanesian, Surui, and Yakut. For the within-population and among-population within-region components, (B) had more extreme values than all but four of the subsets. Similar results were obtained for random subsets that included at least two populations per region.
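The subset count and the random sampling described in the Fig. 1 caption can be sketched as follows. Two assumptions are made here that the caption does not state: that each subset was drawn uniformly from the qualifying subsets, and the per-region population counts in the hypothetical `REGION_SIZES` table, which are illustrative placeholders to be checked against (1). Because a qualifying subset is any choice of one nonempty subset per region, the total is a product over regions, and a uniform draw can be made region by region.

```python
import math
import random
from typing import Dict, List

# Hypothetical counts of populations per region in the five-region design.
# Illustrative assumption only; verify against the population list in (1).
REGION_SIZES: Dict[str, int] = {
    "Africa": 6, "Eurasia": 21, "East Asia": 18, "Oceania": 2, "America": 5,
}


def count_subsets(region_sizes: Dict[str, int]) -> int:
    """Number of population subsets with at least one population from every
    region: each region independently contributes one of its 2^n - 1
    nonempty subsets."""
    return math.prod(2 ** n - 1 for n in region_sizes.values())


def sample_subset(region_sizes: Dict[str, int],
                  rng: random.Random) -> Dict[str, List[int]]:
    """Draw one qualifying subset uniformly at random (assumed scheme).

    Since the count factorizes over regions, a uniform draw is a uniform
    nonempty subset drawn independently in each region. Within a region this
    uses rejection sampling: include each population with probability 1/2 and
    redraw if nothing was chosen."""
    chosen: Dict[str, List[int]] = {}
    for region, n in region_sizes.items():
        while True:
            picks = [i for i in range(n) if rng.random() < 0.5]
            if picks:
                break
        chosen[region] = picks
    return chosen


if __name__ == "__main__":
    print(f"{count_subsets(REGION_SIZES):.2e}")  # about 3.2e+15 with these sizes
    rng = random.Random(1)
    subsets = [sample_subset(REGION_SIZES, rng) for _ in range(5)]
    print(subsets[0])
```

Each sampled subset would then be passed to the variance component estimator of (17), which is not reproduced here.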
Table 1. Analysis of molecular variance for 45 di-, 58 tri-, and 274 tetranucleotide loci from (1). The samples and estimation procedure (17) used are the same as in (1). Entries are variance components (%) with 95% confidence intervals in parentheses.

| Sample | No. of regions | No. of populations | Repeat size (bp) | Within populations | Among populations within regions | Among regions |
|---|---|---|---|---|---|---|
| World | 1 | 52 | 2 | 92.2 (91.5, 92.8) | 7.8 (7.2, 8.5) | |
| | | | 3 | 92.6 (91.8, 93.2) | 7.4 (6.8, 8.2) | |
| | | | 4 | 95.4 (95.1, 95.6) | 4.6 (4.4, 4.9) | |
| World | 5 | 52 | 2 | 90.1 (89.2, 90.9) | 2.9 (2.7, 3.2) | 7.0 (6.1, 7.8) |
| | | | 3 | 90.5 (89.5, 91.3) | 2.7 (2.4, 3.0) | 6.8 (6.0, 7.8) |
| | | | 4 | 94.3 (93.9, 94.6) | 2.3 (2.2, 2.4) | 3.4 (3.1, 3.7) |
| World | 7 | 52 | 2 | 91.4 (90.7, 92.1) | 2.8 (2.5, 3.1) | 5.8 (5.1, 6.6) |
| | | | 3 | 91.8 (90.9, 92.5) | 2.5 (2.3, 2.8) | 5.7 (5.0, 6.5) |
| | | | 4 | 95.0 (94.7, 95.2) | 2.3 (2.2, 2.4) | 2.8 (2.5, 3.0) |
| World-B97 | 5 | 14 | 2 | 85.9 (84.7, 86.8) | 5.7 (4.9, 6.7) | 8.4 (7.2, 9.7) |
| | | | 3 | 86.2 (84.8, 87.5) | 4.9 (4.2, 5.7) | 8.9 (7.7, 10.3) |
| | | | 4 | 91.2 (90.7, 91.7) | 4.9 (4.6, 5.2) | 3.9 (3.4, 4.4) |
Allele sizes are important in microsatellite analysis, and typical studies, including our use of the data from (1) to investigate population divergence and expansion (14), employ both sizes and indicator variables. However, although they are often useful, stepwise mutation models with length-independent transition probabilities, which underlie the approach used in (2), poorly predict microsatellite allele size distributions in the human genome compared with length-dependent models (15). Because of this issue and the frequent occurrence of multistep mutations (16), the model of Excoffier and Hamilton cannot be regarded as the "right mutation model," and the "minimum number of mutations separating the alleles" need not actually be minimal.
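A small simulation illustrates why the size difference between two alleles, read as a count of single-repeat mutations, need not be a minimum once multistep mutations occur. The geometric step-size distribution and the parameter `p_single` are assumptions chosen for illustration; this is not the model of (2), (15), or (16). Under a strictly single-step model (`p_single = 1.0`) the absolute size change can never exceed the number of mutation events, because back mutations can only cancel steps; with multistep mutations it frequently does.

```python
import random


def simulate_lineage(n_mutations: int, p_single: float,
                     rng: random.Random) -> int:
    """Net change in repeat number after n_mutations events under an assumed
    generalized stepwise model: each event adds or removes k repeats, with
    k ~ Geometric(p_single) on {1, 2, ...} and both directions equally likely.
    p_single = 1.0 recovers the single-step stepwise mutation model."""
    net = 0
    for _ in range(n_mutations):
        k = 1
        while rng.random() >= p_single:  # one more step with prob 1 - p_single
            k += 1
        net += k if rng.random() < 0.5 else -k
    return net


def fraction_overcounted(n_mutations: int, p_single: float,
                         trials: int = 10_000, seed: int = 0) -> float:
    """Share of simulated lineages in which |net size change|, the mutation
    count implied by a single-step model, exceeds the true number of events."""
    rng = random.Random(seed)
    over = sum(abs(simulate_lineage(n_mutations, p_single, rng)) > n_mutations
               for _ in range(trials))
    return over / trials


if __name__ == "__main__":
    print(fraction_overcounted(3, 1.0))  # 0.0: single-step never overcounts
    print(fraction_overcounted(3, 0.5))  # positive with multistep mutations
```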
Finally, the main finding from studies of genetic variance components, supported by diverse analyses whose exact estimates have differed, is that the within-population variance component is much larger than the other components. The relative importance of various influences on the estimates could potentially be evaluated by further statistical analysis of the variation in the variance component estimates themselves.
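One basic ingredient of such an analysis is locating a particular study's estimate within the distribution of estimates obtained over many sampling schemes, as Fig. 1 does for subsample (B). The helper below counts how many estimates are more extreme than a given one. Defining "more extreme" as farther from the median on the same side is an assumption made for this sketch, and the function name is hypothetical.

```python
from statistics import median
from typing import Sequence


def n_more_extreme(observed: float, estimates: Sequence[float]) -> int:
    """Count estimates (e.g., within-population % over random population
    subsets) that lie farther from the median than `observed`, on the same
    side of the median as `observed`."""
    m = median(estimates)
    if observed <= m:
        return sum(e < observed for e in estimates)
    return sum(e > observed for e in estimates)


if __name__ == "__main__":
    toy_estimates = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]  # toy values, not data from (1)
    print(n_more_extreme(1.0, toy_estimates))  # 2
```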
Noah A. Rosenberg
Molecular and Computational Biology, University of Southern California, 1042 West 36th Place, DRB 289, Los Angeles, CA 90089, USA. E-mail: noahr{at}usc.edu
Jonathan K. Pritchard
Department of Human Genetics, University of Chicago, 920 East 58th Street, CLSC 507, Chicago, IL 60637, USA
James L. Weber
Center for Medical Genetics, Marshfield Medical Research Foundation, Marshfield, WI 54449, USA
Howard M. Cann
Foundation Jean Dausset-CEPH, 27 rue Juliette Dodu, 75010 Paris, France
Kenneth K. Kidd
Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
Lev A. Zhivotovsky
Vavilov Institute of General Genetics, Russian Academy of Sciences, 3 Gubkin Street, Moscow 117809, Russia
Marcus W. Feldman
Department of Biological Sciences, Stanford University, Stanford, CA 94305, USA
12. L. Jin, R. Chakraborty, Heredity 74, 274 (1995).
13. A. Ruiz Linares, in Microsatellites: Evolution and Applications, D. B. Goldstein, C. Schlötterer, Eds. (Oxford University Press, Oxford, 1999), pp. 183–197.
17. Variance components were estimated assuming independence of alleles within individuals, using the framework in chapter 5 of (18).
18. B. S. Weir, Genetic Data Analysis II (Sinauer, Sunderland, MA, 1996).
19. We thank E. Minch for clarifying ambiguities in (3).
Received for publication 18 March 2003. Accepted for publication 14 April 2003.
Related resources in Science
TECHNICAL COMMENTS
Laurent Excoffier and Grant Hamilton (20 June 2003) Science 300 (5627), 1877b. [DOI: 10.1126/science.1083411]
REPORTS
Noah A. Rosenberg, Jonathan K. Pritchard, James L. Weber, Howard M. Cann, Kenneth K. Kidd, Lev A. Zhivotovsky, and Marcus W. Feldman (20 December 2002) Science 298 (5602), 2381. [DOI: 10.1126/science.1078311]