Technical Comments
| 1. | J. A. Bailey, et al., Science 297, 1003 (2002). |
| 2. | M. Giordano, C. Marchetti, E. Chiorboli, G. Bona, P. Momigliano Richiardi, Hum. Genet. 100, 249 (1997). |
| 3. | S. Aradhya, et al., Hum. Mol. Genet. 10, 2557 (2001). |
| 4. | P. Blanco, et al., J. Med. Genet. 37, 752 (2000). |
| 5. | L. L. Han, M. P. Keller, W. Navidi, P. F. Chance, N. Arnheim, Hum. Mol. Genet. 9, 1881 (2000). |
| 6. | M. E. Hurles, BMC Genom. 2, 11 (2001). |
| 7. | T. Nagylaki, Proc. Natl. Acad. Sci. U.S.A. 81, 3796 (1984). |
Response: Hurles raises an excellent point: Assembly errors may not be the sole basis for the observed "SNP" enrichment. There are at least two possible explanations: (i) duplication-induced collapse of paralogous sequence variants (PSVs) (1), and (ii) gene conversion events among the duplicated segments (2). Both events likely contribute, but which is more probable in light of the current state of the genome assembly within duplicated regions?
In previous analyses (1, 3), we found that duplicated regions were in fact underrepresented (by 30 to 40%) within public assemblies. There were fewer copies in the sequence assembly than could be shown by experimental methods (1, 4). The large size of the duplication (100 kilobases) and the high degree of sequence identity between many duplications have led to such sequences being treated as allelic copies rather than as independent loci. In this respect, it is noteworthy that "overlap" SNPs, which were largely determined by electronic comparison of GenBank sequences, contributed more strongly (2.6 times) to the enrichment than did SNPs assigned randomly (1.28 times). In addition to collapse, subsequent examination of dbSNP has revealed that many other "overlap" SNPs are annotated as "ambiguously mapped" and are in fact assigned to more than one location (5). Thus, although gene conversion remains a likely source for some of the "SNP" abundance, this effect cannot be satisfactorily addressed without concomitant elimination of the artifacts. We think that these artifacts of our genome provide the most prosaic explanation for this increase. Further experimental validation is required. The regions that we have identified as being increased in SNP density and at the transition of unique and duplicated sequence provide logical targets to assess this effect, especially as the genome nears completion and its quality substantially improves within these areas.
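For readers who want to see how an enrichment figure of this kind can be computed, the following is a minimal sketch in Python. It is not the procedure used in the analyses cited above: it simply assumes that enrichment is the ratio of SNP density (SNPs per kilobase) in duplicated sequence to SNP density in unique sequence. The function names and the counts in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def snp_density_per_kb(snp_count, region_bp):
    """Return SNPs per kilobase for a region of region_bp base pairs."""
    if region_bp <= 0:
        raise ValueError("region_bp must be positive")
    return snp_count * 1000 / region_bp


def fold_enrichment(dup_snps, dup_bp, unique_snps, unique_bp):
    """Ratio of SNP density in duplicated sequence to that in unique sequence.

    Assumption: enrichment is defined as a simple density ratio; the
    original analysis may have used a different definition.
    """
    unique_density = snp_density_per_kb(unique_snps, unique_bp)
    if unique_density == 0:
        raise ValueError("unique sequence contains no SNPs; ratio undefined")
    return snp_density_per_kb(dup_snps, dup_bp) / unique_density


# Hypothetical counts, for illustration only (not data from the study):
# 260 SNPs in 100 kb of duplicated sequence, 1000 SNPs in 1 Mb of unique sequence.
example = fold_enrichment(260, 100_000, 1000, 1_000_000)
print(f"fold enrichment: {example:.2f}")
```

A ratio above 1 indicates a higher apparent SNP density in duplicated sequence, but the ratio alone cannot distinguish collapsed paralogous variants from gene conversion.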
Finally, it was not our intention to intimate that the 100,000 variants underlying these duplicated regions were "useless." The variants are, in fact, incredibly important from a practical and evolutionary perspective. Such variants have proved valuable in resolving the structure of these duplicated regions (6, 7) and in providing a baseline to begin to address such issues as positive selection and gene conversion (8). However, for the average user of dbSNP interested in using SNPs in association-based mapping studies, there is the tacit assumption that the SNP maps to a unique region in the genome. The increased density of SNPs within duplicated regions, whether they arise from errors in assembly or gene conversion, will certainly obfuscate and frustrate these types of analyses. We believe that acknowledging this potential contaminant within dbSNP, and precisely demarcating the positions of the regions associated with duplications, constitutes a useful, indeed an essential, first step.
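The demarcation step described above can be sketched as a simple interval lookup: given the coordinates of duplicated regions, flag each SNP whose position falls inside one of them. The Python below is an illustrative sketch only, not a dbSNP tool or the authors' pipeline. It assumes 1-based inclusive coordinates, and the function names, chromosome labels, and positions are all hypothetical.

```python
from bisect import bisect_right


def build_duplication_index(intervals):
    """Group (chrom, start, end) intervals by chromosome and merge overlaps.

    Coordinates are assumed to be 1-based and inclusive. Returns a dict
    mapping chrom -> (sorted starts, matching ends).
    """
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))

    index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1] + 1:  # overlapping or abutting
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_duplication(index, chrom, pos):
    """Return True if position pos on chrom lies inside a duplicated interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= ends[i]


# Hypothetical coordinates, for illustration only.
dups = build_duplication_index([
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr2", 50, 60),
])
snps = [("chr1", 250), ("chr1", 301), ("chr2", 50), ("chr3", 10)]
flagged = [(chrom, pos) for chrom, pos in snps if in_duplication(dups, chrom, pos)]
print(flagged)
```

Flagging rather than discarding such SNPs keeps them available for studies of duplication structure while warning users who assume a unique map position.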
Jeff Bailey
Evan Eichler
Department of Genetics,
Center for Computational Genomics, and
Center for Human Genetics
Case Western Reserve University School of Medicine and
University Hospitals of Cleveland
Cleveland, OH 44060, USA
| 1. | J. A. Bailey, A. M. Yavor, H. F. Massa, B. J. Trask, E. E. Eichler, Genome Res. 11, 1005 (2001). |
| 2. | M. E. Hurles, BMC Genom. 2, 11 (2001). |
| 3. | The International Human Genome Sequencing Consortium, Nature 409, 860 (2001). |
| 4. | V. G. Cheung, et al., Nature 409, 953 (2001). |
| 5. | X. Estivill, et al., Hum. Mol. Genet. 11, 1987 (2002). |
| 6. | J. Horvath, S. Schwartz, E. Eichler, Genome Res. 10, 839 (2000). |
| 7. | T. Kuroda-Kawaguchi, et al., Nature Genet. 29, 279 (2001). |
| 8. | M. E. Johnson, et al., Nature 413, 514 (2001). |
Science. ISSN 0036-8075 (print), 1095-9203 (online)