Evolutionary and Biomedical Insights from the Rhesus Macaque Genome
Rhesus Macaque Genome Sequencing and Analysis Consortium: *Richard A. Gibbs,1,2Jeffrey Rogers,3Michael G. Katze,4Roger Bumgarner,4George M. Weinstock,1,2Elaine R. Mardis,5Karin A. Remington,6Robert L. Strausberg,6J. Craig Venter,6Richard K. Wilson,5Mark A. Batzer,7Carlos D. Bustamante,8Evan E. Eichler,9Matthew W. Hahn,10Ross C. Hardison,11Kateryna D. Makova,11Webb Miller,11Aleksandar Milosavljevic,1,2Robert E. Palermo,4Adam Siepel,8James M. Sikela,12Tony Attaway,1,2Stephanie Bell,1,2Kelly E. Bernard,5Christian J. Buhay,1,2Mimi N. Chandrabose,1,2Marvin Dao,1,2Clay Davis,1,2Kimberly D. Delehaunty,5Yan Ding,1,2Huyen H. Dinh,1,2Shannon Dugan-Rocha,1,2Lucinda A. Fulton,5Ramatu Ayiesha Gabisi,1,2Toni T. Garner,1,2Jennifer Godfrey,5Alicia C. Hawes,1,2Judith Hernandez,1,2Sandra Hines,1,2Michael Holder,1,2Jennifer Hume,1,2Shalini N. Jhangiani,1,2Vandita Joshi,1,2Ziad Mohid Khan,1,2Ewen F. Kirkness,6Andrew Cree,1,2R. Gerald Fowler,1,2Sandra Lee,1,2Lora R. Lewis,1,2Zhangwan Li,1,2Yih-shin Liu,1,2Stephanie M. Moore,1,2Donna Muzny,1,2Lynne V. Nazareth,1,2Dinh Ngoc Ngo,1,2Geoffrey O. Okwuonu,1,2Grace Pai,6David Parker,1,2Heidie A. Paul,1,2Cynthia Pfannkoch,6Craig S. Pohl,5Yu-Hui Rogers,6San Juana Ruiz,1,2Aniko Sabo,1,2Jireh Santibanez,1,2Brian W. Schneider,1,2Scott M. Smith,5Erica Sodergren,1,2Amanda F. Svatek,1,2Teresa R. Utterback,1,2Selina Vattathil,1,2Wesley Warren,5Courtney Sherell White,1,2Asif T. Chinwalla,5Yucheng Feng,5Aaron L. Halpern,6LaDeana W. Hillier,5Xiaoqiu Huang,13Pat Minx,5Joanne O. Nelson,5Kymberlie H. Pepin,5Xiang Qin,1,2Granger G. Sutton,6Eli Venter,6Brian P. Walenz,6John W. Wallis,5Kim C. Worley,1,2Shiaw-Pyng Yang,5Steven M. Jones,14Marco A. Marra,14Mariano Rocchi,15Jacqueline E. Schein,14Robert Baertsch,16Laura Clarke,17Miklós Csürös,18Jarret Glasscock,5R. Alan Harris,1,2Paul Havlak,1,2Andrew R. Jackson,1,2Huaiyang Jiang,1,2Yue Liu,1,2David N. Messina,5Yufeng Shen,1,2Henry Xing-Zhi Song,1,2Todd Wylie,5Lan Zhang,1,2Ewan Birney,17Kyudong Han,7Miriam K. 
Konkel,7Jungnam Lee,7Arian F. A. Smit,19Brygg Ullmer,20Hui Wang,7Jinchuan Xing,7,21Richard Burhans,11Ze Cheng,9John E. Karro,11Jian Ma,22Brian Raney,22Xinwei She,9Michael J. Cox,12Jeffery P. Demuth,10Laura J. Dumas,12Sang-Gook Han,10Janet Hopkins,12Anis Karimpour-Fard,23Young H. Kim,24Jonathan R. Pollack,24Tomas Vinar,8Charles Addo-Quaye,11Jeremiah Degenhardt,8Alexandra Denby,8Melissa J. Hubisz,25Amit Indap,8Carolin Kosiol,8Bruce T. Lahn,25,26Heather A. Lawson,11Alison Marklein,8Rasmus Nielsen,27Eric J. Vallender,25,26Andrew G. Clark,28Betsy Ferguson,29Ryan D. Hernandez,8Kashif Hirani,1,2Hildegard Kehrer-Sawatzki,30Jessica Kolb,30Shobha Patil,1,2Ling-Ling Pu,1,2Yanru Ren,1,2David Glenn Smith,3David A. Wheeler,1,2Ian Schenck,11Edward V. Ball,31Rui Chen,1,2David N. Cooper,31Belinda Giardine,11Fan Hsu,22W. James Kent,22Arthur Lesk,11David L. Nelson,2William E. O'Brien,2Kay Prüfer,32Peter D. Stenson,31James C. Wallace,4Hui Ke,33Xiao-Ming Liu,34Peng Wang,33Andy Peng Xiang,33Fan Yang,33Galt P. Barber,22David Haussler,35,16Donna Karolchik,22Andy D. Kern,22Robert M. Kuhn,22Kayla E. Smith,22Ann S. Zwieg22
The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.
1 Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA. 2 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA. 3 Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio, TX 78227, USA. 4 Department of Microbiology, University of Washington, Seattle, WA 98195, USA. 5 Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA. 6 J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA. 7 Department of Biological Sciences, Biological Computation and Visualization Center, Center for BioModular Multi-scale Systems, Louisiana State University, Baton Rouge, LA 70803, USA. 8 Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA. 9 Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA. 10 Department of Biology and School of Informatics, Indiana University, Bloomington, IN 47405, USA. 11 Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA 16802, USA. 12 Human Medical Genetics and Neuroscience Programs, Department of Pharmacology, University of Colorado at Denver and Health Sciences Center, Aurora, CO 80045, USA. 13 Department of Computer Science, Iowa State University, Ames, IA 50011, USA. 14 Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, Canada. 15 Department of Genetics and Microbiology, University of Bari, Bari, Italy. 16 Department of Bioinformatics, University of California Santa Cruz, Santa Cruz, CA 95060, USA. 17 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. 18 Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, QC H3C 3J7, Canada. 19 Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 981038904, USA. 
20 Center for Computation and Technology, Department of Computer Sciences, Louisiana State University, Baton Rouge, LA 70803, USA. 21 Eccles Institute of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA. 22 Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA. 23 Department of Preventative Medicine and Biometrics, University of Colorado at Denver and Health Sciences Center, Aurora, CO 80045, USA. 24 Department of Pathology, Stanford University, Stanford, CA 94305, USA. 25 Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA. 26 Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA. 27 Institute of Biology, University of Copenhagen, Copenhagen DK-1017, Denmark. 28 Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA. 29 Genetics Research and Informatics Program, Oregon National Primate Research Center, Beaverton, OR 97006, USA. 30 Institute of Human Genetics, University of Ulm, Ulm, 89081, Germany. 31 Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK. 32 Department Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany. 33 Centre for Stem Cell Biology and Tissue Engineering, Sun Yat-sen University, Guangzhou 510080, China. 34 South-China Primate Research and Development Center, Guangzhou 510080, China. 35 Howard Hughes Medical Institute, Santa Cruz, CA 95060, USA.
All authors with their contributions and affiliations appear at the end of this paper.
* To whom correspondence should be addressed. Richard A. Gibbs, E-mail: agibbs{at}bcm.edu
Rhesus macaques (Macaca mulatta) (1) are one of the most frequently encountered and thoroughly studied of all nonhuman primates (table S1.1). They have a broad geographic distribution that reaches from Afghanistan and India across Asia to the Chinese shore of the Pacific Ocean. As an Old World monkey (superfamily Cercopithecoidea, family Cercopithecidae), this species is closely related to humans and shares a last common ancestor from about 25 million years ago (Mya) (2). The two species often live in close association, and macaques exhibit complex and intensely social behavioral repertoires.
The relationship between humans and macaques is even more important because biomedical research has come to depend on these primates as animal models. Compared with rodents, which are separated from humans by more than 70 million years (2, 3), macaques exhibit greater similarity to human physiology, neurobiology, and susceptibility to infectious and metabolic diseases. Critical progress in biomedicine attributed to macaques includes the identification of the "rhesus factor" blood groups and advances in neuroanatomy and neurophysiology. Most important, their response to infectious agents related to human pathogens, including simian immunodeficiency virus and influenza, has made macaques the preferred model for vaccine development. Lesser-known contributions of these animals include their early use in the U.S. space program: a rhesus monkey was launched into space more than a dozen years before any chimpanzee.
The cynomolgus macaque (M. fascicularis), pigtailed macaque (M. nemestrina), and Japanese macaque (M. fuscata) have all contributed to research, but the rhesus macaque has been used most widely. Taxonomists recognize six M. mulatta subspecies (1), which differ substantially in their geographical range, body size, and a variety of morphological, physiological, and behavioral characteristics. North American research colonies include animals representing both Indian and Chinese subspecies, although India ended the exportation of these animals in the 1970s.
With the advent of whole-genome sequencing, a highly accurate human genome sequence and a draft of the chimpanzee genome have been generated and compared. The chimpanzee shared a common ancestor with humans approximately 6 Mya (4, 5), and the major impact of the chimpanzee genome sequence data has been in their direct comparison with data from the human genome. However, the chimpanzee data have major limitations. First, because the alignable sequence is only 1 to 2% different from that of the human, there is no informative "signal" to distinguish conserved elements from the overall high background level of conservation. This is exacerbated by the fact that the chimpanzee genome was an incomplete draft, containing sequence errors that could potentially mask true divergence. Second, the differences that are found between humans and chimpanzees are difficult to assign as specific to either the chimpanzee or the human. As a result, the chimpanzee analyses have on their own provided relatively few answers to the fundamental question of the nature of the specific molecular changes that make us human.
By contrast, the genome of the rhesus macaque has diverged farther from our own, with an average human-macaque sequence identity of 93%. Figure 1 shows the inferred common ancestor for all three species, as well as a common ancestor that predated the human-chimpanzee divergence. A characteristic that is found in humans but not in the chimpanzee can be recognized as a loss in the chimpanzee if it is present in the macaque, or it can be recognized as a gain in the human if it is absent in macaque. In principle, this three-way comparison should make it possible to pinpoint many changes and identify specific underlying mutational mechanisms, which could have been critically important during the past 25 million years in shaping the biology of the three primate species.
Fig. 1. Evolutionary triangulation in the human, chimpanzee and rhesus macaque lineages (lineage-specific breaks), showing a summary of chromosomal breakpoints on a microscopic scale (Fig. 3) (7). Circled numbers indicate numbers of lineage-specific breaks.
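The outgroup logic described above (a difference between human and chimpanzee is assigned to one lineage by checking the macaque state) can be sketched in a few lines. This is a minimal illustration under a simple parsimony assumption, namely that the macaque state equals the ancestral state; the function name and the boolean encoding of a "characteristic" are hypothetical and are not taken from the analysis itself.

```python
def classify_difference(human: bool, chimp: bool, macaque: bool) -> str:
    """Assign a human/chimpanzee difference to a lineage, using macaque as outgroup.

    Each argument says whether the characteristic is present in that species.
    Assumption (parsimony): the macaque state is the ancestral state.
    """
    if human == chimp:
        return "no human-chimpanzee difference"
    ancestral = macaque
    if human != ancestral:
        # Human departs from the inferred ancestral state.
        return "gain in human" if human else "loss in human"
    # Otherwise the chimpanzee departs from the inferred ancestral state.
    return "gain in chimpanzee" if chimp else "loss in chimpanzee"


# Present in human and macaque, absent in chimpanzee.
example_loss = classify_difference(human=True, chimp=False, macaque=True)
# Present in human only.
example_gain = classify_difference(human=True, chimp=False, macaque=False)
```

The rule ignores parallel changes and changes on the macaque branch, so it is best read as a first-pass assignment rather than a full phylogenetic reconstruction.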
We examined the basic elements of the rhesus macaque genome and undertook reconstruction of the major changes in the human-chimpanzee-rhesus macaque (HCR) trio. The regions of the genome that were duplicated in macaque were then identified and correlated with other genome features. Individual macaque genes were studied, and the orthologous genes in the HCR trio were aligned to reveal evidence for the action of selection on individual loci. Additional animals from other populations were also sampled by DNA sequencing to study their genetic diversity. Throughout, complementary methods were applied and the different results combined in order to represent the most complete picture of macaque biology. For a visual representation of some of the insights gained from the genome and more information about the importance of the macaque as a model organism, see the poster in this issue (6).
Sequencing the Genome
To generate a draft genome sequence for the rhesus macaque, whole-genome shotgun sequences were assembled. The bulk of the sequencing used DNA from a single M. mulatta female, whereas DNA from an unrelated male was used to construct a bacterial artificial chromosome (BAC) library to provide BAC end sequences and to aid in selective finishing. We used several whole-genome shotgun libraries with different insert sizes (3.0, 10, 35, and 180 kb) to generate a total of 18.4 Gb of raw DNA sequence through standard fluorescent Sanger sequencing technologies. Initial assemblies to the intermediate scaffold stage were carried out by three different assembly methods: Atlas whole-genome shotgun, parallel contig assembly program (PCAP), and the Celera Assembler (7). These were compared by means of more than 200 metrics, including gross sequence statistics, agreement with finished sequence, utility for gene predictions in the Ensembl pipeline, and accuracy of alignment to the human genome. The three unpolished assemblies were found to be largely similar and of high quality, so all were used in combination with other genome data for the subsequent assembly and placement of long sequence segments on the macaque chromosomes (tables S2.1 to S2.4).
To produce an optimal representation of the genome, the three intermediate assemblies were merged (Fig. 2). Melding the assemblies involved mapping the Atlas whole-genome shotgun and PCAP data to the Celera Assembler output, which had longer contiguity than the other two data sets at this stage of the process. There was little difference between assemblies at the sequence contig level, at which robust sequence alignments guide the reconstructions, so we focused our attention instead on contigs that were joined into scaffolds. Additional pairs of Celera Assembler scaffolds were joined based on their mapping to the other two macaque assemblies. Analysis of the output showed that this composite assembly was superior to any of its components (table S2.4).
Fig. 2. Assembly by three methods of the rhesus macaque genome. WGS, whole-genome shotgun. BCM-HGSC, Baylor College of Medicine Human Genome Sequencing Center; WashU-GSC, Washington University Genome Sequencing Center; JCVI, J. Craig Venter Institute. QA/QC, quality assurance and quality control.
During assembly, a comparison with the human genome sequence [National Center for Biotechnology Information (NCBI) accession code bld35] identified a small number (<100) of obvious inconsistencies, such as improper joins of different chromosomes. These scaffolds were therefore split at the misassembly point. The human map was also used to help place large merged scaffolds onto the macaque chromosomes (8, 9) [the chromosome numbering of Rogers et al. (8) was used] at the highest level of the assembly process. Given that the human data were only used to split scaffolds and that de novo macaque assemblies were always given precedence over the mapping to the human genome in the macaque assembly merging and chromosome assignment process, the final product should not be regarded as a "humanized assembly."
The total length of the combined genome assembly was approximately 2.87 Gb (Table 1). This incorporated 14.9 Gb of raw sequence, which represents about a 5.2-fold coverage of the macaque genome. Comparison with expressed sequence tag (EST) sequence data and approximately 1.8 Mb of finished sequence (see "Selected sequence finishing," below) indicated that 98% of the available genome was represented. No misassemblies were identified in that comparison. Contigs showed an N50 (minimum length of contigs representing half of the total length of the assembly) of >25 kb; the N50 for sequence scaffolds was >24 Mb. GenBank accession codes are available online (table S2.5).
Table 1. M. mulatta assembly statistics. Total bases, excluding gaps, number 2,871,189,834.
                  Contigs     Scaffolds
Total number      301,039     122,580
N50 size in bp    25,707      24,345,431
Number to N50     32,114      36
Largest in bp     219,335     98,200,701
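The N50 statistic and the fold-coverage figure quoted above follow from simple definitions, sketched here for illustration. The function name and the toy contig lengths are hypothetical; only the two totals used for the coverage calculation (14.9 Gb of raw sequence and 2,871,189,834 assembled bases) come from the text and Table 1.

```python
def n50(lengths):
    """Return (N50 length, number of sequences needed to reach half the total).

    Sequences are taken from longest to shortest until their summed length
    reaches at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example (hypothetical contig lengths, not macaque data).
toy_n50, toy_count = n50([100, 80, 60, 40, 20])

# Fold coverage = raw sequence incorporated / assembled bases (values from the text).
raw_bases = 14.9e9
assembled_bases = 2_871_189_834
fold_coverage = raw_bases / assembled_bases
```

For the toy input the total is 300, so the two longest contigs (100 + 80) already cover half of it and the N50 is 80. The coverage ratio rounds to the 5.2-fold value reported above.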
Selected sequence finishing. The rhesus macaque genome assembly is a draft DNA sequence, and it contains many gaps. A higher data quality with greater contiguity was desired at several genomic regions that attracted additional interest. In these cases, individual BAC clones were isolated, and data quality was improved by sequence "finishing." Many of these BACs were in regions of pronounced genome duplication, whereas others were gene-rich. All finished BACs, their gene content, and their genome coordinates are listed in table S2.6.
Overview of Genome Features
General organization and content. The macaque genome is organized into 20 autosomes and the XY sex chromosomes. With the exception of 48 breakpoints (Fig. 1), including three fusions, one fission, and breakpoints induced by inversions that are each detectable through chromosome staining, by radiation hybrid mapping, or by comparative linkage mapping, there is a superficial similarity between the macaque and human chromosomes (8-11). Several chromosomes in the macaque are also more acrocentric than their human counterparts, but many from the two species are difficult to distinguish.
Nucleotide sequences that aligned between the human and rhesus average 93.54% identity. If, however, small insertions and deletions are included in the calculation, identity is reduced to 90.76%. Considering regions that are difficult to align, such as lineage-specific interspersed repeat elements, would further decrease the level of computed identity. Moreover, evolutionary distances exhibit local fluctuations, as in other mammals (3), and less divergence was observed in chromosome X (94.26% identity of aligned bases). The GC-content of the rhesus in aligned bases was not notably lower than that of the human (40.71% versus 40.74%).
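The gap between the two identity figures comes from whether gapped alignment columns enter the denominator. The sketch below shows one way to compute both values from a pairwise alignment. It assumes that every gapped column counts as one difference; how indels were actually counted in the analysis is not stated here, so treat this convention, the function name, and the toy sequences as illustrative.

```python
def identity_stats(seq_a: str, seq_b: str, gap: str = "-"):
    """Return (identity over aligned bases, identity with gapped columns included).

    seq_a and seq_b are two rows of a pairwise alignment of equal length.
    Assumption: each column with a gap in exactly one sequence counts as
    one difference in the second figure.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gapped = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # column carries no information
        if a == gap or b == gap:
            gapped += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    aligned = matches + mismatches
    return matches / aligned, matches / (aligned + gapped)


# Toy alignment (hypothetical): one substitution and one single-base indel.
aligned_only, with_indels = identity_stats("ACGTACGTAC", "ACGTTCG-AC")
```

In the toy case, 8 of 9 aligned bases match, but only 8 of 10 columns match once the gapped column is counted, which mirrors the direction of the drop from 93.54% to 90.76%.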
Gene content. A human-centric approach was used to generate new macaque gene sets (table S3.1 and fig. S3.1). These sets include (i) Ensembl (12) gene models based primarily on the alignment of the human Uniprot and RefSeq resources with the current assembly to define the overall gene model, followed by the introduction of the macaque-specific sequences (mainly as lineage-specific paralogs) in that framework; (ii) Gnomon (NCBI) models that include the consideration of the available (50,000) macaque ESTs along with the human RefSeq; and (iii) Nscan data that include multiple-species alignments along with cDNA alignments (13). Overall, 20,000 loci were predicted by our methods in which at least one exon was found by two additional predictors. An additional 5000 loci were each predicted by a single method, but manual inspection of a subset of these loci shows that they are enriched in gene-prediction errors, mainly due to mis-classification of evidence (e.g., cDNAs from untranslated regions that were classified as containing protein coding). On average, high-confidence orthologs have 97.5% identity between the human and macaque at both the nucleotide and amino acid sequence levels. (The nucleotide and amino acid percentages agree because roughly one-third of nucleotide differences within coding regions change an amino acid.)
Overall repetitive landscape. Repeat elements account for 50% of the genomes of all sequenced primates (14) (Table 2). Similar to the human, the rhesus macaque contains about 320,000 recognizable copies from more than 100 different families of DNA transposons and more than half a million recognizable copies of endogenous retroviruses (ERVs). In general, the DNA transposons show no new lineages, but the ERVs demonstrate a complex phylogeny and many examples of new and expanded family members, some resulting from horizontal transmission. In addition, we conservatively estimate that 20,000 L1s [a family of long interspersed elements (LINEs)], and 110,000 Alu elements [a primate-specific family of short interspersed elements (SINEs)], were specifically acquired in the Old World monkey lineage. These two retrotransposon families accounted for most lineage-specific insertions and have played a major role in shaping genomic architecture. Among them, rhesus macaque-specific subsets (derived from the L1PA5 lineage and AluY) are frequently polymorphic and can be assayed by polymerase chain reaction (PCR) genotyping analyses for genetic studies (15).
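The parenthetical argument about matching nucleotide and amino acid identities can be checked with a few lines of arithmetic. The sketch assumes that differences are spread evenly across codon positions and that a codon carries at most one difference; both simplifications are introduced here for illustration and are not claims from the analysis.

```python
# Values taken from the text.
nucleotide_identity = 0.975          # 97.5% identity in coding sequence
nonsynonymous_fraction = 1 / 3       # "roughly one-third" of differences change an amino acid

# Assumption: at most one difference per codon, so a codon (3 nucleotides)
# differs with probability of about 3 x the per-nucleotide difference rate.
differences_per_codon = 3 * (1 - nucleotide_identity)

# Only the nonsynonymous share of those differences alters the amino acid.
amino_acid_identity = 1 - nonsynonymous_fraction * differences_per_codon
```

The factor of three from codon length cancels the one-third nonsynonymous fraction, so the amino acid identity comes out at about 97.5%, the same as the nucleotide figure.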
Table 2. Summary of repeat content of the rhesus macaque genome compared with the human and chimpanzee genomes. hg18, human genome version 18; panTro2, Pan troglodytes version 2; rheMac2, rhesus macaque version 2; LTR, long terminal repeat; MIR, mammalian interspersed repeat. SVA is a composite repetitive element named after its main components, SINE, variable number of tandem repeats, and Alu; includes SVA precursor elements.
Species   DNA       LTR/ERV   LINE L1   LINE L2   SINE Alu    SINE MIR   SVA
hg18      355,000   506,000   572,000   363,000   1,144,000   584,000    3400
panTro2   305,000   453,000   558,000   315,000   1,111,000   553,000    4400
rheMac2   327,000   432,000   531,000   298,000   1,094,000   539,000    150
Determining Ancestral Genome Structure
Cytogenetically visible rearrangements. The most notable genomic differences among the HCR trio are the presence of cytogenetically visible rearrangements. The human and chimpanzee karyotypes are distinguishable by one chromosome fusion and nine cytogenetically visible pericentric inversions (16); with the use of the macaque as an outgroup, all of these breakpoints (except those induced by two inversions) have now been characterized at the DNA sequence level (17). Analysis of genomic sequence confirms that 14 breakpoints, corresponding to seven inversions, occurred in the chimpanzee lineage, as indicated in Fig. 1. (Five of the inversions are summarized in table S4.1.) The pericentric inversions of human chromosomes 1 and 18 and the fusion creating human chromosome 2 are specific to the human. Comparison of the reconstructed human-chimpanzee ancestral genome and the rhesus genome reveals 43 breakpoints on the microscopic scale (Figs. 1 and 3).
Fig. 3. Chromosomal breakpoints between rhesus macaque and the human-chimpanzee ancestor. Each chromosome is represented by a white bar (left) and a colored bar (right). A total of 820 thin horizontal lines in the white bars represent submicroscopic breakpoints (10-kbp to 4-Mbp range) detected by genomic triangulation (19), and 43 thick black lines in the colored bars represent breakpoints on a microscopic scale (>4 Mbp) (7). Numbers above each bar show the total lines within the bar.
Submicroscopic rearrangements. Previous analyses [reviewed in (14)] have indicated that primate genomes harbor more structural differences than visible by cytogenetic staining. Analysis of these events is complicated by two issues: the draft state of the genomes and the presence of extensive segmental duplications. We analyzed these structural rearrangements by using the distance between orthologous blocks in each species to infer the ancestral genome structure and determine where rearrangements occurred on the phylogenetic tree. We excluded events smaller than 10 kilobase pairs (kbp), which are mostly due to retroposon insertions, and focused on cytogenetically undetectable breakpoints induced by insertions, deletions, inversions, and complex rearrangements of sizes between 10 kbp and 4 Mbp. Data were combined from inversion detection and ancestral reconstructions by the contiguous ancestral regions method (18) and gap detection by the genomic triangulation method (19), which further integrates data from genomic sequence comparisons (20) and comparative maps (8, 9, 21). The analysis revealed more than 1000 rearrangement-induced breakpoints through the HCR lineages, of which 820 occur between rhesus and the reconstructed human-chimpanzee ancestor (Fig. 3 and fig. S4.1). Each chromosome therefore constitutes a complex mosaic, with multiple changes introduced to orthologous counterparts. When rhesus macaque is compared with the human-chimpanzee ancestor, the X chromosome exhibits three times more rearrangements per megabase than the autosomes. This is both statistically significant and consistent with a slightly more than threefold difference observed in the human lineage following the branching off of chimpanzee (19). Given that a slower rate of variability at the single-nucleotide level in the X chromosome compared with autosomes has been interpreted as support for speciation models, this difference is worthy of further investigation (22).
Duplications in the Genome and Gene Family Expansions
Genomic Duplications. Segmental duplication of genomic regions and the genes they contain are well known in mammals and are postulated to drive fundamental processes, including the birth of new genes and the subsequent expansion of gene families (23). To discover duplications in the macaque genome, we used a battery of different complementary approaches. Two of these, whole-genome assembly comparison (24) and BLASTZ (25) analysis of segmental duplications, depended directly on the assembly. We used a third method, whole-genome shotgun sequence detection (26), that calculated depth of coverage of the raw shotgun sequence reads relative to the assembly. A fourth procedure was created on the basis of BAC end sequence reads combined with BACs that were directly mapped by means of the pooled genomic indexing method (21). The common interspersed repeat families were not considered in any of these analyses.
The first two approaches identified approximately 35.0 Mb of a recently duplicated sequence in the macaque assembly. A further 15 Mb were collapsed in the assembly and discovered by whole-genome shotgun sequence detection (fig. S5.1 and table S5.1). Adjusting for these collapsed duplications and the overall assembly coverage, we estimate that approximately 66.7 Mb or 2.3% of the macaque genome consists of segmental duplication (Fig. 4); this proportion is substantially lower than that of either the human or chimpanzee genome (5 to 6%) (26, 27).
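The third approach rests on a simple idea: a duplication that was collapsed into a single copy in the assembly attracts reads from every copy, so its read depth is higher than expected. The sketch below illustrates that idea only. The threshold of the mean plus three standard deviations of a control set, the function name, and the toy depths are all assumptions made for this example, not parameters taken from the method cited as (26).

```python
from statistics import mean, stdev


def flag_excess_depth(window_depths, control_depths, n_sd=3.0):
    """Return indices of windows whose read depth exceeds a control-based threshold.

    control_depths: depths from windows believed to be single copy.
    Assumption: threshold = mean + n_sd * standard deviation of the controls.
    """
    threshold = mean(control_depths) + n_sd * stdev(control_depths)
    return [i for i, depth in enumerate(window_depths) if depth > threshold]


# Toy data (hypothetical): roughly 5-fold depth in unique sequence,
# about double that in two windows standing in for a collapsed duplication.
control = [5, 5, 6, 4, 5, 5, 6, 4]
windows = [5, 6, 11, 10, 4, 7]
candidate_windows = flag_excess_depth(windows, control)
```

With these toy numbers the threshold is about 7.3, so only the windows at indices 2 and 3 are flagged. A real analysis would also need to handle repeat masking and sampling noise, which the sketch ignores.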
Fig. 4. Global pattern of macaque segmental duplications. The statistics are based on all WGAC duplications (> 90%, >1 kb in length), whereas the figure displays only those between 90 and 95% sequence identity and >10 kb in length for simplicity. Red lines indicate interchromosomal (Inter) duplications, blue ticks show intrachromosomal (Intra) events, and purple bars show centromeric, acrocentric, and/or large-gap regions. WGAC, whole-genome assembly comparison. nr, nonredundant.
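The duplication figures quoted in the text can be tied together with a short calculation. Using the ungapped assembly length from Table 1 as the denominator is an assumption made here; the text does not say which genome size was used, nor how the adjustment from the directly observed total to 66.7 Mb was made, so that step is left as a reported value.

```python
# Values reported in the text (megabases).
assembly_detected_mb = 35.0   # found by the two assembly-based approaches
collapsed_mb = 15.0           # collapsed in the assembly, found from read depth
estimated_total_mb = 66.7     # reported estimate after adjustment

# Assumption: denominator is the ungapped assembly length from Table 1.
assembly_bp = 2_871_189_834

directly_observed_mb = assembly_detected_mb + collapsed_mb
duplicated_fraction = estimated_total_mb * 1e6 / assembly_bp
```

The directly observed total is 50 Mb, and 66.7 Mb over the assembly length gives about 2.3%, which matches the proportion stated above.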
The pooled genomic indexing and BAC end sequence read methods suggested slightly higher levels of overall duplication, on the basis of fluorescence in situ hybridization analysis of randomly selected large-insert BAC clones (28). However, this estimate was still less than the 4.8% recently estimated for the baboon genome (28). Overall, we consider 2.3% to be the lower bound of duplicated genomic DNA in the macaque genome.
As with the human and chimpanzee, the analysis of the macaque assembly revealed an enrichment of segmental duplications near gaps, centromeres, and telomeres (14, 29). The study also identified segmental duplications that contain genes of high biological significance. For example, the CCL3L1-CCL4 gene region [for which copy-number variation in humans is correlated with susceptibility to HIV infection (30)], cytochrome P450 (associated with toxicity response), KRAB-C2H2 zinc finger (a developmental regulatory transcription factor), olfactory receptor (smell), human leukocyte antigen (HLA), and other immune and autoantigen gene families were all observed in regions of genome duplication.
Expansion of gene families. Two approaches were used to study gene family structure directly within the draft genome sequence: (i) a statistical approach, based on a likelihood model of gene gain and loss across the mammalian tree (31) and (ii) hybridization of whole genomic DNA to cDNA arrays [a variation of array-based comparative genomic hybridization (array CGH)] to observe changes in gene content directly (32). The results are shown in Tables 3 and 4.
Table 3. Gene families with significant copy-number expansions (P < 0.0001) in the human and the identical statistic for the rhesus macaque. Gene family ID, identification numbers from Ensembl version 41. Family size, number of gene copies in the current genome assemblies. Gains and losses, number of genes gained and lost since the human's split with chimpanzee or the macaque's split with human-chimpanzee lineage. IG, immunoglobulin; IGE, immunoglobulin E; Pre, precursor; MHC, major histocompatibility complex; TCR, T cell receptor; ENV, envelope; ATP, adenosine 5'-triphosphate.
Gene family ID   Description                              Family size  Gains  Losses
Expanded in human
ENSF00000000020  IG heavy chain V region                  42           10     0
ENSF00000000073  Receptor                                 56           16     0
ENSF00000000233  Peptidyl prolyl cis trans isomerase      38           9      0
ENSF00000000312  Histone H2b                              28           7      0
ENSF00000000597  Golgin subfamily A                       49           26     0
ENSF00000000664  Ankyrin repeat domain                    33           9      0
ENSF00000000822  Unknown                                  15           9      0
ENSF00000000841  Tripartite motif                         21           7      1
ENSF00000000936  Centaurin gamma                          15           9      0
ENSF00000001036  Cold inducible RNA binding               22           8      0
ENSF00000001546  Ubiquitin carboxyl terminal hydrolase    16           13     2
ENSF00000001599  Leucine-rich repeat                      14           7      0
ENSF00000001665  DNA mismatch repair PMS2                 12           5      0
ENSF00000001738  Unknown                                  15           7      0
ENSF00000001920  40S ribosomal S26                        13           7      1
ENSF00000001974  Unknown                                  17           3      0
ENSF00000002160  Double homeobox                          15           13     0
ENSF00000002570  Keratin associated 5                     7            2      0
ENSF00000003683  Unknown                                  5            3      0
ENSF00000004835  Ambiguous                                13           9      0
Expanded in macaque
ENSF00000000014  HLA class I                              17           12     0
ENSF00000000037  HLA class I                              16           10     0
ENSF00000000070  Keratin type I                           65           30     0
ENSF00000000077  Histone H3                               32           11     0
ENSF00000000085  IG kappa chain V region                  47           22     2
ENSF00000000138  Keratin type II                          39           10     0
ENSF00000000150  Taste receptor type 2                    23           9      0
ENSF00000000178  Aldo keto reductase family 1             19           9      0
ENSF00000000397  Ral guanine nucleotide dissoc stim.      19           10     1
ENSF00000000432  Killer cell IG receptor Pre MHC class I  9            3      0
ENSF00000000630  TCR beta chain V region Pre              18           9      0
ENSF00000000705  ENV polyprotein                          13           11     0
ENSF00000000766  60S ribosomal L7A                        26           17     1
ENSF00000000773  Ribosomal L7                             23           12     0
ENSF00000000826  60S ribosomal L23A                       20           6      0
ENSF00000001027  60S ribosomal L17                        12           3      0
ENSF00000001077  Nucleoplasmin                            17           9      0
ENSF00000001211  67-kD laminin                            18           10     0
ENSF00000001235  Nonhistone chromosomal HMG 17            24           12     0
ENSF00000001236  60S ribosomal L31                        23           11     0
ENSF00000001249  60S ribosomal L12                        16           8      0
ENSF00000001359  USP6 N terminal                          14           10     0
ENSF00000001460  Prohibitin                               7            4      0
ENSF00000001671  60S ribosomal L32                        10           6      0
ENSF00000001861  40S ribosomal S10                        9            5      0
ENSF00000002239  60S ribosomal L19                        8            5      0
ENSF00000002279  40S ribosomal S17                        8            4      0
ENSF00000002476  60S ribosomal L18                        7            4      0
ENSF00000002633  IGE binding                              19           14     0
ENSF00000003321  Argininosuccinate synthase               9            6      0
ENSF00000003395  10-kD heat shock protein                 11           8      0
ENSF00000004083  ATP synthase subunit G                   4            3      0
ENSF00000007347  Unknown                                  7            3      0
Table 4. Genes identified as expanded in copy number in the macaque, relative to the human, by the array CGH method. The leftmost column represents IMAGE cDNA clones that show array CGH-predicted copy number increases in the rhesus macaque relative to the human. The middle two columns list corresponding gene names and array CGH log2 macaque-to-human ratios. The rightmost column presents BLAT-predicted copy numbers based on rheMac2 and hg18 genome assemblies.
* Consistent with computational analysis of gene family gains and losses.
BLAT-based copy-number estimates of rheMac2 and hg18 genome assemblies that are consistent with array CGH predictions.
The statistical approach revealed that 1358 genes were gained by duplication along the macaque lineage. This method simultaneously estimates rates of change along individual lineages and generates a quantitative assessment of confidence in rate differences among lineages. Iterative modeling revealed higher rates in primates, relative to other mammals. The rates are similar to those obtained by independent methods in both humans (33) and rodents (3).
We identified 108 gene families, computationally predicted to have changed in size among the primates, evolving at a significantly higher rate than the overall primate rates of gene gain and loss (all P < 0.0001, Table 3). More than 60% of the macaque-specific expansions display evidence of positive selection in their coding sequences, supporting the notion that this rate disparity may be driven by natural selection.
Gene copy-number estimates by genomic hybridization (cDNA array CGH) (32) identified 51 genes (124 cDNAs) with copy-number increases in the macaque, relative to the human (Table 4 and table S5.2). Of these array CGH-predicted macaque-specific increases, 33% (17 out of 51) were also found by computational analysis of gene family gains and losses. A separate analysis found that 55% (28 out of 51) are increased in copy number as estimated by BLAST-like Alignment Tool (BLAT)-based (34) predictions from the rheMac2 assembly. In contrast, when random sets of genes (cDNAs) were chosen for BLAT queries, only 1.45% suggest copy-number increases (P < 0.0001).
The genome-wide acceleration identified in primates may be due to an explosion in the number of Alu transposable elements in the primate ancestor, which may have allowed an increase in the rates of nonallelic homologous recombination, leading to higher rates of both duplication and deletion (35). Alternatively, the rates of duplicate gene fixation may be due to the small population size in primates (36) relative to rodents.
Particular expanded gene families. Expansion of individual gene families may help to identify processes that distinguish biological features among organisms. One example in humans is the preferentially expressed antigen of melanoma (PRAME) gene family that consists of a single gene on chromosome 22q11.22 and a cluster of several dozen genes on chromosome 1p36.21. PRAME and PRAME-like genes are actively expressed in cancers but normally manifest testis-specific expression and may thus have a role in spermatogenesis. The genomic organization is complicated; the cluster on human chromosome 1 exhibits copy-number variation in human populations (37, 38) and, together with a similar orthologous cluster on mouse chromosome 4, apparently arose by translocation not long before the divergence of primates and rodents, about 85 Mya (39) (Fig. 5 and fig. S5.2). After that translocation event, the human and mouse gene clusters expanded independently. Evidence for positive selection has been found in these genes, and two segmental duplications postdating human-chimpanzee divergence added about a dozen genes to the human cluster.
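As a rough check of the figures in this paragraph, the following Python sketch recomputes the two percentages and asks how unlikely 28 confirmations out of 51 would be if each gene had only the 1.45% background chance of a BLAT-predicted increase. The binomial model and the helper name binomial_tail are assumptions made for illustration; the text does not state which test produced the reported P value.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_genes = 51         # array CGH-predicted increases (from the text)
n_family = 17        # also found by the gene family analysis (from the text)
n_blat = 28          # supported by BLAT-based predictions (from the text)
background = 0.0145  # rate observed for random gene sets (from the text)

family_fraction = n_family / n_genes
observed_fraction = n_blat / n_genes
# Assumed model: each gene independently has the background chance.
p_value = binomial_tail(n_blat, n_genes, background)

print(f"gene family overlap: {family_fraction:.0%}")
print(f"BLAT support: {observed_fraction:.0%}")
print(f"binomial tail probability: {p_value:.3g}")
```

Under this simplified model the tail probability falls far below the reported threshold of 0.0001.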
Fig. 5. Organization of the PRAME gene cluster in the HCR lineages. (A) Maximum-likelihood phylogeny for PRAME-like genes in the human (H), chimpanzee (P), and rhesus macaque (M) genomes. Colored circles indicate inferred duplication events, partial genes are shown in italics, and branches showing significant evidence of positive selection are colored orange (P values are shown above orange lines). Scale bar, 0.05 substitutions per site. (B) Another view of the same phylogeny, showing the duplication history in the context of the species tree (7).
To properly resolve evolutionary changes in the PRAME gene family, we further sequenced six macaque BAC clones to achieve a higher data quality, and we assembled them into a single contig (table S2.6). These eight PRAME genes were compared with human and chimpanzee genes identified from the latest assemblies for both species. We estimated a phylogeny for all identified genes, designating the mouse gene cluster and the human PRAME gene on chromosome 22 as outgroups. We then reconciled this gene tree with the species tree by maximum parsimony. Our reconstruction reveals extensive duplication early in primate evolution (Fig. 5B, branch a), in recent chimpanzee evolution (Fig. 5B, branch d), and, most notably, in recent human evolution (Fig. 5B, branch e). The PRAME gene cluster appears to have been much less dynamic on the macaque lineage (Fig. 5B, branch b) and in early hominins (the human and chimpanzee branch, Fig. 5B, branch c). A large inverted tandem duplication occurred on the macaque lineage shortly after divergence from the human lineage, but no additional large-scale rearrangements are evident. The relative quiescence in macaque allows us to identify older duplications that are difficult to discern in the exceedingly complex human self-alignments (7).
The inferred PRAME gene tree shows pronounced differences in evolutionary rates across branches, as well as some quite long branches that suggest bursts of adaptive change. Using maximum likelihood methods, we found evidence of positive selection on several of these branches (Fig. 5A). This positive selection, combined with the highly variable pattern of gene duplication and expansion, suggests that the PRAME gene family has played a key role in species evolution.
We identified a second segment of extensive genomic duplications concentrated at the telomere of macaque chromosome 9, orthologous to a human locus at 10p15.3 and observed by multiple approaches to be distributed throughout the macaque genome. The genes phosphofructokinase-platelet form (PFKP) and DIP2C were expanded in this region and yielded the highest array CGH macaque-to-human ratios in the genome (average log2 ratios of 3.30 and 2.54, respectively). DIP2C is implicated in segmentation patterning, although its relevance to macaque evolution is currently obscure. PFKP is important in sugar (fructose) metabolism, raising the possibility that the pronounced copy-number expansion in macaque may be relevant to the high-fruit diet common among macaques. As with other array CGH copy-number estimates, the functional status of the additional copies is not known. Six of the individual macaque BACs that mapped to the region revealed related duplicated sequences on rhesus chromosome 3, which formed from the fusion of orthologs of human chromosomes 7 and 21, suggesting that these genes may have played a role in this expansion.
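The reconciliation step can be illustrated with a minimal Python sketch of lowest-common-ancestor (LCA) mapping, a standard parsimony approach that infers the fewest duplications needed to fit a gene tree inside a species tree. This is not the authors' pipeline (their procedure is given in the supporting material cited as 7); the toy gene names, the species labels H, P, and M borrowed from Fig. 5, and the helper functions are all assumptions for illustration.

```python
# Species tree ((H, P), M) stored as child -> parent (assumed toy encoding).
SPECIES_PARENT = {"H": "HP", "P": "HP", "HP": "HPM", "M": "HPM", "HPM": None}


def ancestors(node):
    """Path from a species-tree node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = SPECIES_PARENT[node]
    return path


def lca(a, b):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(gene_tree, duplications):
    """Map a gene-tree node onto the species tree and record duplications.

    gene_tree is a leaf name such as "H1" (first letter = species) or a
    2-tuple of subtrees. A node counts as a duplication when it maps to
    the same species-tree node as at least one of its children.
    """
    if isinstance(gene_tree, str):
        return gene_tree[0]
    left, right = (reconcile(child, duplications) for child in gene_tree)
    here = lca(left, right)
    if here in (left, right):
        duplications.append(here)
    return here


# Hypothetical gene tree: two human copies, one chimpanzee, one macaque.
toy_gene_tree = ((("H1", "H2"), "P1"), "M1")
toy_duplications = []
toy_root = reconcile(toy_gene_tree, toy_duplications)
print(toy_root, toy_duplications)
```

In this toy case the only inferred duplication sits on the human branch, which is the kind of lineage assignment summarized in Fig. 5B.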
Another macaque-specific increase involves the 22 HLA-related genes located in the region orthologous to human chromosome 6p21 (table S5.4). A previous study found that HLA gene copy number was higher in the macaque than in the human (40), and our results confirm and extend this finding, demonstrating that the macaque HLA copy number is greater than that found for the human as well as all four great ape species (fig. S5.3). This finding also suggests that, although the macaque has been extensively used to model the human immune response, there may be substantial and previously unappreciated differences in HLA function between these species. Notably, the copy number of another immune system-related gene cluster, immunoglobulin lambda-like (IGL) at 22q11.23, is also predicted to be increased in the macaque (table S5.4). Members of the IGL locus encode light chain subunits that are part of the pre-B cell receptor; do not undergo rearrangements; and, when mutated, can result in B cell deficiency and agammaglobulinemia. Additional known genes predicted by array CGH to have markedly increased copy numbers in the macaque relative to the human include DHFR, ATP5J2, DNAJC8, ADFP, and MAT2B. Overall, the main characteristics of the set of amplified genes were their diversity and the wide variety of genomic regions they occupied.
Orthologous Relationships
The macaque genome has also allowed for a detailed study of more subtle changes that have accumulated within orthologous primate genes. The average human gene differs from its ortholog in the macaque by 12 nonsynonymous and 22 synonymous substitutions, whereas it differs from its ortholog in the chimpanzee by fewer than three nonsynonymous and five synonymous substitutions. Similarly, 89% of human-macaque orthologs differ at the amino acid level, as compared with only 71% of human-chimpanzee orthologs. Thus, the chimpanzee and human genomes are in many ways too similar for characterizing protein-coding evolution in primates, but the added divergence of the macaque helps substantially in clarifying the signatures of natural selection.
General characteristics of orthologous genes. We developed an automatic pipeline to identify 10,376 trios of HCR genes to which we could assign a high confidence of 1:1:1 orthology. For comparison, we also identified 6762 human, macaque, mouse, and rat quartets; 5641 HCR, mouse, and rat quintets; and 5286 HCR, mouse, and dog quintets. Because the human gene models are by far the best characterized for primates, we first identified a set of 21,256 known human protein-coding genes derived from a union of the RefSeq (41), Vega (42), and University of California, Santa Cruz Known Genes (43) collections. These genes were then mapped to synteny-based genome-wide multiple alignments (44, 45) and subjected to a series of rigorous filters to eliminate spurious annotations, paralogous alignments, genes that have become pseudogenized in one or more species, and genes with incompletely conserved exon-intron structures (7). The genes that pass all filters represent 1:1:1 orthologs in which aligned protein-coding bases are highly likely to encode proteins in all species, with identical reading frames.
Despite the draft quality of the chimpanzee and macaque assemblies, the majority of human genes mapped through syntenic alignments to the chimpanzee (93% of genes) and macaque (89%) genomes (Fig. 6) (7), and most of these genes were completely alignable in their coding regions. Fairly large fractions of human genes, however, were discarded because of apparent frame-shift insertions and deletions (indels) or nonconserved exon-intron structures with respect to their putative chimpanzee or macaque orthologs. On the basis of 81 finished BACs covering 294 genes, we estimate that, out of 5526 genes failing the filters for alignment completeness, frame-shift indels, and conserved exon-intron structure, 2138 (39%) were discarded completely because of flaws in the macaque assembly; the remaining 3388 (61%) were discarded either because of genuine changes to genes or because of annotation or alignment errors (7). Another 2261 genes passed the human-macaque filters but failed the human-chimpanzee filters, and a large majority of these failures were probably due to flaws in the chimpanzee assembly. Altogether, we estimate that finished genomes for the macaque and chimpanzee would allow the number of genes in high-confidence orthologous trios to be increased by at least 23%, to 12,800 (7). Notably, our conservative ortholog sets may create a bias against fast-evolving genes and therefore may lead to underestimates of average levels of divergence and the prevalence of positive selection.
Fig. 6. Numbers of human genes passing successive filters in the orthology analysis pipeline. Genes are required to fall in regions of large-scale synteny between genomes, to have completely aligned coding regions, not to have frame-shift indels or altered gene structures, and not to show signs of recent duplication.
Alignments of the 10,376 orthologous trios were used to estimate the ratio of the rates of nonsynonymous and synonymous substitutions per gene (denoted ω), with continuous-time Markov models of codon evolution and maximum likelihood methods for parameter estimation (46-48). This yielded a mean estimate of ω = 0.247 (median 0.144), close to the value of 0.23 estimated for human and chimpanzee genes (29). About 9.8% of all genes show no nonsynonymous changes in the three species, and 2.8% have ω > 1, suggesting that they are under positive selection. Consistent with previous studies (49), certain classes of genes exhibit unusually large or small values, such as those assigned to the gene ontology (50) category "immune response," which have an ω distribution shifted significantly toward larger values, and those assigned to the "transcription factor activity" category, which have a distribution shifted toward smaller values (fig. S6.1).
Our estimates for ω in primates are considerably larger than previously reported estimates for rodents, which have a median of 0.11 (3), and larger than similar estimates from primate-versus-rodent comparisons (29) (Fig. 7). To compare the average rates of evolution of protein-coding genes in primates with those in other mammals, we estimated a separate value of ω for each branch of a five-species phylogeny, pooling data from all 5286 one-to-one orthologs for these species (fig. S6.2). We obtained similar estimates of ω for the human (ω = 0.169) and chimpanzee (ω = 0.175) lineages, but substantially smaller estimates for the branches leading to nonprimate mammals (ω = 0.104 to 0.128), suggesting a reduction in purifying selection in hominins (29). The estimate of ω for the macaque lineage (ω = 0.124) is substantially smaller than the estimates for the human and chimpanzee and is closer to the estimates for the mouse and dog, perhaps reflecting the larger population size of macaques compared with the other primates. The estimates for the internal branches between the most recent common ancestors of the human and mouse and of the human and macaque, as well as the most recent common ancestors of the human and macaque and of the human and chimpanzee, are nearly equal to the macaque estimate. This suggests that protein-coding sequence evolution in macaques may have occurred at a typical primate rate, whereas it is the elevated rates in hominins that may be anomalous.
Fig. 7. Distributions of ω in primates versus rodents. Histogram of estimates of ω = dN/dS for human, chimpanzee, and macaque versus estimates for mouse and rat in 5641 orthologous quintets, showing a pronounced shift toward larger values in primates (P = 2.2 × 10^-16, Mann-Whitney test). Genes with dN = 0 or dS = 0 are counted in the relative frequencies but not shown.
When primate and rodent ω of individual genes were compared, primate orthologs were found to be evolving more rapidly by a 3:2 ratio. This asymmetry was also evident among genes showing substantial differences in primate ω (ωp), on the basis of human-macaque alignments, and rodent ω (ωr), deduced from mouse-rat alignments. According to a strict Bonferroni correction for multiple testing, 22 genes showed statistically significant ωp > ωr, whereas only three genes showed ωr > ωp (McNemar P < 0.001). If multiple testing criteria are relaxed, the bias toward larger ωp is more notable (144 versus 8; tables S6.1 and S6.2). Cases of ωp > ωr generally reflect an increase in ωp, whereas cases of ωr > ωp result both from an increase in ωr and a decrease in ωp. The genes showing statistically significant ωp > ωr are enriched for functions in sensory perception of smell and taste as well as for regulation of transcription (7).
Positive selection. Taking advantage of the additional phylogenetic information provided by the macaque genome, we performed a genome-wide scan for positive selection, using our 10,376 HCR orthologous trios and likelihood ratio tests (LRTs) (51-53). Four different LRTs were performed: test TA, for positive selection across all branches of the phylogeny, and tests TH, TC, and TM for positive selection on the individual branches to human, chimpanzee, and macaque, respectively. Our methods use an unrooted tree and cannot distinguish between the branches to macaque and the human-chimpanzee ancestor; for convenience, we refer to the combined branch as the macaque branch. In all cases, variation among sites in ω was allowed and, to reduce the number of parameters to estimate per gene, the branch-length proportions and transition-transversion ratio (κ) were estimated by pooling data from genes of similar G+C content (7). Test TA identified 67 genes, and tests TH, TC, and TM identified 2, 14, and 131 genes [false-discovery rate (FDR) < 0.1 in all cases], respectively. The large number of genes identified for the macaque branch is partly a reflection of its greater length compared with the chimpanzee and human branches (7).
These four sets of genes overlap considerably, particularly among their highest scoring predictions (Table 5 and table S6.3). Their union contains 178 genes, or 1.7% of all genes tested. The two genes identified by TH, those encoding the leukocyte immunoglobulin-like receptor LILRB1 and hypothetical protein LOC399947, were also identified by TA, and the gene for LILRB1 was identified by TC as well, indicating evidence of positive selection on multiple branches. However, 12 out of 14 genes identified by TC were not identified by the other tests, indicating possible lineage-specific selection in the chimpanzee. These include sex comb on midleg-like 1 (SCML1) and protamine 1 (PRM1), which were previously identified in an analysis that could not distinguish between selection on the human and chimpanzee branches (52). In addition, 99 genes were identified by TM but not the other tests. These genes may be under lineage-specific selection in the macaque and/or may have experienced positive selection on the branch leading to the most recent common ancestor of the human and chimpanzee.
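The 22-versus-3 asymmetry reported above can be checked with an exact (binomial) form of McNemar's test, sketched here in Python. Treating the two counts as the discordant pairs of a McNemar table is an assumption, as is the choice of the exact rather than the chi-square version; mcnemar_exact is a hypothetical helper, not a function from the authors' analysis.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value for discordant counts b and c."""
    n = b + c
    k = min(b, c)
    # Under the null, each discordant pair falls on either side with p = 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


p_strict = mcnemar_exact(22, 3)    # Bonferroni-significant genes (from the text)
p_relaxed = mcnemar_exact(144, 8)  # relaxed criteria (from the text)

print(f"strict set:  P = {p_strict:.3g}")
print(f"relaxed set: P = {p_relaxed:.3g}")
```

Under these assumptions the strict set already gives a P value below 0.001, in line with the reported result, and the relaxed set is more extreme still.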
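The gene counts above are reported at a false-discovery rate below 0.1, but the procedure is not named in this section. One widely used option is the Benjamini-Hochberg step-up rule, sketched below in Python on made-up P values. Whether the authors used this exact rule is an assumption, and benjamini_hochberg is a hypothetical helper.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return sorted indices of tests rejected by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose P value is under its threshold.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is rejected.
    return sorted(order[:cutoff_rank])


# Made-up nominal P values for ten hypothetical genes.
toy_p = [0.5, 0.001, 0.028, 0.9, 0.025, 0.2, 0.8, 0.6, 0.7, 0.3]
rejected = benjamini_hochberg(toy_p, fdr=0.1)
print(rejected)
```

The toy data show the step-up behavior: the gene with P = 0.025 misses its own rank-2 threshold of 0.02 but is still rejected because the rank-3 gene passes its threshold.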
Table 5. Selected genes from top 40 showing evidence of positive selection in primates. Accession, the number of the reference transcript for each gene (human). Chr, human chromosome on which reference gene resides. P value, nominal P value for test TA (7). Genes shown have FDR < 0.04. Test, the test (other than test TA) that detected the given gene. The Dup column has a checkmark if a gene overlaps a segmental duplication preceding the human/macaque divergence.
The genes identified by our tests for positive selection are enriched for several categories from the gene ontology (50) and Protein Analysis Through Evolutionary Relationships (PANTHER) (54) classification systems that are similar to those observed in previous genome-wide scans for positive selection (52, 53). These include defense response, immune response, T cell-mediated immunity, signal transduction, and cell adhesion (tables S6.4 to S6.7). Among the genes in these categories are several immunoglobulin-like genes, including those that encode the leukocyte-associated inhibitory receptors LILRB1 and LAIR1 (located in a cluster on chromosome 19), the T cell surface glycoprotein CD3 epsilon chain precursor CD3E, and the intercellular adhesion molecule 1 precursor ICAM1. Other identified genes associated with cell adhesion and/or signal transduction include those that encode DSG1, a calcium-binding transmembrane component of desmosomes, and the transmembrane protein TSPAN8 (which has gained an exon by duplication in the macaque genome). Genes encoding membrane proteins in general are strongly overrepresented; other examples include the genes that encode connexin 40.1, active in cell communication, and OPN1SW, the gene encoding blue-sensitive opsin.
In addition, we observed strong enrichments for new categories such as iron ion binding [e.g., the beta-globin (HBB), lactotransferrin (LTF), and cytochrome B-245 heavy chain genes (CYBB)] and oxidoreductase activity (e.g., KRTAP5-8 and KRTAP5-4, which encode keratin-associated proteins, and NDUFS5, which encodes a subunit of the nicotinamide adenine dinucleotide ubiquinone oxidoreductase). Two keratin genes, which are important for hair-shaft formation, are present among the top-scoring genes; these genes could conceivably have come under positive selection as a result of mate selection or climate change. Genes classified as part of the extracellular region, which include the keratin genes, are in general overrepresented. Many of the identified genes from this category encode secreted proteins, such as the interferon alpha 8 precursor IFNA8, which exhibits antiviral activity; the interleukin 8 precursor IL8, a mediator of inflammatory response; and CRISP1, which is expressed in the epididymis and plays a role at fertilization in sperm-egg fusion.
We found only weak enrichments for genes involved in apoptosis and spermatogenesis (52), but we did see a significant excess of high likelihood ratios among genes involved in fertilization. Other categories that show an excess of high likelihood ratios but that are not enriched for genes identified by our tests include blood coagulation, response to wounding, and related categories; epidermis morphogenesis; KRAB-box transcription factor; and olfactory receptor activity (tables S6.6 and S6.7). Their elevated likelihood ratios may reflect either weak positive selection or relaxation of constraint.
The inclusion of the macaque genome substantially improves statistical power to detect positive selection in primates, compared with previous scans that used only the human and chimpanzee genomes (29, 52). By examining about 8000 human-chimpanzee alignments with a similar LRT, Nielsen et al. (52) were able to identify only 35 genes with nominal P < 0.05, and when considering multiple comparisons, they were able to establish only that a 5% false discovery rate set was nonempty. By contrast, the use of the macaque genome allows the identification of 15 genes under positive selection in hominins and an additional 163 under selection on one or more other branches of the phylogeny, with FDR < 0.1. We estimate that including the macaque genome makes test TA about three times as powerful. However, including macaque rather than mouse (53) as an outgroup improves the power of test TH only marginally (7).
The genes identified by the LRTs are generally randomly distributed in the genome, and no significant clustering was observed when tested (P = 0.24), although small clusters were found on human chromosomes 11 and 19 (7). Chromosome 11, with 10 genes identified by test TA, has more than twice the expected number of genes under positive selection, but this enrichment is not significant after correcting for multiple comparisons [P = 0.10, Fisher's exact test and Holm correction (7)]. However, a significant enrichment was observed for genes overlapping segmental duplications that occurred before the human-macaque divergence (P = 0.006, Fisher's exact test), suggesting an increased likelihood of adaptive evolution following gene duplication. Four of the top five genes identified by test TA overlap segmental duplications that predate the human-macaque divergence (Table 5).
Genetic Variation in Macaques
The use of rhesus macaques as animal models of human physiology can be greatly enhanced by an improved understanding of their underlying genetic variation. To explore rhesus genetic diversity and to create resources for further genetic studies, we generated a total of 26.2 Mb of whole-genome shotgun sequence from 16 unrelated individuals (eight of Chinese origin and eight of Indian origin, table S7.1). We next identified 26,479 single-base differences [putative single-nucleotide polymorphisms (SNPs)] through comparison with the reference genome. Overall, we found approximately one SNP per kilobase, which is on average close to that found in similar human studies. There was a surprising difference of 50% in overall diversity between the autosomes and the X chromosome (Fig. 8A); we expected a value of 75%. This expectation was based on differences in effective chromosome population sizes, given that females have two X chromosomes and males carry only one. The reduction in diversity could be due to recent selective sweeps of positively selected recessive mutations on the X chromosome (55).
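The two numbers in this paragraph follow from simple arithmetic, reproduced in the short Python sketch below. The SNP density uses the counts given in the text; the 75% expectation assumes equal numbers of breeding males and females and no selection, which is a simplification introduced here rather than a statement of the authors' model.

```python
# SNP density from the counts reported in the text.
snps = 26_479
sequence_kb = 26.2 * 1000  # 26.2 Mb expressed in kilobases
snp_per_kb = snps / sequence_kb

# Chromosome copies per breeding pair (assumed 1:1 sex ratio):
# the female carries two X copies and the male one, while each
# parent carries two copies of every autosome.
x_copies_per_pair = 2 + 1
autosome_copies_per_pair = 2 + 2
expected_ratio = x_copies_per_pair / autosome_copies_per_pair

print(f"SNP density: {snp_per_kb:.2f} per kb")
print(f"expected X-to-autosome diversity ratio: {expected_ratio:.2f}")
```

The density comes out at about one SNP per kilobase, and the neutral expectation for X-linked diversity is three-quarters of the autosomal value, against which the observed 50% stands out.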
Fig. 8. SNP within rhesus macaques. (A) SNP densities per kilobase for eight Chinese (blue) and eight Indian (red) individuals in autosomes and the X chromosome. Error bars indicate standard error with variance calculated across individual-chromosome replicates. (B) Distribution of Tajima's D statistic across 166 amplicons for each population (n = 38 for Indian and n = 9 for Chinese individuals). (C) The distribution of the number of haplotypes per haplotype block (determined using the four-gamete test) across five regions.
We also found that the frequency of the whole-genome shotgun SNPs differed substantially among the animals from the different populations (0.95/kb in Indian rhesus and 1.06/kb in Chinese rhesus), and there was suggestive variation in SNP density within their subpopulations (SD = 0.0275/kb for Chinese macaques; SD = 0.0527/kb for Indian macaques). Together with complementary data from PCR analysis of polymorphic L1 and Alu element insertions (figs. S7.1 and S7.2) that showed population substructure, this prompted additional experiments in which 48 animals from the two populations were surveyed by PCR-direct DNA sequencing. Details and most conclusions from that study have been reported by Hernandez et al. (56), including a demonstration that >67% of SNPs discovered by direct sequencing are private to each subpopulation. The strong population differentiation is reflected in fixation index (FST) values (a measure of population differentiation) and a marked difference in Watterson's (57) estimate of the population mutation rate between the two groups. Here, we observed that the population differences are also reflected in differential distribution of Tajima's D statistic and in linkage disequilibrium across sampled regions (Fig. 8, B and C). Each of these statistics further reflects the possibilities of sweeps of natural selection or major differences in population histories that must be factored into ongoing genetic studies. These initial insights into the underlying patterns of variation within individual animals will therefore provide the basis for future genetic analyses. In addition to their utility for identification of individual animals, the SNP markers will be invaluable for larger-scale population studies.
Male mutation bias. A comparison of human-rhesus substitution rates (calculated at interspersed repetitive elements) between the X chromosome and the autosomes yielded an estimate of the male-to-female mutation rate ratio (α) of 2.87 (95% CI = 2.37 to 3.81; table S7.2). This value is lower than α = 6 estimated for the human and chimpanzee (58) but higher than α = 2 estimated for the mouse and rat (3, 59). Thus, this argues against a uniform magnitude of male mutation bias in mammals (5) and supports a correlation between male mutation bias and generation time (60, 61).
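For readers unfamiliar with the statistics named above, the following Python sketch computes Watterson's estimator and Tajima's D from a small set of aligned haplotypes using the standard textbook formulas. The four toy haplotypes are invented for illustration and are not macaque data; tajimas_d is a hypothetical helper and requires at least one segregating site.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Return (Watterson's theta, pairwise diversity pi, Tajima's D)."""
    n = len(haplotypes)
    length = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for pos in range(length)
            if len({h[pos] for h in haplotypes}) > 1)
    # pi: mean number of differences over all pairs of haplotypes.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima's variance formula.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = s / a1
    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return theta_w, pi, d


# Toy sample in which every variant is a singleton (hypothetical data).
toy_haplotypes = ["AAAA", "AAAT", "AATA", "ATAA"]
theta_w, pi, d = tajimas_d(toy_haplotypes)
print(f"theta_W = {theta_w:.3f}, pi = {pi:.3f}, D = {d:.3f}")
```

An excess of rare variants, as in this toy sample, pushes D below zero; differences in the distribution of D between populations are what Fig. 8B summarizes.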
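The link between an X-to-autosome substitution ratio and the male-to-female mutation rate ratio can be sketched with the commonly used model in which an X chromosome spends two-thirds of its time in females and an autosome spends half. The text does not state which formula was applied or give the underlying X-to-autosome ratio, so both the model and the function names in this Python sketch are assumptions for illustration.

```python
def x_to_autosome_ratio(alpha: float) -> float:
    """Expected X/autosome substitution ratio for a given alpha.

    Assumed model: X rate = (2*f + m) / 3 and autosome rate = (f + m) / 2,
    where m = alpha * f are the male and female mutation rates.
    """
    return 2 * (2 + alpha) / (3 * (1 + alpha))


def alpha_from_ratio(ratio: float) -> float:
    """Invert the model; meaningful for ratios between 2/3 and 1."""
    return (4 - 3 * ratio) / (3 * ratio - 2)


implied_ratio = x_to_autosome_ratio(2.87)  # alpha reported in the text
recovered_alpha = alpha_from_ratio(implied_ratio)
print(f"implied X/A ratio: {implied_ratio:.3f}")
print(f"recovered alpha:   {recovered_alpha:.2f}")
```

Under this model an α of 2.87 corresponds to an X-to-autosome ratio of roughly 0.84; as the ratio approaches 2/3, the implied α grows without bound, which is why confidence intervals on α tend to be asymmetric.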
Human Disease Orthologs in the Macaque
While the general morphological and physiological similarities between humans and macaques greatly enhance the utility of the latter as a model organism, specific differences in their underlying coding sequences can also provide biological insights. By comparing human disease genes with their macaque equivalents, we identified numerous instances in which the allele observed in the macaque corresponds to the disease allele in the human. These occurrences suggest that the human disease variants could be either persistent (i.e., ancestral) or recurring sequences that represent the recapitulation of ancestral states that may once have been protective, but which now result in adverse consequences for human health (62).
To identify the ancestral disease-associated alleles in human, we screened the macaque and chimpanzee assemblies for the presence of any of the 64,251 different disease-causing or disease-associated mutations collected in the Human Gene Mutation Database (63, 64). A total of 229 substitutions were identified for which the amino acid considered to be mutant in human corresponded to the wild-type amino acid present in macaque, chimpanzee, and/or a reconstructed ancestral genome (Table 6) (65) (see table S8.1 for a full list).
Table 6. Examples of human mutations that cause inherited disease and match an ancestral or nonhuman primate state. Chr:start-stop shows the address in the March 2006 human assembly. Name is the name used by the Human Gene Mutation Database (64). The notation "N>A:CHMT" means that N is the consensus human amino acid, A is the disease-associated form, C is in the current chimp assembly, H is in the inferred human-chimp ancestor, M is in rhesus, and T is in the inferred human-rhesus ancestor (the mouse and dog were used as outgroup species) (73).
Chr:start-stop | Strand | Name | Replacement N>A:CHMT | Gene | Disease
chr1:94270150-94270152 | - | CM014300 | R>Q:RRQR | ABCA4 | Stargardt disease
chr1:94316821-94316823 | - | CM015072 | H>R:RRRR | ABCA4 | Stargardt disease
chr1:94337037-94337039 | - | CM042258 | K>Q:QQQQ | ABCA4 | Stargardt disease
chr6:26201158-26201160 | + | HM030028 | V>A:VVAA | HFE | Hemochromatosis
chr7:116936418-116936420 | + | CM940237 | F>L:FFLL | CFTR | Cystic fibrosis
chr7:117054872-117054874 | + | CM941984 | K>R:KKRK | CFTR | Cystic fibrosis
chr12:101761685-101761687 | - | CM962547 | Y>H:YYHY | PAH | Phenylketonuria
chr12:101784521-101784523 | - | CM941128 | I>T:IITI | PAH | Phenylketonuria
chr13:51413354-51413356 | - | CM044579 | V>A:AAAA | ATP7B | Wilson disease
chr13:112843266-112843268 | + | CM021094 | D>E:DDED | F10 | Factor X deficiency
chr17:37948991-37948993 | + | CM040465 | R>Q:RRQQ | NAGLU | Sanfilippo syndrome B
chr19:43656115-43656117 | + | CM064230 | S>G:GGGG | RYR1 | Malignant hyperthermia
chrX:38111528-38111530 | + | CM941115 | R>H:RRHH | OTC | Ornithine hyperammonemia
chrX:38125613-38125615 | + | CM961052 | T>M:MTTT | OTC | Ornithine hyperammonemia
chrX:138458220-138458222 | + | CM045148 | E>K:EEKK | F9 | Hemophilia B
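The "N>A:CHMT" notation defined in the Table 6 caption can be decoded mechanically. The Python sketch below splits a replacement code into its labelled residues and lists which nonhuman states carry the human disease-associated amino acid. The function and key names are hypothetical; only the notation itself and the example codes come from Table 6.

```python
STATE_NAMES = ("chimp", "human_chimp_ancestor", "macaque",
               "human_macaque_ancestor")


def parse_replacement(code: str) -> dict:
    """Split a code such as "R>Q:RRQR" into labelled one-letter residues."""
    change, states = code.split(":")
    normal, disease = change.split(">")
    if len(states) != len(STATE_NAMES):
        raise ValueError(f"expected four state letters in {code!r}")
    record = {"normal": normal, "disease": disease}
    record.update(zip(STATE_NAMES, states))
    return record


def states_with_disease_allele(code: str) -> list:
    """Names of the states whose residue equals the disease-associated form."""
    record = parse_replacement(code)
    return [name for name in STATE_NAMES if record[name] == record["disease"]]


# Codes taken from Table 6.
for code in ("R>Q:RRQR", "K>Q:QQQQ", "T>M:MTTT"):
    print(code, states_with_disease_allele(code))
```

For example, R>Q:RRQR (ABCA4) flags only the macaque, whereas K>Q:QQQQ flags all four states, the pattern expected when the human consensus residue is the derived one.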
One surprising result of the analysis was the identification of several human loci that, when mutated, give rise to profound clinical phenotypes, including severe mental retardation. For example, the macaque data revealed deleterious alleles in the ornithine transcarbamylase (OTC) and phenylalanine hydroxylase (PAH) genes, which are associated in human with OTC deficiency and phenylketonuria. In humans, these mutations greatly perturb the normal serum amino acid levels. Direct examination of macaque blood revealed lower concentrations of cystine and cysteine than in the human and slightly higher concentrations of glycine than in the human, but no increase in phenylalanine or ammonia, which might have been a predicted result of these changes (tables S8.2 and S8.3). Although the effect of the observed alleles might be greatly influenced by compensatory mutations (66) or other environmental factors, it remains a possibility that the basic metabolic machinery of the macaque may exhibit functionally important differences with respect to our own (Fig. 9).
Fig. 9. Ancestral disease mutations. Examples of human mutations that match the sequences of chimp and/or macaque are shown. (A) Genes in which the ancestral allele is now the disease-associated allele in humans. (B) An instance in which the mutant allele in humans is the normal allele in macaque. The amino acid sequences predicted for the boreoeutherian ancestor (65) are given on the top row of each alignment block. Identities are shown as dots and differences are given as letters (73). The position of the mutation in humans is boxed in orange, and the box extends through the relevant comparisons.
[View Larger Version of this Image (26K GIF file)]
Ancestral mutations were also identified in the N-alpha-acetylglucosaminidase(NAGLU) gene that gives rise to mucopolysaccharidoses (Sanfilliposyndrome), which is also characterized by profound mental retardation.Their occurrence invites further investigation of the contributionof this and related genes to the phenotypic differences betweenmacaques and humans, and the potential for further explorationof these monkeys as models for this disorder.
We also identified a human mutation associated with Stargardt disease and macular dystrophy that matches an ancestral allele by replacing lysine with glutamine at position 223 of the human ABCA4 protein (Fig. 9). Umeda et al. (67) reported the presence of the glutamine in a cynomolgus monkey, and all other eutherian mammals as well as the predicted boreoeutherian ancestral sequence have glutamine at this position. Furthermore, glutamine is present at this residue in Xenopus, thereby implying conservation through some 300 million years of vertebrate evolution. Thus, it may be inferred that the ancestral glutamine has been replaced by lysine in humans. Similarly, one CFTR mutation [Phe87→Leu87 (Phe87Leu)] is present not only in most mammals (Fig. 9) but also in Fugu, also implying extensive conservation through vertebrate evolution.
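The comparison described above, and the dot notation used in Fig. 9, can be illustrated with a minimal sketch. The sequences below are short invented strings, not the real ABCA4 or CFTR alignments, and the function names `dot_alignment` and `mutant_matches_ancestor` are hypothetical; the sketch assumes the sequences are already aligned and of equal length.

```python
def dot_alignment(reference: str, others: dict[str, str]) -> dict[str, str]:
    """Render each aligned sequence against the reference row:
    '.' where the residue is identical, the residue letter where it differs."""
    rendered = {}
    for name, seq in others.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference length")
        rendered[name] = "".join(
            "." if ref == res else res for ref, res in zip(reference, seq)
        )
    return rendered


def mutant_matches_ancestor(ancestor: str, position: int, mutant_residue: str) -> bool:
    """True if the human disease-associated residue equals the predicted
    ancestral residue at the given 0-based alignment column."""
    return ancestor[position] == mutant_residue


# Toy data (assumption): a 7-residue window with the site of interest at column 3.
ancestor = "MSTQLAV"  # stand-in for the predicted boreoeutherian ancestor
aligned = {
    "human_normal": "MSTKLAV",  # lysine (K) at the site
    "human_mutant": "MSTQLAV",  # glutamine (Q), the disease-associated allele
    "macaque": "MSTQLAV",
}

rows = dot_alignment(ancestor, aligned)
for name, row in rows.items():
    print(f"{name:13s} {row}")

is_ancestral = mutant_matches_ancestor(ancestor, 3, "Q")
```

In this toy window only the normal human sequence differs from the ancestral row, which mirrors the pattern the text describes for position 223 of ABCA4: the residue associated with disease in humans is the one shared by the ancestor and the macaque.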
Impact of a Genomic Sequence on Biological Studies
In addition to its impact on comparative and genetic studies, the genome sequence reported here heralds a new era in laboratory studies of macaque biology. The full potential for more precise definition of this animal model and its gene content is not yet realized, but the value of the new sequence in guiding DNA microarrays for studying macaque gene expression has already become clear (68). Previously, human or macaque EST-based arrays had been used for expression studies (69). The most recently released microarray now adds probes designed by alignment of the 3' untranslated regions of 23,000 human RefSeq genes to the sequences from the initial macaque genome release (January 2005, Mmul0.1, approximate genome coverage of 3.5-fold). The vast majority of the probes on this array (98.5%) now match the current macaque genome release with high confidence and represent 18,690 unique genomic loci. These provide a representation of recognized functional pathways with an enhancement about three times that of the previous data, and overall more uniform and robust hybridization signals compared with those of previous microarrays (69) (tables S9.1 to S9.3).
The power of global transcriptional profiling with advanced macaque-specific reagents has been demonstrated in studies of virulence and pathogenicity of influenza from historic pandemic strains, as well as from emerging agents of zoonotic origin. We infected macaques with the human influenza strain A/Texas/36/91 (70) and compared the expression changes observed in lung tissues to those seen in whole blood during the course of infection. Figure 10 shows a differential time course of expression between interferon-induced genes and genes in the inflammation pathway, in different tissues (table S9.4). The increased expression in lung tissue shortly after infection reflects the early innate response, whereas genes associated with the reemergence of the inflammation pattern at day 7 implicate a transition to an adaptive immune response. These kinds of studies will be crucial for elucidating all of the transitions from innate to adaptive immune responses and are fully enabled by the macaque-specific microarrays developed from the genome sequences.
Fig. 10. Application of rhesus-specific microarrays. A microarray based on the rhesus macaque draft genome was used to analyze gene expression in a macaque model of human influenza infection. Gray bars measure an overall response for indicated functional categories, based on corresponding heat maps, and reveal a significant rebound in expression at day 7 for genes associated with the inflammatory response, when compared to interferon induction. Red, increased expression; green, reduced expression. Details are given in (7, 70).
We expect many more immediate examples of the impact of other tools developed from the finished macaque genome. For example, the requirement for improvements in PCR-based methods is shown by a recent report on the large-scale cloning of terminal exons for macaque genes, in which the use of human primers was successful, on average, in 67% of cases (71). Only a native sequence can allow sufficient precision for these types of highly specific assays. A similar increase of activity in studies of the macaque proteome can be predicted, given that early efforts in macaque proteomics have had to rely on human reference sequences for analyzing liquid chromatography and tandem mass spectrometry data (70).
Discussion
The draft genomic sequence reported here has already moved the macaque from a model that has been much studied at the level of physiology, behavior, and ecology to a whole-organism system that can be interrogated at the level of the single DNA base. This transformation is evident in the literature as well as in this special section (15, 19, 57, 72).
Additional general conclusions emerged from this study. First, the data make it conceivable to define completely all of the operational components of the pathways underlying the individual biological systems that together constitute the functioning adult macaque. For example, a complete description of all the different macaque immune function components will enable an even more thoughtful use of rhesus macaques in areas such as AIDS research and for vaccine production.
Second, we were struck by the high value of adding regions of genome finishing to the draft sequence for the comparative analyses of genes and duplicated structures. This provides an argument for future finished primate genomes.
Third, the data now provide new opportunities to explore the basic biology of this highly successful species. Rhesus macaques retain a broad geographic distribution with reasonably healthy population numbers and widely studied ecology and ethology. The genetic resources generated in this study will undoubtedly form the basis of many analyses of population variability and inter-population diversity.
Finally, the genomic rearrangements, duplications, gene-specific expansions, and measurements of the impact of natural selection presented here have revealed the rich and heterogeneous genomic changes that have occurred during the evolution of the human, chimpanzee, and macaque. The marked diversity of the types of change that have occurred demonstrates a major feature of primate evolution: The aggregation of changes that we see, even in closely related species, does not reflect smooth, progressive, and orderly genomic divergence. Models of abrupt or punctuated evolution already acknowledge that smooth and continuous change is difficult to achieve on an evolutionary time scale, but this study provides a notable example of the operation of this principle in our close relatives.
Rhesus Macaque Genome Sequencing and Analysis Consortium
Project Leader: Richard A. Gibbs1,2
White paper: Jeffrey Rogers,3 Michael G. Katze,4 Roger Bumgarner,4 Richard A. Gibbs,1,2 George M. Weinstock1,2
Principal investigators: Richard A. Gibbs,1,2 Elaine R. Mardis,5 Karin A. Remington,6 Robert L. Strausberg,6 J. Craig Venter,6 George M. Weinstock,1,2 Richard K. Wilson5
Analysis leaders: Mark A. Batzer,7 Carlos D. Bustamante,8 Evan E. Eichler,9 Richard A. Gibbs,1,2 Matthew W. Hahn,10 Ross C. Hardison,11 Kateryna D. Makova,11 Webb Miller,11 Aleksandar Milosavljevic,1,2 Robert E. Palermo,4 Adam Siepel,8 James M. Sikela,12 George M. Weinstock1,2
Genome sequencing: Tony Attaway,1,2 Stephanie Bell,1,2 Kelly E. Bernard,5 Christian J. Buhay,1,2 Mimi N. Chandrabose,1,2 Marvin Dao,1,2 Clay Davis,1,2 Kimberly D. Delehaunty,5 Yan Ding,1,2 Huyen H. Dinh,1,2 Shannon Dugan-Rocha,1,2 Lucinda A. Fulton,5 Ramatu Ayiesha Gabisi,1,2 Toni T. Garner,1,2 Richard A. Gibbs,1,2 Jennifer Godfrey,5 Alicia C. Hawes,1,2 Judith Hernandez,1,2 Sandra Hines,1,2 Michael Holder,1,2 Jennifer Hume,1,2 Shalini N. Jhangiani,1,2 Vandita Joshi,1,2 Ziad Mohid Khan,1,2 Ewen F. Kirkness6 (leader), Andrew Cree,1,2 R. Gerald Fowler,1,2 Sandra Lee,1,2 Lora R. Lewis,1,2 Zhangwan Li,1,2 Yih-shin Liu,1,2 Stephanie M. Moore,1,2 Donna Muzny1,2 (leader), Lynne V. Nazareth1,2 (leader), Dinh Ngoc Ngo,1,2 Geoffrey O. Okwuonu,1,2 Grace Pai,6 David Parker,1,2 Heidie A. Paul,1,2 Cynthia Pfannkoch,6 Craig S. Pohl,5 Yu-Hui Rogers,6 San Juana Ruiz,1,2 Aniko Sabo,1,2 Jireh Santibanez,1,2 Brian W. Schneider,1,2 Scott M. Smith,5 Erica Sodergren,1,2 Amanda F. Svatek,1,2 Teresa R. Utterback,1,2 Selina Vattathil,1,2 Wesley Warren5 (leader), George M. Weinstock,1,2 Courtney Sherell White1,2
Genome assembly: Asif T. Chinwalla5 (leader), Yucheng Feng,5 Aaron L. Halpern,6 LaDeana W. Hillier,5 Xiaoqiu Huang,13 Ewen F. Kirkness,6 Pat Minx,5 Joanne O. Nelson,5 Kymberlie H. Pepin,5 Xiang Qin,1,2 Karin A. Remington,6 Granger G. Sutton6 (leader), Eli Venter,6 Brian P. Walenz,6 John W. Wallis,5 George M. Weinstock,1,2 Kim C. Worley1,2 (leader), Shiaw-Pyng Yang5
Mapping: LaDeana W. Hillier,5 Steven M. Jones,14 Marco A. Marra,14 Mariano Rocchi,15 Jacqueline E. Schein,14 John W. Wallis5
Sequence finishing: Christian J. Buhay,1,2 Yan Ding,1,2 Shannon Dugan-Rocha,1,2 Alicia C. Hawes,1,2 Judith Hernandez,1,2 Michael Holder,1,2 Jennifer Hume,1,2 Ziad Mohid Khan,1,2 Zhangwan Li,1,2 Dinh Ngoc Ngo,1,2 Aniko Sabo1,2
Assembly comparison: Robert Baertsch,16 Asif T. Chinwalla,5 Laura Clarke,17 Miklós Csürös,18 Jarret Glasscock,5 R. Alan Harris,1,2 Paul Havlak,1,2 LaDeana W. Hillier,5 Andrew R. Jackson,1,2 Huaiyang Jiang,1,2 Yue Liu,1,2 David N. Messina,5 Xiang Qin,1,2 Yufeng Shen,1,2 Henry Xing-Zhi Song,1,2 George M. Weinstock1,2 (leader), Kim C. Worley1,2 (leader), Todd Wylie,5 Lan Zhang1,2
Gene prediction: Ewan Birney,17 Laura Clarke17
Repetitive elements: Mark A. Batzer7 (leader), Kyudong Han,7 Miriam K. Konkel,7 Jungnam Lee,7 Webb Miller,11 Arian F. A. Smit,19 Brygg Ullmer,20 Hui Wang,7 Jinchuan Xing7,21
Ancestral genomes and segmental duplications: Richard Burhans,11 Ze Cheng,9 Miklós Csürös,18 Evan E. Eichler,9 R. Alan Harris,1,2 Andrew R. Jackson,1,2 John E. Karro,11 Jian Ma,22 Aleksandar Milosavljevic1,2 (leader), Brian Raney,22 Xinwei She9
Gene duplication/gene families: Michael J. Cox,12 Jeffery P. Demuth,10 Laura J. Dumas,12 Matthew W. Hahn10 (leader), Sang-Gook Han,10 Janet Hopkins,12 Anis Karimpour-Fard,23 Young H. Kim,24 Jonathan R. Pollack,24 James M. Sikela12 (leader)
PRAME Gene Family Analysis: Webb Miller11 (leader), Donna Muzny,1,2 Brian Raney,22 Aniko Sabo,1,2 Adam Siepel,8 Tomas Vinar8
Orthologous genes: Charles Addo-Quaye,11 Jeremiah Degenhardt,8 Alexandra Denby,8 Melissa J. Hubisz,25 Amit Indap,8 Carolin Kosiol,8 Bruce T. Lahn,25,26 Heather A. Lawson,11 Alison Marklein,8 Rasmus Nielsen,27 Adam Siepel8 (leader), Eric J. Vallender,25,26 Tomas Vinar8
Population genetics: Mark A. Batzer7 (leader), Carlos D. Bustamante8 (leader), Andrew G. Clark,28 Jeremiah Degenhardt,8 Betsy Ferguson,29 Richard A. Gibbs,1,2 Matthew W. Hahn,10 Kyudong Han,7 Ryan D. Hernandez,8 Kashif Hirani,1,2 Amit Indap,8 Hildegard Kehrer-Sawatzki,30 Jessica Kolb,30 Miriam K. Konkel,7 Jungnam Lee,7 Lynne V. Nazareth,1,2 Shobha Patil,1,2 Ling-Ling Pu,1,2 Jeffrey Rogers,3 Yanru Ren,1,2 David Glenn Smith,3 Brygg Ullmer,20 Hui Wang,7 David A. Wheeler,1,2 Jinchuan Xing7,21
Sex chromosome evolution: Kateryna D. Makova,11 Ian Schenck11
Human disease orthologs: Edward V. Ball,31 Rui Chen,1,2 David N. Cooper,31 Belinda Giardine,11 Richard A. Gibbs,1,2 Ross C. Hardison11 (leader), Fan Hsu,22 W. James Kent,22 Arthur Lesk,11 Webb Miller,11 David L. Nelson,2 William E. O'Brien,2 Kay Prüfer,32 Peter D. Stenson31
Additional biological impact of genomic sequence: Michael G. Katze,4 Robert E. Palermo4 (leader), James C. Wallace4
Genome browser: Robert Baertsch,16 Galt P. Barber,22 David Haussler35,16 (leader), Donna Karolchik,22 Andy D. Kern,22 Robert M. Kuhn,22 Kayla E. Smith,22 Ann S. Zwieg22
1Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA. 2Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA. 3Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio, TX 78227, USA. 4Department of Microbiology, University of Washington, Seattle, WA 98195, USA. 5Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA. 6J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA. 7Department of Biological Sciences, Biological Computation and Visualization Center, Center for BioModular Multi-scale Systems, Louisiana State University, Baton Rouge, LA 70803, USA. 8Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA. 9Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA. 10Department of Biology and School of Informatics, Indiana University, Bloomington, IN 47405, USA. 11Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA 16802, USA. 12Human Medical Genetics and Neuroscience Programs, Department of Pharmacology, University of Colorado at Denver and Health Sciences Center, Aurora, CO 80045, USA. 13Department of Computer Science, Iowa State University, Ames, IA 50011, USA. 14Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, Canada. 15Department of Genetics and Microbiology, University of Bari, Bari, Italy. 16Department of Bioinformatics, University of California Santa Cruz, Santa Cruz, CA 95060, USA. 17The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. 18Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, QC H3C 3J7, Canada. 19Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98103-8904, USA.
20Center for Computation and Technology, Department of Computer Sciences, Louisiana State University, Baton Rouge, LA 70803, USA. 21Eccles Institute of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA. 22Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA. 23Department of Preventative Medicine and Biometrics, University of Colorado at Denver and Health Sciences Center, Aurora, CO 80045, USA. 24Department of Pathology, Stanford University, Stanford, CA 94305, USA. 25Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA. 26Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA. 27Institute of Biology, University of Copenhagen, Copenhagen DK-1017, Denmark. 28Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA. 29Genetics Research and Informatics Program, Oregon National Primate Research Center, Beaverton, OR 97006, USA. 30Institute of Human Genetics, University of Ulm, Ulm, 89081, Germany. 31Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK. 32Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany. 33Centre for Stem Cell Biology and Tissue Engineering, Sun Yat-sen University, Guangzhou 510080, China. 34South-China Primate Research and Development Center, Guangzhou 510080, China. 35Howard Hughes Medical Institute, Santa Cruz, CA 95060, USA.
5. N. Patterson, D. J. Richter, S. Gnerre, E. S. Lander, D. Reich, Nature 441, 1103 (2006).
6. L. M. Zahn, B. R. Jasny, Eds., poster from the special issue on the Macaque Genome, Science 316, following p. 246 (13 April 2007); interactive online (www.sciencemag.org/sciext/macaqueposter/).
7. Materials, methods, and additional discussion are available on Science Online.
73. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.
74. This project was supported by National Human Genome Research Institute grants to the Baylor College of Medicine Human Genome Sequencing Center (U54 HG003273), Washington University Genome Sequencing Center (U54 HG003079), and the J. Craig Venter Institute (U54 HG003068). We thank members of the NHGRI staff for their ongoing efforts: A. Felsenfeld, J. Peterson, M. Guyer, and W. Lu. Additional acknowledgments of support are available online (7). We thank the California National Primate Research Center (NPRC), Oregon NPRC, Southwest NPRC, and Yerkes NPRC for contributing biological samples used in this study.
Received for publication 22 December 2006. Accepted for publication 16 March 2007.