
Science 28 June 2002:
Vol. 296. no. 5577, pp. 2354 - 2360
DOI: 10.1126/science.1070441


Diversity Considerations in HIV-1 Vaccine Selection
Brian Gaschen, Jesse Taylor, Karina Yusim, Brian Foley, Feng Gao, Dorothy Lang, Vladimir Novitsky, Barton Haynes, Beatrice H. Hahn, Tanmoy Bhattacharya, and Bette Korber

Supplementary Material

Materials and Methods

Phylogenetic trees and reconstructed ancestor sequences. Maximum likelihood trees were constructed by using the parallel version of fastDNAml (S1, S2), incorporating a general reversible model of evolution and using eight rate categories to assign a relative rate of evolution to each site. The model parameters, including the equilibrium base frequencies and the rates, were estimated according to the maximum likelihood method. The alignment was gap-stripped before making the tree. An additional feature was added to the parallel maximum likelihood tree-building code to facilitate the reconstruction of an ancestral sequence at each internal node in the tree. For all nodes and all positions, the posterior probability of each base was calculated starting with an equilibrium prior, under the assumption that our maximum likelihood estimates of the tree, evolutionary rates, and model parameters were sufficient for this purpose. The base with the largest posterior probability at each position was then concatenated across positions to form the reconstructed ancestral sequence at each internal node in the tree. Neighbor joining trees and bootstrap replicates were generated by using PHYLIP version 3.5 on the same alignment as was used for the maximum likelihood tree (S3). In the alignments used for ancestor reconstructions, sites that had gaps in most sequences were excluded; at sites with rare gaps, the gaps were changed to N so that the site could be retained for the tree without influencing the base composition of the ancestors; and the codons were preserved. The sequence alignments for the trees included all C-subtype full-length gag and envelope sequences that were available to us, and a small set of reference sequences from other M-group clades from the database. The M-group consensus was the consensus of the subtype consensus sequences, derived from the full gag HIV-1 alignments from the year 2000 HIV sequence compendium reference alignments, and it was used as the outgroup.
The full-length genome was used for the subtype ancestor reconstructions available at the HIV database (http://hiv-web.lanl.gov). The sequence alignments we used for the tree reconstructions were extracted from the year 2000 HIV sequence compendium reference alignments with additions of newly available subtype C sequences provided by F. Gao, V. Novitsky, and C. Williamson and their colleagues (some of these sequences have been described since this work was begun (S4) and can be found as accession numbers AF443074 to AF443115). The alignments and maximum likelihood phylogenetic code are available at http://hiv-web.lanl.gov.
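
The gap handling and the final "most probable base" step described above can be illustrated with a minimal Python sketch. It is not the fastDNAml extension used in this work: the function names, the 50% threshold standing in for "gaps in most sequences", and the toy posterior values are all assumptions made for illustration, and the sketch does not attempt to preserve codons.

```python
GAP = "-"

def prepare_alignment(seqs, max_gap_fraction=0.5):
    """Drop columns that are mostly gaps; recode remaining gaps as N.

    max_gap_fraction is a hypothetical cutoff for "gaps in most sequences".
    Codon preservation, mentioned in the text, is NOT handled here.
    """
    n = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    keep = [
        i for i in range(length)
        if sum(s[i] == GAP for s in seqs) / n <= max_gap_fraction
    ]
    return ["".join("N" if s[i] == GAP else s[i] for i in keep) for s in seqs]

def most_probable_sequence(posteriors):
    """Concatenate the base with the largest posterior at each position.

    posteriors: one dict per site mapping base -> posterior probability
    for a single internal node (values assumed to be precomputed).
    """
    return "".join(max(site, key=site.get) for site in posteriors)

if __name__ == "__main__":
    toy = ["AC-T", "A--T", "AG-T"]
    print(prepare_alignment(toy))
    node = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.2, "C": 0.5, "G": 0.2, "T": 0.1},
    ]
    print(most_probable_sequence(node))
```

Computing the posteriors themselves requires the fitted tree, rates, and model parameters, which this sketch takes as given.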

Similarity plots. The similarity plots were generated by using the program SimPlot by S. Ray (http://www.med.jhu.edu/deptmed/sray/download/). For the protein similarity plots, a 200–amino acid window and a 20–amino acid step between windows were used. For the nucleotide similarity plots, a 600-nucleotide window was used with a 20-nucleotide step between windows. The total number of sequences included for the C-subtype and M-group comparisons was limited to the 26 sequences that can be included in SimPlot, so only representative subsets of the full-length genome sequences were used.
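
To make the window and step parameters concrete, the following is a rough Python sketch of a sliding-window similarity scan. It uses plain percent identity averaged over the reference sequences and is not SimPlot's implementation; the function name and the rule of skipping gapped positions are assumptions.

```python
def window_similarity(query, references, window=200, step=20):
    """Return (window_start, mean percent identity) pairs.

    query and references are aligned sequences of equal length.
    Positions where either sequence has a gap ('-') are skipped
    (an assumption of this sketch).
    """
    if any(len(r) != len(query) for r in references):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        q_win = query[start:start + window]
        identities = []
        for ref in references:
            r_win = ref[start:start + window]
            pairs = [(a, b) for a, b in zip(q_win, r_win)
                     if a != "-" and b != "-"]
            if pairs:
                matches = sum(a == b for a, b in pairs)
                identities.append(100.0 * matches / len(pairs))
        if identities:
            points.append((start, sum(identities) / len(identities)))
    return points

if __name__ == "__main__":
    # Tiny toy example with a 4-residue window and a 2-residue step.
    for start, pct in window_similarity("AAAAAAAA", ["AAAATTTT"], 4, 2):
        print(start, pct)
```

With the settings in the text, the call would use `window=200, step=20` for proteins and `window=600, step=20` for nucleotides.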

Predictions of immunoproteasome cleavage sites. The immunoproteasome cleavage prediction scores were obtained by using NetChop, a neural network prediction program trained on 1110 experimentally verified, naturally processed COOH-termini of known human CTL epitopes, eluted from 59 HLA class I molecules. In the initial study, class I ligand boundaries of non-HIV epitopes were correctly predicted in 65% of the cleavage sites and in 85% of the noncleavage sites (http://www.cbs.dtu.dk/services/NetChop/) (S5). No HIV epitopes were included in the training set, so that NetChop could be applied to HIV sequences without biasing the results towards previously defined HIV epitopes. A cleavage prediction score was obtained for each site; high scores were found to be highly correlated with known COOH-terminal positions of CTL epitopes in the HIV Immunology database; for example, the P values were < 0.0001 for Env and RT (S6, S7). For the alignments of HIV-1 protein sequences, the median of the predictions over all sequences was obtained for each site. The median cleavage prediction score for each position in the C clade alignment was plotted against the value for the same position in a vaccine candidate sequence, or against the median values for an alignment of B clade sequences.
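
The per-site median step can be sketched in Python as follows. The sketch assumes the prediction scores have already been obtained for every sequence and mapped onto alignment positions; it does not call NetChop, and the function names and toy scores are hypothetical.

```python
from statistics import median

def per_site_median(score_rows):
    """Median score at each alignment position.

    score_rows: one list of per-position scores per aligned sequence
    (scores assumed to be precomputed by the prediction program).
    """
    if len({len(row) for row in score_rows}) != 1:
        raise ValueError("all score rows must have the same length")
    return [median(column) for column in zip(*score_rows)]

def scatter_points(alignment_scores, candidate_scores):
    """Pair the alignment median with the candidate score at each position."""
    medians = per_site_median(alignment_scores)
    if len(medians) != len(candidate_scores):
        raise ValueError("candidate must cover the same positions")
    return list(zip(medians, candidate_scores))

if __name__ == "__main__":
    c_clade = [[0.1, 0.9, 0.5], [0.3, 0.7, 0.5], [0.2, 0.8, 0.4]]  # toy values
    candidate = [0.25, 0.6, 0.5]                                    # toy values
    print(scatter_points(c_clade, candidate))
```

Each returned pair corresponds to one point of a scatter diagram: the alignment median on one axis and the candidate sequence score on the other.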

Codon-specific nonsynonymous/synonymous substitution ratios. Nonsynonymous mutations become fixed with greater or lesser probability than synonymous mutations depending on whether a site is subject to positive or purifying selection; thus, the ratio of nonsynonymous to synonymous substitution rates can be used to characterize specific codons and regions of proteins evolving under positive selection (S8). In viruses, regions under strong positive selection will occur where immune escape confers a selective advantage to new variants that arise within a host. Accordingly, codon-based models of molecular evolution can be used in conjunction with sequence data and maximum-likelihood parameter estimation to identify sites of potential immunological interest. The nonsynonymous/synonymous ratio of rate constants (dN/dS) is indicative of the selection pressure at the protein level: dN/dS < 1 is indicative of purifying selection and amino acid conservation because of structural and functional constraints, and dN/dS > 1 is indicative of diversifying, positive selection where amino acid substitutions confer an advantage. In this study we asked whether the precise positions under positive selective pressure may be lineage specific for HIV-1, comparing subtype B and subtype C, as representatives of these lineages are both under consideration as vaccine candidates in trial populations where the HIV-1 C clade dominates. The program CODEML (S8) (http://abacus.gene.ucl.ac.uk/software/paml.html) was used to determine the regional distribution of nonsynonymous to synonymous rate ratios in the V3 loop and flanking regions for two sets of 25 HIV-1 sequences, one for subtype B and the second for subtype C. For each subtype, a three-rate prior with flexible locations and weights was fitted to the data. Bayes' formula was then used to calculate the posterior distribution of the rate ratio at each codon. The posterior mean was taken as a site-specific estimate of the nonsynonymous to synonymous rate ratio.
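
As an illustration of the Bayes step, the sketch below computes the posterior over three rate classes at a single codon and the resulting posterior mean of the rate ratio. It is not CODEML code: the function name is hypothetical, and the class weights, class rate ratios, and per-class site likelihoods are toy values that would in practice come from the fitted model.

```python
def posterior_mean_omega(class_weights, class_omegas, site_likelihoods):
    """Posterior over rate classes and posterior mean dN/dS at one codon.

    class_weights:    prior weight of each rate class (sums to 1)
    class_omegas:     dN/dS value of each rate class
    site_likelihoods: likelihood of the data at this codon under each class
    """
    joint = [p * lik for p, lik in zip(class_weights, site_likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("site likelihoods must not all be zero")
    posterior = [j / total for j in joint]
    mean = sum(q * w for q, w in zip(posterior, class_omegas))
    return posterior, mean

if __name__ == "__main__":
    weights = [0.5, 0.3, 0.2]        # toy prior weights
    omegas = [0.1, 1.0, 3.0]         # toy rate ratios
    likelihoods = [1e-6, 4e-6, 9e-6] # toy per-class likelihoods
    post, mean = posterior_mean_omega(weights, omegas, likelihoods)
    print(post, mean)
```

A codon whose posterior mean exceeds 1 would, under this reading, be a candidate site of positive selection.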


Supplemental Figure 1. Maximum likelihood phylogenetic tree showing the genetic distances and relative positions of potential vaccine strains to subtype C gag (A) and subtype C env sequences (B), with a few additional representative sequences from other group M subtypes. The two-letter country code for the country of origin of subtype C sequences is indicated: India, IN; South Africa, ZA; Botswana, BW; Tanzania, TZ; Israel, IL; Ethiopia, ET; and Brazil, BR. Potential C subtype vaccine strains are indicated by a bold branch, and their isolate names are given. The M-group consensus constructed from the consensus of each subtype was generated from the full alignment at the HIV-1 database web site. Branch points with neighbor joining bootstrap values of greater than 70 out of 100 replicates are indicated on the tree. In this study, whether the C-subtype consensus sequence was included or excluded from the maximum-likelihood tree building process, exactly the same subtype C ancestor was created. (The tree including the consensus is shown to illustrate where it appears in the tree, and the ancestor is the most probable sequence at the internal node that is indicated.) Only one sequence per infected individual was included in this tree. Sequence names are constructed by using the following convention: subtype is followed by an underscore, followed by the two-letter country code, followed by the sequence name. The env and gag trees are maximum likelihood trees built by using the general reversible model, whereas a maximum likelihood model was used to assign the relative rates of DNA evolution to each site.
The env tree was built by using the base frequencies (T, 0.238698; C, 0.181576; A, 0.371886; G, 0.207840) and base substitution rate parameters (TC, 0.897250; TA, 0.128606; TG, 0.213391; CA, 0.333647; CG, 0.174980), whereas the gag tree was built using the base frequencies (0.224676, 0.201191, 0.360996, 0.213137) and rate parameters (0.882911, 0.103306, 0.125233, 0.256168, 0.11970).
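
For readers who want to see how such parameters define a substitution model, the sketch below assembles an unscaled general reversible rate matrix from the env base frequencies and rate parameters listed above. Two points are assumptions and are not stated in the text: that the sixth exchange rate (A-G) is the reference rate fixed at 1, and that off-diagonal entries take the usual form (exchange rate times target base frequency). The function name is hypothetical.

```python
BASES = ("T", "C", "A", "G")

# env values quoted in the caption above.
ENV_FREQS = {"T": 0.238698, "C": 0.181576, "A": 0.371886, "G": 0.207840}
ENV_RATES = {"TC": 0.897250, "TA": 0.128606, "TG": 0.213391,
             "CA": 0.333647, "CG": 0.174980}

def gtr_rate_matrix(freqs, rates, ag_rate=1.0):
    """Build an unscaled reversible rate matrix Q as nested dicts.

    ASSUMPTION: the A-G exchange rate is the reference (ag_rate = 1.0);
    the source lists only five rate parameters.
    """
    exch = dict(rates)
    exch.setdefault("AG", ag_rate)

    def exchange(i, j):
        # Exchange rates are symmetric, so look up either orientation.
        return exch[i + j] if i + j in exch else exch[j + i]

    q = {}
    for i in BASES:
        row = {j: exchange(i, j) * freqs[j] for j in BASES if j != i}
        row[i] = -sum(row.values())  # each row sums to zero
        q[i] = row
    return q

if __name__ == "__main__":
    q = gtr_rate_matrix(ENV_FREQS, ENV_RATES)
    for i in BASES:
        print(i, [round(q[i][j], 6) for j in BASES])
```

The matrix is left unscaled; tree-building programs typically normalize it so that branch lengths are in expected substitutions per site.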

Figure 1a



Figure 1b




Supplemental Figure 2. Scanning the HIV-1 genome and proteins to illustrate similarities between potential vaccine candidates and sequences from isolates. (A) and (B) compare 23 full-length subtype C sequences from South Africa, Botswana and India to potential vaccine sequences. Green lines represent the comparison with the subtype C consensus sequence. The purple and blue lines show the comparison of the sequences of vaccine candidates BR025 and ZA003, respectively. The red lines show an interclade comparison of subtype C sequences with the B clade sequence JRCSF. (A) shows a nucleotide similarity plot, and (B) shows the corresponding amino acid similarity plot. (C) compares consensus sequences with real sequences from the A, B, C, D, G, and J subtypes. Green lines represent within-clade comparisons, between consensus sequences and isolate sequences: the A clade consensus sequences compared with A clade sequences etc., for clades A, B, C and D. The red lines represent the consensus of each subtype compared with sequences not of its own subtype. The black line represents the M-group consensus compared with the sequences from all clades, and these lines are the same in (C) and (D). (D) shows sequences from isolates of one subtype compared with isolates from other subtypes in red, and within-subtype comparisons of sequences in green. The within-subtype similarities are comparable to the distances between the M-group consensus and all recent isolates (the green and black lines are overlapping). The sequences used to generate these plots are as follows: (A) and (B) compare the query sequences with the C clade sequences: C_ZM.96zm651-8m, C_97ZA009-2, C_BR.92BR025, C_BW.00BW1773.2, C_BW.00BW1811.3, C_BW.00BW3970.2, C_BW.00BW2036.1, C_BW.00BW3891.6, C_BW.00BW3871.3, C_BW.00BW1759.3, C_BW.00BW2063.6, C_BW.00BW1783.5, C_BW.00BW0762.1, C_IN.98IN012-14, C_IN.301999, C_IN.101, C_IN.21068, C_IN.98IN022, C_IN.94IN476-10, C_ZA.97ZA012-1, C_ZA.Du-179, C_ZA.Du-155 and C_ZA.97ZA00. 
For part C, green lines represent the A-consensus sequence compared with the subtype A isolates A_SE.SE8131, A_UG.92UG037 and A_KE.Q2317; the B-consensus sequence compared with subtype B isolates B_US.WEAU160, B_DE.HAN2 and B_TW.LM49; the subtype C consensus sequence compared with subtype C isolates C_BW.96BW1210, C_IN.21068, C_ZA.97ZA012-1 and C_ET.ETH2220; and the subtype D consensus sequence compared with subtype D isolates D_CD.ELI, D_KE.MB2059 and D_UG.94UG114. For part D, green lines represent one isolate of each subtype (A_KE.Q2317, B_US.WEAU160, C_ZA.97ZA012-1 and G_NG.92NG083) compared with the other sequences in its own subtype listed for part C.




Supplemental Figure 3. Comparison of proteasome cleavage sites between C clade and B clade sequences and vaccine candidates. The scatter diagrams show the correlations between the proteasome cleavage prediction scores, for all positions in comparisons of protein alignments and vaccine candidate sequences. The training set was true COOH-termini of non-HIV epitopes, and the predictions are based on the program NetChop. High prediction scores assigned by NetChop were highly significantly correlated with true COOH-termini of known HIV CTL epitopes; thus, NetChop was able to distinguish the population of sites that were known COOH-termini from those that were not (K. Yusim, in preparation). For C and B alignments the median of the prediction scores over all sequences is assigned to each site. The subtype C alignment was the same as the one used for Figs. 1 and 3; the subtype B alignment came from the HIV database reference alignment for the year 2000. Each point on the diagram corresponds to an amino acid position and has two coordinates: the median of the predictions for this position over the C clade alignment and the prediction for the potential vaccine sequence, or, in the first panel, the median of the predictions for this position over the B clade alignment. For both Gag (right column) and Env (left column) proteins, the predictions for the C clade (y axis) were compared with predictions for the B clade, the combined consensus of each of the subtype consensus sequences (the M-group consensus, called CON-CON), M-group ancestral sequence, the C clade consensus and C clade ancestral sequence, and the vaccine candidates ZA003, BR025, ZM651, and ZA009. The correlations for all comparisons were highly statistically significant (P < 0.001). The figure displays only representative comparisons together with r² values (where r is the linear correlation coefficient) for the B clade, M-group consensus sequences, BR025, and subtype C consensus sequence. The B clade comparison had the most scatter relative to the C clade, but the scores for positions were still highly significantly correlated, meaning that the cleavage sites would be predicted to be preserved across these two clades in many, but not all, positions. The vaccine candidates all behaved very similarly, and had r² values roughly comparable to the M-group consensus and ancestral sequences; this suggests that any two subtype C sequences would be about as likely to share proteasome cleavage sites as the M-group consensus and a subtype C sequence. The C consensus and the C ancestral sequence gave the best correlations, and were generally predicted to be cleaved at a given site with a score comparable to the median scores of the sequences, therefore in a pattern most typical of the population of subtype C viruses. For the remaining comparisons the r² values are: M-group ancestral sequence, Env: 0.8, Gag: 0.9; C clade ancestral sequence, Env: 0.88, Gag: 0.98; ZA003, Env: 0.8, Gag: 0.93; ZM651, Env: 0.79, Gag: 0.87; ZA009, Env: 0.81, Gag: 0.89.
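
The squared linear correlation coefficient quoted in this legend can be computed from paired per-position scores as in the following minimal Python sketch. The function name and the toy inputs are hypothetical; the sketch shows only the arithmetic, not the data or software used to produce the figure.

```python
def r_squared(xs, ys):
    """Square of the Pearson linear correlation coefficient of two score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)

if __name__ == "__main__":
    clade_medians = [1.0, 2.0, 3.0]  # toy per-position medians
    candidate = [1.0, 3.0, 2.0]      # toy candidate scores
    print(r_squared(clade_medians, candidate))
```

Here `xs` would hold the per-position medians for one alignment and `ys` the scores for the comparison sequence or alignment.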


References

S1. B. Korber et al., Science 288, 1789 (2000).

S2. G. J. Olsen et al., Comput. Appl. Biosci. 10, 41 (1994).

S3. J. Felsenstein, PHYLIP V3.5. University of Washington (1984).

S4. J. van Harmelen et al., AIDS Res. Hum. Retrovir. 17, 1527 (2001).

S5. C. Kesmir et al., Protein Eng. 15, 287 (2002).

S6. K. Yusim et al., J. Virol., in press.

S7. HIV Molecular Immunology 2000 database, B. Korber et al., Eds. (Los Alamos National Laboratory, Los Alamos, NM), Part II.

S8. Z. Yang et al., Genetics 155, 431 (2000).







Science. ISSN 0036-8075 (print), 1095-9203 (online)