Technical Comments
Comment on "Evidence for Positive Epistasis in HIV-1"
Bonhoeffer et al. (Reports, 26 November 2004, p. 1547) presented evidence for positive epistasis in a clinical data set of HIV-1 mutants and corresponding fitness values. We demonstrate that biases in the original and simulated data sets may lead to erroneous evidence for epistasis. More rigorous statistical tests must be used to account for such biases before one can infer epistasis.
Department of Microbiology, University of Washington, Seattle, WA 98195, USA.
* To whom correspondence should be addressed. E-mail: ram{at}compbio.washington.edu
Bonhoeffer et al. reported "strong statistical evidence" for positive epistasis in a clinical sample of 9466 HIV-1 sequences (1). This is of potential importance because it contradicts theories that negative epistasis helps explain the evolution of recombination (2, 3). Their evidence for positive epistasis is derived from a plot showing a decelerating decline in log fitness with the number of mutations [figure 1B in (1)] and from a demonstration that the mean epistasis value for all possible pairs of alternative amino acids was significantly greater than zero [figure 2 in (1)].
Bonhoeffer et al. argue that their results are unlikely to be due to a paucity of viruses with low fitness in the absence of drugs, because these viruses were "generally derived" from patients on antiretroviral therapy. This argument assumes that fitness of HIV-1 in the absence of drugs is completely unrelated to fitness in the presence of drugs, which is contradicted by several sources of evidence. First, studies have shown that viral fitness in the absence of drugs gradually increases as viruses acquire secondary/compensatory mutations during therapy (4-6), resulting in positive correlation between fitness values in the presence and absence of drugs for clinical samples.
Second, a positive correlation between drug hypersusceptibility and reduced fitness in the absence of drugs has been observed in clinical data sets (7, 8). Third, viruses with greatly impaired enzyme function have extremely low fitness in the presence or absence of drugs, so they will be underrepresented in clinical data sets. Finally, some of their samples may be obtained from untreated, recently treated, or lightly treated patients. All these factors indicate that viruses with low fitness but a high number of mutations are likely to be underrepresented in their clinical data set. However, they made no attempt to adjust for these biases when analyzing their data.
To demonstrate the magnitude of the effect of data biases on their conclusions, we performed similar analyses using a simulated data set without epistasis. We assumed a simple model that the log10 fitness value Y of a given 20-residue sequence can be written as
To determine whether these effects would apply to their original data, Monogram Inc. (formerly Virologic Inc.) ran our software on their data set with arbitrarily scrambled genotypes and phenotypes. Although this process precluded us from determining which mutations contribute to epistasis, it allowed us to evaluate whether their data set is exempt from the effects modeled above. Using our software, we were able to replicate the distribution of fitness values in figure 1A in (1) and the decelerating trend in their figure 1B (see our Fig. 2, A and B). We then discarded either 5% or 25% of the lowest fitness values and made the plot of log10 fitness values versus the number of mutations (Fig. 2, C and D). In both cases, we found a more extreme decelerating trend or even a slightly increasing trend in the tail of the curve, demonstrating artifacts caused by simple data biases. Finally, we made the plots after discarding the 5% or 25% highest fitness values (Fig. 2, E and F) and found that the slopes in the head of both curves are less steep than that in Fig. 2B, indicating that a paucity of high-fitness viruses could also result in misleading evidence for epistasis. Because of the absence of an unbiased reference data set, we cannot perform the reshuffling procedures needed to evaluate statistically the effect of culling on Bonhoeffer et al.'s all-pairwise test for epistasis, as we did on the simulated data set. These analyses show how small biases in real data sets, just as with our simulated data set, can easily result in misleading conclusions regarding epistasis.
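The culling effect described above can be illustrated with a minimal sketch. It is not the authors' software or model: it assumes a purely additive log10 fitness (a fixed hypothetical cost per mutation plus Gaussian noise, so no epistasis by construction), and every name and parameter value (`simulate`, `cull_lowest`, `p_mut`, `effect`, `noise_sd`) is chosen here for illustration only.

```python
import random
import statistics


def simulate(n_seqs=20000, n_sites=20, p_mut=0.3, effect=-0.1,
             noise_sd=0.3, seed=0):
    """Return (mutation_count, log10_fitness) pairs from an additive model.

    Assumption: each mutation costs the same amount `effect` on the log
    scale and noise is Gaussian, so the model contains no epistasis.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_seqs):
        k = sum(rng.random() < p_mut for _ in range(n_sites))
        y = effect * k + rng.gauss(0.0, noise_sd)
        data.append((k, y))
    return data


def cull_lowest(data, fraction):
    """Discard the given fraction of records with the lowest fitness."""
    ranked = sorted(data, key=lambda record: record[1])
    return ranked[int(len(ranked) * fraction):]


def mean_by_mutation_count(data):
    """Mean log10 fitness for each observed number of mutations."""
    groups = {}
    for k, y in data:
        groups.setdefault(k, []).append(y)
    return {k: statistics.fmean(ys) for k, ys in sorted(groups.items())}


data = simulate()
full = mean_by_mutation_count(data)
culled = mean_by_mutation_count(cull_lowest(data, 0.25))

# Compare the decline over an early and a late stretch of mutation counts.
for label, means in (("full", full), ("culled 25%", culled)):
    early = means[6] - means[2]
    late = means[10] - means[6]
    print(f"{label}: change 2->6 = {early:.3f}, change 6->10 = {late:.3f}")
```

Under these illustrative settings the unculled means decline at a roughly constant rate, whereas after culling the decline is shallower at high mutation counts than at low ones. The simulated model has no epistasis, so the decelerating curve arises from the sampling bias alone.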
In conclusion, we demonstrate that an underrepresentation of low-fitness viruses in simulated or clinical data sets can easily lead to erroneous signals for positive epistasis. Although using clinically derived HIV-1 sequences to test evolutionary theories for recombination is an appealing idea, more rigorous statistical tests must be used to account for such biases before one can infer epistasis.
Science. ISSN 0036-8075 (print), 1095-9203 (online)