


Genomic Analysis of Gene Expression in C. elegans
A. A. Hill, C. P. Hunter, B. T. Tsung, G. Tucker-Kellogg, and E. L. Brown

Supplementary Material

Experimental Methods

Nematode growth conditions. Nematodes were grown at 25°C on NGM or HG plates using concentrated OP50 as a food source. For each developmental stage, embryos representing all 12 hours of embryonic development were isolated by hypochlorite treatment of embryos and adults collected from growing mixed-stage populations. For the 0-hour sample, a 1-ml aliquot of embryos was flash-frozen in liquid nitrogen and stored at -80°C. For the larval preparations, the embryos were plated at low density on seeded plates and allowed to develop for the specified time. For the 12-hour time point, the OP50 levels were kept minimal to reduce bacterial contamination of the samples. For each of the other larval and adult nematode preparations, the worms were separated from E. coli and other contaminants by sucrose density centrifugation. For the 60-hour preparation, adult worms and unlaid embryos (up to ~30-cell stage) were washed on 41-µm filters to remove larvae and laid eggs. The 2-week-old worm sample was prepared by first washing adult worms on 41-µm filters every 12 hours for five consecutive days and then replating on fresh plates and food. More than 95% of the worms at 14 days were phenotypically old (1), and none were detected that contained eggs (contaminating young progeny). Oocytes were isolated from fer-1 (ba576) adults grown at 26°C for 4 days. Oocytes were released from washed adults in 35 volumes of egg salts buffer (2) by treatment in a Waring blender. Oocytes were collected and washed on a 20-µm filter and flash-frozen in 100-µl aliquots.

RNA isolation. Total RNA was isolated from the frozen 1-ml aliquot by thawing and vigorous mixing, at 4°C for 10 min, in 9 volumes of fresh RNA extraction buffer [8 ml of guanidine thiocyanate solution (0.16 M guanidine thiocyanate, 0.8% sarcosyl, 4 mM Na citrate, pH 7.0), 80 µl of BME, 0.8 ml of 2 M Na acetate (pH 4.0), 8 ml of phenol, and 1.4 ml of CHCl3]. After centrifugation, the aqueous phase was isolated, and the RNA was precipitated by the addition of 1 volume of isopropanol. RNA pellets were stored frozen at -80°C.

Labeling and hybridization of cRNA. Amplified, biotin-labeled cRNA was produced from total worm RNA as follows: 5 to 15 µg of total worm RNA was incubated for 10 min at 70°C with an oligo-dT primer containing a T7 RNA polymerase promoter site. Following priming, the RNA was reverse-transcribed into cDNA (65 min at 50°C for first strand synthesis with Superscript II RT, followed by 150 min at 15.8°C for second strand synthesis with E. coli ligase, E. coli polymerase, and RNase H). cDNA was purified by paramagnetic beads (Perseptives), then transcribed into biotin-labeled cRNA in an in vitro transcription (IVT) reaction (16 hours at 37°C, Epicenter T7 RNA polymerase, Enzo Laboratories bio-11-CTP, bio-11-UTP). The cRNA product was purified by paramagnetic beads (Bangs Laboratories), then quantitated by UV absorbance. Two micrograms of cRNA per hybridization was fragmented (35 min, 95°C), then added to a hybridization cocktail containing BSA, herring sperm DNA (Promega), and an appropriate amount of the internal control spike-in transcript pool. A total volume of 200 µl of hybridization mixture was applied to each array, and hybridization was allowed to proceed overnight at 45°C on a rotisserie. Arrays contained ~250,000 25-mer oligonucleotide probes with 20 probe pairs per ORF. When hybridization was complete, arrays were stained as described in the Affymetrix Expression Analysis Technical Manual. Briefly, arrays were stained with streptavidin-conjugated phycoerythrin (SAPE, Molecular Probes), followed by biotinylated anti-streptavidin and a second round of SAPE for signal amplification. Following staining, fluorescent intensity was quantitated using the Affymetrix GeneChip laser scanner, and the resulting array images were captured in the GeneChip software package.

Array Data Reduction, the Absolute Decision, and Array Sensitivity

Data reduction. Raw array images were reduced to individual intensity values for each transcript using the Affymetrix GeneChip software.

The absolute decision. The software provided for each transcript an "absolute decision," which predicted if the gene was present at levels detectable by the array. The absolute decision for each transcript in each hybridization was either "present," "absent," or "marginal." The determination of the absolute decision was based on three heuristic metrics that depended on the magnitude of the difference between the observed hybridization intensity and the array background, and the fraction of probe pairs for a given gene that showed fluorescence above background and noise (for details, see the Affymetrix Expression Analysis Technical Manual). We quantitated the false-positive and false-negative rate of the GeneChip present call as follows. We first examined the false-positive rate by hybridizing a cRNA sample derived from a no-template control cDNA/IVT reaction to the "A" array. One transcript out of 6617 on the array was called "present," suggesting a false-positive rate of ~0 (0.015%). In a more stringent test, which reflected the effects of cross-hybridization in complex worm RNA mixtures, we counted the number of false-positive present calls arising from probe sets on the arrays that monitored nonworm transcripts that were not present. In a total of 768 absolute decisions generated from 12 distinct probe sets, there were seven false-positive present calls (a false-positive rate of 0.9%). We strove to minimize cross-hybridization effects during the array design process by selecting a probe sequence for each ORF that was as unique as possible to that ORF, based on comprehensive FASTA homology searching of each ORF against all other ORFs. We examined the false-negative rate of the present call by counting the missed present calls for 11 spike-in transcripts, which were present at a range of abundances, in all hybridizations. Each spike-in control was monitored by two distinct probe sets on each array, giving two estimates of the false-negative rate for each spike-in. The false-negative rate was zero for spike-in transcripts present at frequencies above 20 ppm (1:50,000) and then rapidly rose with decreasing frequency to ~0.6 for the lowest abundance spike-in transcript, at 3.3 ppm (1:300,000). Our ability to detect worm transcripts was determined by the false-negative rate and the number of hybridizations that we carried out.
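
The two false-positive rates quoted above follow directly from the reported counts. The short sketch below reproduces that arithmetic; the function name call_rate is ours and is used only for illustration.

```python
def call_rate(n_calls: int, n_decisions: int) -> float:
    """Fraction of absolute decisions that received a given call."""
    if n_decisions <= 0:
        raise ValueError("n_decisions must be positive")
    return n_calls / n_decisions


# No-template control on the "A" array: 1 "present" call among 6617 transcripts.
no_template_fpr = call_rate(1, 6617)

# Nonworm probe sets in complex worm RNA: 7 "present" calls in 768 decisions.
cross_hyb_fpr = call_rate(7, 768)

print(f"{no_template_fpr:.3%}")  # 0.015%
print(f"{cross_hyb_fpr:.1%}")    # 0.9%
```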

Array sensitivity. In order to estimate our practical sensitivity of detection over the worm life-cycle, the false-negative rate was characterized by two logistic-form functions (3, p. 545). On the basis of these "optimistic" and "pessimistic" fitted false-negative rates and the negligible false-positive rate, we estimated the fraction of transcripts actually present in a cRNA sample that would be called "present" at least once in either 2 hybridizations (representative of the duplicate readouts, which were done for most individual RNA samples) or 15 hybridizations (representative of the total number of hybridizations done with each array, which varied from 15 to 21). The probability of receiving at least one present call in 2 to 15 hybridizations for transcripts that were truly present in all hybridizations was ~100% for transcripts present at frequencies > 10 ppm (1:100,000). For a transcript that was actually present at 3.3 ppm (1:300,000) in a sample that was hybridized to an array twice, we estimated that the probability of receiving at least one present call was ~20 to 60%. For a transcript that was actually present at 3.3 ppm (1:300,000) in a sample that was hybridized 15 times, we estimated that the probability of getting at least one present call was ~80 to 100%. To summarize, we estimate that we detected ~100% of transcripts that were present throughout development at >10 ppm (1:100,000), ~80% of transcripts that were present throughout development at 3.3 ppm (1:300,000), and perhaps ~20% of transcripts that were present at only a single developmental stage at 3.3 ppm (1:300,000).
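
The relationship between the per-hybridization false-negative rate and the chance of at least one present call can be sketched as follows. This is an illustration only: it assumes that hybridizations are independent and that false positives are negligible, and the two rates used (0.6, close to the observed rate for the 3.3-ppm spike-in, and 0.9, an assumed pessimistic value) are stand-ins for the fitted "optimistic" and "pessimistic" curves, not the fitted values themselves.

```python
def p_at_least_one_present(false_negative_rate: float, n_hybridizations: int) -> float:
    """Probability that a truly present transcript is called "present" at least
    once in n independent hybridizations (false positives neglected)."""
    if not 0.0 <= false_negative_rate <= 1.0:
        raise ValueError("false_negative_rate must lie in [0, 1]")
    if n_hybridizations < 1:
        raise ValueError("n_hybridizations must be at least 1")
    return 1.0 - false_negative_rate ** n_hybridizations


# Illustrative (assumed) per-hybridization false-negative rates at 3.3 ppm.
for label, fnr in (("optimistic", 0.6), ("pessimistic", 0.9)):
    for n in (2, 15):
        p = p_at_least_one_present(fnr, n)
        print(f"{label:11s} n={n:2d}  P(at least one present call) = {p:.2f}")
```

Under these assumed rates the results fall near the ends of the ranges given above: roughly 0.19 to 0.64 for 2 hybridizations and roughly 0.79 to 1.00 for 15.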

Uncertainties in Frequency Quantitation

As described in our report, transcripts were quantitated in transcripts per million (ppm) by referring specific hybridization intensity (average difference, AD) for each transcript to a calibration curve constructed from spike-in control transcripts. The resulting values were termed "frequencies." The uncertainty in our frequency estimates can be considered in two components: the variation in repeated measurements of a single transcript (which determines our ability to resolve relative changes in individual messages over multiple samples) and the uncertainty of frequency quantitation in absolute terms.
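
As a rough illustration of how a spike-in calibration curve converts AD to frequency, the sketch below fits a straight line in log-log space and inverts it. The functional form, the function names, and all numbers are assumptions made for the example; the form of the calibration curve actually used is described in the report, not here.

```python
import math


def fit_log_log(ads, ppms):
    """Least-squares fit of log10(ppm) = a + b * log10(AD) to spike-in data."""
    xs = [math.log10(v) for v in ads]
    ys = [math.log10(v) for v in ppms]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def ad_to_frequency(ad, a, b):
    """Convert a positive average difference (AD) to a frequency in ppm."""
    return 10 ** (a + b * math.log10(ad))


# Hypothetical spike-ins whose AD is exactly proportional to abundance.
spike_ppm = [3.3, 10.0, 20.0, 50.0, 100.0]
spike_ad = [50.0 * p for p in spike_ppm]

a, b = fit_log_log(spike_ad, spike_ppm)
print(round(ad_to_frequency(1500.0, a, b), 1))  # 30.0 ppm for this made-up curve
```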

Relative uncertainty. The coefficient of variation (CV) of repeated frequency measurements for a single transcript was typically 10 to 40%. This CV determines the uncertainty in the relative levels of any single transcript across multiple conditions. The sources of noise that contribute to this CV include uncertainty in mRNA quantitation and fluid volume measurements, variation in array staining, array-to-array performance differences, and scanner and sampling noise in the fluorescent signal output. The ANOVA of the developmental time course captures all of these sources of variation, so our assessment of the significance of changes in individual genes over the worm life cycle should be robust.
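
For reference, the CV of a set of replicate frequency measurements can be computed as below. The replicate values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption of this sketch.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of replicate measurements."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


# Hypothetical replicate frequencies (ppm) for one transcript.
replicates_ppm = [40.0, 50.0, 60.0]
cv = coefficient_of_variation(replicates_ppm)
print(f"CV = {cv:.0%}")  # CV = 20%
```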

Absolute uncertainty. The uncertainty in absolute quantitation of any transcript is primarily determined by any systematic bias in probe response between control transcript probe sets (used to construct the calibration curve) and worm transcript probe sets. Probe sets on our arrays included 20 distinct probe pairs each. To assess the variation in performance among these probe sets, we considered two independently selected probe sets (denoted I and II) for a set of 11 control messages. For each message, the paired probe sets I and II were 75 to 100% (average 89%) distinct (i.e., the number of unique probes in each 20-probe set was 15 to 20). All probes in both probe sets met the same heuristic probe selection criteria and came from comparable regions of the transcript. For each message that was called "present," we determined the ratio of the AD from probe set I to that of probe set II, in a set of 18 hybridizations of the A array. Ratios different from one indicated performance differences between the paired probe sets. Over the set of 11 messages, this ratio was 0.82 (n = 138, CV = 52%, 10th to 90th percentile range 0.39 to 1.44). On the basis of this analysis, we estimate that the uncertainty in absolute quantitation of any single message, due to performance differences between 20-probe-pair probe sets, is likely to be within about threefold of the mean. Unlike the relative uncertainty described in the previous paragraph, absolute uncertainty arising from probe set performance biases will not be reduced by averaging repeated measurements of the same transcript.
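
One way to relate the reported ratio statistics to a fold uncertainty is to express the 10th and 90th percentiles as fold differences from the central ratio of 0.82. The sketch below does only that arithmetic; treating 0.82 as the center and the helper name fold_from_center are assumptions of the example.

```python
def fold_from_center(value: float, center: float) -> float:
    """Symmetric fold difference (always >= 1) between two positive ratios."""
    if value <= 0 or center <= 0:
        raise ValueError("ratios must be positive")
    return max(value / center, center / value)


central_ratio = 0.82          # reported ratio of probe set I to probe set II
p10, p90 = 0.39, 1.44         # reported 10th and 90th percentiles

print(round(fold_from_center(p10, central_ratio), 2))  # 2.1
print(round(fold_from_center(p90, central_ratio), 2))  # 1.76
```

Both percentiles lie within about twofold of the central ratio, so the "about threefold" figure leaves some margin for ratios outside the 10th to 90th percentile range.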

SOM Clusters of Developmental Gene Expression Profiles

SOM clusters of expression profiles for developmentally modulated genes are shown in Web fig. 1. A total of 4221 genes were detected at least once and were significantly modulated (ANOVA, P < 1 x 10^-3) over the worm life cycle. A SOM was used to partition these genes into 36 clusters with similar normalized expression profiles. Clusters are identified by their row and column indices in the map, starting from the upper left-hand corner, e.g., the cluster in the bottom left-hand corner is cluster (6, 1) and the cluster in the bottom right-hand corner is cluster (6, 6). In Web fig. 1, each cluster is labeled with its indices and the number of genes in the cluster, e.g., cluster (1, 1) contains 142 genes. Because normalized data are shown, there are no units indicated on the y axes. Superimposed on some clusters are listings of WormPD classes that were overrepresented in that cluster. For each class, the name of the class is given, followed by the degree of enrichment of this class in this cluster, compared to the entire genome, expressed as a fold enrichment. For example, there are 14-fold more "cell structure" genes in cluster (1, 1) than would be expected if this cluster were a random sample from the entire genome. For clarity, only gene classes for which the enrichment was significant (hypergeometric, P < 1 x 10^-3), the fold enrichment was more than three, and the number of genes of the class in the cluster was greater than three are shown.
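
The fold enrichment and hypergeometric test described above can be sketched as follows. The function names are ours, and the genome size and class size in the example are hypothetical values chosen only to illustrate the calculation; only the cluster size of 142 and the three display thresholds come from the text.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Class fraction in the cluster (k of n) relative to the genome (K of N)."""
    return (k / n) / (K / N)


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the class."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def is_displayed(k: int, n: int, K: int, N: int) -> bool:
    """Apply the three display criteria: P < 1e-3, fold > 3, more than 3 genes."""
    return (
        hypergeom_upper_tail(k, n, K, N) < 1e-3
        and fold_enrichment(k, n, K, N) > 3
        and k > 3
    )


# Hypothetical example: 18,000 genes in the genome, 180 in the class,
# and 20 class members found in a 142-gene cluster.
k, n, K, N = 20, 142, 180, 18000
print(round(fold_enrichment(k, n, K, N), 1))  # 14.1
print(is_displayed(k, n, K, N))               # True
```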

Data

Web table 1 gives the measured frequency of all developmentally modulated genes that are clustered in Web fig. 1. The genes are organized by cluster and listed alphabetically by ORF name within each cluster. Web table 2 gives the measured frequency of all genes on all arrays at all developmental stages. The genes are listed alphabetically by ORF name. In both tables, (*) indicates that the gene was called "absent" in some of the repeated hybridizations; (**) indicates that the gene was always called "absent." As described in the report, the sensitivity of each hybridization was determined on the basis of the frequency of the least abundant spike-in transcripts that were called "present." For genes that had frequencies that were less than this sensitivity, frequency was averaged with the sensitivity to give a damped estimate of the message abundance. For this reason, frequencies in the 1- to 20-ppm range, particularly for those genes marked with (*) or (**), must be interpreted very cautiously. Both Web table 1 and Web table 2 are available in tab-delimited and Excel97 formats at http://www.mcb.harvard.edu/hunter/Pubs.
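
The damping rule for low-abundance genes can be written in a few lines. This sketch assumes that "averaged" means the arithmetic mean of the measured frequency and the hybridization sensitivity; the numbers in the example are hypothetical.

```python
def damped_frequency(frequency_ppm: float, sensitivity_ppm: float) -> float:
    """Return the frequency unchanged if it is at or above the sensitivity of the
    hybridization; otherwise return the mean of the frequency and the sensitivity."""
    if frequency_ppm < sensitivity_ppm:
        return (frequency_ppm + sensitivity_ppm) / 2.0
    return frequency_ppm


# Hypothetical hybridization with a sensitivity of 10 ppm.
print(damped_frequency(2.0, 10.0))   # 6.0
print(damped_frequency(25.0, 10.0))  # 25.0
```

Because damped values are pulled toward the sensitivity, two genes with quite different true abundances below the sensitivity can report similar frequencies, which is why values in the 1- to 20-ppm range call for caution.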

References

1. C. Kenyon, in C. elegans II, D. R. Riddle, T. Blumenthal, B. Meyer, J. Priess, Eds. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1997), pp. 791-814.

2. L. G. Edgar, in Caenorhabditis elegans: Modern Biological Analysis of an Organism, H. F. Epstein, D. C. Shakes, Eds. (Academic Press, San Diego, CA, 1995), pp. 303-322.

3. N. R. Draper, H. Smith, Applied Regression Analysis, Wiley Series in Probability and Statistics (Wiley, New York, ed. 3, 1998).


Supplemental Figure 1.





Science. ISSN 0036-8075 (print), 1095-9203 (online)