
Large-Scale Transcriptional Activity in Chromosomes 21 and 22
Philipp Kapranov, Simon E. Cawley, Jorg Drenkow, Stefan Bekiranov, Robert L. Strausberg, Stephen P. A. Fodor,
and Thomas R. Gingeras
Supplementary Material
S1. The following human cell lines were used in the study: A-375 (melanoma, ATCC no. CRL-1619); CCRF-CEM (acute lymphoblastic leukemia; T lymphoblast); COLO 205 (colorectal adenocarcinoma, ATCC no. CCL-222); FHs 738Lu (normal fetal lung fibroblasts, ATCC no. HTB-157); HepG2 (hepatoblastoma, ATCC no. HB-8065); Jurkat (acute T cell leukemia); NCCIT (teratocarcinoma, ATCC no. CRL-2073); NIH:OVCAR-3 (ovarian adenocarcinoma, ATCC no. HTB-161); PC3 (prostate adenocarcinoma, ATCC no. CRL-1435); SK-N-AS (neuroblastoma, ATCC no. CRL-2137); U-87 MG (astrocytoma, ATCC no. HTB-14). Jurkat and CCRF-CEM were obtained from Dr. Jacques Corbeil, Center for AIDS Research and Veterans Medical Research Foundation, University of California San Diego.
Separation of the RNAs present in the nucleus and cytoplasm was evaluated using commercially available high-density oligonucleotide arrays. Total RNA derived from cytosolic or nuclear fractions of each cell line was converted into single-stranded cDNA using random primers, fragmented with DNAse I and end-labeled with terminal transferase as described below, but without the second strand cDNA synthesis. This cDNA was hybridized to Affymetrix HG_U-95A arrays in duplicate experiments. Probe set 38446_at, selected to interrogate the X-chromosome inactivation gene (Xist) and present on the Hu 95A arrays (Affymetrix), was used to test the quality of the nuclear/cytoplasmic separation techniques. Analysis of nuclear and cytoplasmic RNA fractions from Jurkat, CCRF-CEM, SK-N-AS, A-375, HepG2, NCCIT and FHs 738Lu cell lines indicated that expression of the Xist gene was detected only in the nuclear RNA fraction of the female-derived CCRF-CEM, SK-N-AS and A-375 cell lines. Expression of this gene was not detected in the nuclear fraction of male-derived cell lines nor in the cytoplasmic RNAs obtained from any of the cell lines (data not shown). In addition, a number of cDNAs of unknown function containing LINE, HERV and other types of repeats as well as unique regions were frequently detected in the nuclear, but not the cytosolic, fraction in various cell lines (data not shown). Furthermore, separation of nuclear and cytoplasmic RNA compartments allowed for the enrichment of low copy number RNAs. An increase of approximately 10-20% in the number of genes whose expression was detected could be observed after the RNA enrichment that accompanied nuclear and cytoplasmic fractionation.
Total cytosolic RNA and its polyA+ fraction were prepared using RNeasy and Oligotex kits (Qiagen) following the manufacturer's instructions. mRNA was mixed with random hexamers (83.3 ng/µg of mRNA; Life Technologies) and the bacterial control transcripts (see below) and subjected to the following cycling conditions in a PE GeneAmp9600 PCR System: 70°C for 10 min and a 10 min ramp to 25°C, after which the 5x Superscript II First Strand buffer (Life Technologies), DTT and four dNTPs were added to final concentrations of 1x, 10mM and 0.5mM, respectively, followed by a 10 min incubation at 25°C. At this point, Superscript II RTase was added (200 Units/µg of mRNA; Life Technologies), followed by a 10 min ramp to 42°C and a 60 min incubation at 42°C. The volume of the first strand cDNA synthesis reaction was 20 µl per every 3 µg of mRNA. After inactivation of the RTase for 15 min at 70°C, the first strand cDNA was split into 20 µl aliquots and used as a template for the second strand cDNA synthesis using conditions described in the SuperScript Choice System for cDNA synthesis Manual (Life Technologies). After the second strand synthesis reaction, the mRNA template was degraded using a combination of RNAse A/T1 cocktail (Ambion) and RNAse H (Life Technologies). The second-strand synthesis reactions from each cell line were pooled, purified using the QIAquick PCR purification kit (Qiagen), ethanol-precipitated and subjected to a limited DNAse I (Epicenter Technologies) digest to generate fragments of 50-100 bp. The cDNA was labeled in 70 µl using 100 units of terminal transferase (Roche) and 71.4 µM of Biotin-N6-ddATP for 2 hrs at 37°C, after which it was directly used for hybridization in the following mixture: 30mM MES (Sigma M-2933); 74mM MES-Na (Sigma M-3058); 3M tetramethylammonium chloride (Sigma T-3411); 0.1mg/ml herring sperm DNA (Life Technologies); 0.02% Triton X-100; 1X Eukaryotic Hybridization Controls (Affymetrix); 0.05nM biotinylated control oligos 948 or 213 (Affymetrix). Typically, 1-2 µg of double-stranded labeled cDNA was used per hybridization to arrays containing features measuring 14 x 14 microns. The chips were hybridized for 16-18 hours at 45°C. Washing was done using the antibody amplification protocol as described in the Affymetrix Expression Analysis Technical Manual. Chips were scanned on an Affymetrix Gene Array scanner using the highest PMT settings and a 2 µm pixel size. Each sample was hybridized in triplicate.
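The per-microgram quantities quoted for the first strand reaction scale linearly with the amount of mRNA, which can be sketched as a small calculator. This is only an illustration: the function name `first_strand_setup` and the returned keys are hypothetical, and only the three ratios taken from the text above (83.3 ng hexamers and 200 Units RTase per µg of mRNA, 20 µl reaction per 3 µg of mRNA) come from the protocol.

```python
import math


def first_strand_setup(mrna_ug):
    """Scale the first strand cDNA synthesis quantities to a given mRNA amount.

    mrna_ug: amount of polyA+ RNA in micrograms (hypothetical input name).
    """
    if mrna_ug <= 0:
        raise ValueError("mRNA amount must be positive")
    volume_ul = 20.0 * mrna_ug / 3.0  # 20 ul per every 3 ug of mRNA
    return {
        "random_hexamers_ng": 83.3 * mrna_ug,    # 83.3 ng per ug of mRNA
        "superscript_ii_units": 200.0 * mrna_ug,  # 200 Units per ug of mRNA
        "reaction_volume_ul": volume_ul,
        # Number of 20 ul aliquots needed for second strand synthesis
        # (rounding up is an assumption of this sketch).
        "aliquots_20ul": math.ceil(volume_ul / 20.0 - 1e-9),
    }


if __name__ == "__main__":
    for key, value in first_strand_setup(9.0).items():
        print(f"{key}: {value:.1f}")
```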
Since the cDNAs copied from RNA from this sub-fraction were labeled and used as targets for the arrays, careful attention was paid to the removal of possible contamination by genomic DNA. As a control, cytosolic polyA+ RNA from NCCIT and COLO 205 cell lines was treated with RNase-free DNAse I (2 Units/µg of mRNA; Roche) in the presence of 10mM Tris-acetate (pH 7.5), 10mM magnesium acetate, 50mM potassium acetate and 1 Unit/µl ANTI-RNAse (Ambion) for 1 hour at 37°C. As a control for the DNAse I digest, the reaction was spiked with control DNAs (1 ng/µg of mRNA) corresponding to plasmids containing the following segments from each of three bacterial controls: LYS 328-1344, PHE 2016-3331, THR 247-2231 (see below for a full description of these control genes). After the DNAse I digest, the mRNA was purified by phenol/chloroform extraction and ethanol precipitation and used for cDNA synthesis and hybridization to the Chrom21_22 and DGCR arrays as described above. The number of probes hybridizing within the known exons and outside of annotated regions was calculated and found not to differ significantly from that of the corresponding untreated samples (data not shown). As an additional control for genomic DNA contamination, total cytosolic RNA and its polyA+ fraction were pre-treated with DNAse-free RNAse (Roche) prior to RT-PCR reactions.
S2. The sequences selected were intended to minimize potential cross hybridization to characterized expressed transcripts, duplicated sequences of chromosomes 21 and 22 and repeated/low complexity sequences. To accomplish these goals, oligonucleotide probe sequences were selected using empirically based rules developed at Affymetrix and pruned against the Unigene 95 database and chromosome 21 and 22 sequences for potential full or partial homologues. Candidate probe sequences residing in known repeat/low complexity regions were identified using Repeat Masker (http://repeatmasker.genome.washington.edu/RM/RepeatMasker.html) and rejected. Probe pairs on the Chrom 21_22 array interrogated the non-repeat genomic sequences at an average spacing of 35 bases. Further measures were taken to protect against cross hybridization of unintended target transcripts by conservative use of the MM values, which are intended to measure cross hybridization levels (S3).
S3. A probe pair with background-subtracted perfect match intensity PM and mismatch intensity MM is called positive if the ratio PM/MM exceeds a ratio threshold R and the difference PM-MM exceeds a difference threshold D; otherwise it is termed negative. Varying the thresholds yields different levels of sensitivity and specificity. Maps were generated using R in the range 1.1 through 1.5, and D in the range 4Q through 12Q, where Q, the pixel variation within features belonging to the 2nd percentile value of probe intensities for the chip, is an estimate of noise variation.
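The two-threshold call described in S3 can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in the study: the function name `call_probe_pair` and its parameter names are hypothetical, D is expressed as a multiple of Q, and the handling of a mismatch intensity that is zero or negative after background subtraction is an assumption of this sketch.

```python
def call_probe_pair(pm, mm, q, r_threshold=1.3, d_multiple=12.0):
    """Return True if a probe pair is called positive.

    pm, mm      : background-subtracted perfect match / mismatch intensities
    q           : noise estimate Q for the chip
    r_threshold : ratio threshold R (1.1 to 1.5 in the text)
    d_multiple  : difference threshold D expressed in units of Q (4 to 12)
    """
    # PM/MM > R is tested as PM > R * MM to avoid dividing by zero.
    # For MM > 0 the two forms are equivalent; for MM <= 0 the behavior
    # below is an assumption of this sketch, not something the text defines.
    ratio_ok = pm > r_threshold * mm
    difference_ok = (pm - mm) > d_multiple * q
    return ratio_ok and difference_ok


if __name__ == "__main__":
    print(call_probe_pair(pm=300.0, mm=100.0, q=10.0))  # both tests pass
    print(call_probe_pair(pm=300.0, mm=250.0, q=10.0))  # ratio test fails
    print(call_probe_pair(pm=200.0, mm=100.0, q=10.0))  # difference test fails
```

Requiring both tests means that a probe pair with a large ratio but a difference within the noise, or a large difference but a ratio close to 1, is still called negative.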
S4. Maps were improved by taking into account local probe behavior in a heuristic two-step process. In the first pass, runs of negative probe pairs lying between positive probe pairs were reclassified as positive if the negative run was at most maxgap bases in length. In the second pass, runs of positive probe pairs of length less than minrun bases were reclassified as negative. The effect of these steps is to reduce the false negative and false positive rates. The values of maxgap and minrun used were 5 and 20, respectively.
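The two passes in S4 can be sketched as follows. The text does not say exactly how the length of a run is measured in bases, so this sketch assumes that a run spanning probes at coordinates first..last has length last - first + 1, which is natural for a one-probe-per-base tiling. The function name `smooth_calls` and its arguments are hypothetical.

```python
def smooth_calls(positions, calls, maxgap=5, minrun=20):
    """Two-pass smoothing of per-probe positive/negative calls.

    positions : sorted genomic coordinates of the probe pairs
    calls     : list of booleans (True = positive), same length
    Run length in bases is taken as last - first + 1 (an assumption).
    """
    calls = list(calls)
    n = len(calls)

    def runs(value):
        """Return (start_index, end_index) for each maximal run of `value`."""
        found, i = [], 0
        while i < n:
            if calls[i] == value:
                j = i
                while j + 1 < n and calls[j + 1] == value:
                    j += 1
                found.append((i, j))
                i = j + 1
            else:
                i += 1
        return found

    # Pass 1: fill short negative runs that are flanked by positives.
    for i, j in runs(False):
        flanked = i > 0 and j < n - 1
        if flanked and positions[j] - positions[i] + 1 <= maxgap:
            calls[i:j + 1] = [True] * (j - i + 1)

    # Pass 2: remove positive runs shorter than minrun bases.
    for i, j in runs(True):
        if positions[j] - positions[i] + 1 < minrun:
            calls[i:j + 1] = [False] * (j - i + 1)

    return calls


if __name__ == "__main__":
    pos = list(range(100, 160))
    raw = [True] * 12 + [False] * 3 + [True] * 12 + [False] * 23 + [True] * 5 + [False] * 5
    smoothed = smooth_calls(pos, raw)
    print(sum(raw), "positive before,", sum(smoothed), "positive after")
```

In the usage example the 3-base gap is filled, which merges two 12-base runs into one 27-base run that survives the second pass, while the isolated 5-base run is removed.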
S5. By fixing the R and D thresholds for any cell line experiment it was possible to calculate false positive (FP), specificity (Sp) and sensitivity (Sn) rates. Bacterial RNA transcripts containing specific sequence deletions were spiked into each polyA+ RNA sample. The following Bacillus subtilis genes/operons were used to estimate the FP rate: lys (LYS, 1612 bp, Acc. No. X17013); spo0B, obg, pheB, pheA (PHE, 3360 bp, Acc. No. M24537); thrC, thrB (THR, 2400 bp, Acc. No. X04603); jojC-birA (DAP, 6540 bp, Acc. No. L38424); trp operon (TRP, 2525 bp, Acc. No. K01391: bp 1883-4404). The entire sequences of these loci were tiled on the DGCR chip. For the Chrom 21_22 arrays, probes were picked approximately every 30 bp from the following regions of each gene/locus: LYS 328-1344; PHE 2016-3331; THR 247-2231; DAP 1357-3196; TRP 1-2517, using the same probe selection rules as for the rest of the genomic sequences. A polyadenylated transcript corresponding to a smaller portion of each of the five loci was generated to evaluate the sensitivity of the assay, while the bacterial regions outside of the spiked regions were employed in the determination of the FP rates. The regions of each gene/locus corresponding to spiked transcripts are: LYS 817-1344; PHE 2852-3331; THR 1221-2231; DAP 1357-2493; TRP 1-1261. The control bacterial transcripts were spiked into human polyA+ RNA preparations before the cDNA synthesis procedure at the following concentrations (copies/cell): LYS and PHE, 3; THR and DAP, 10; TRP, 30, assuming 300,000 different mRNA species in a human cell and an average transcript size of 1300 nt.
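As a rough worked example of what the copies/cell figures imply, the sketch below converts them into a mass of spike transcript per microgram of human polyA+ RNA. It rests on several assumptions that are not stated in the text: the 300,000 figure is treated as the number of mRNA molecules per cell, RNA mass is taken to be proportional to length in nucleotides, the poly(A) tail of the spike is ignored, and the spike length is taken as end - start + 1 of the spiked region. All names in the code are hypothetical.

```python
MRNA_MOLECULES_PER_CELL = 300_000  # assumption: read as molecules per cell
MEAN_TRANSCRIPT_NT = 1300          # average transcript size from the text

# Spiked region (start, end) and nominal copies per cell, from S5.
SPIKES = {
    "LYS": (817, 1344, 3),
    "PHE": (2852, 3331, 3),
    "THR": (1221, 2231, 10),
    "DAP": (1357, 2493, 10),
    "TRP": (1, 1261, 30),
}


def spike_pg_per_ug(start, end, copies_per_cell):
    """Picograms of spike transcript per microgram of polyA+ RNA.

    Assumes mass is proportional to length and ignores the poly(A) tail.
    """
    length_nt = end - start + 1
    total_nt_per_cell = MRNA_MOLECULES_PER_CELL * MEAN_TRANSCRIPT_NT
    mass_fraction = copies_per_cell * length_nt / total_nt_per_cell
    return mass_fraction * 1e6  # ug per ug -> pg per ug


if __name__ == "__main__":
    for name, (start, end, copies) in SPIKES.items():
        print(f"{name}: {spike_pg_per_ug(start, end, copies):.2f} pg per ug polyA+ RNA")
```

Under these assumptions the spikes fall in the range of a few to roughly one hundred picograms per microgram of polyA+ RNA, which gives a sense of how rare the lowest-abundance controls are.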
False negative (FN) and sensitivity (Sn) rates for these array experiments were estimated using the present segments of the spiked bacterial RNA control transcripts and, for the DGCR array, exon sequences determined to be present in the polyA+ RNA samples extracted from each cell line by means of reverse transcriptase-mediated PCR (RT-PCR) amplification assays. A total of 52/99 exon regions were detected as being present in the extracted polyA+ RNA from each of three cell lines (A-375, HepG2, SK-N-AS). From these experiments, it was also possible to determine FP, Sn and Sp values for each cell line for a set of fixed R and D values (6). For the array interrogating each base in the chromosome 22 DGCR, Table S1A shows that at a 5% FP rate Sn ranged from 47 to 65% for the bacterial control sequences and from 15 to 26% for the human exonic RNA sequences. Table S1B provides similar data for the Chrom 21_22 array experiments at fixed R and D values. These data highlight the point that using the bacterial control sequences to evaluate Sn and Sp values may yield a higher sensitivity estimate than using human exonic sequences. The differences between the bacterial and human Sn values can be attributed to differences in concentration between the bacterial and human targets, and to differences in the nucleotide composition and sequence of the two types of controls (human and bacterial) in terms of their interaction with competing RNA found in human cells.
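The bacterial control rates can be computed from the probe calls using the formulas given in the footnote to Table S1A. The sketch below is illustrative only: the function name `bacterial_metrics` and its inputs are hypothetical, with `present_calls` holding the calls for probes in the spiked (present) regions and `absent_calls` the calls for probes in the deleted (absent) regions.

```python
def bacterial_metrics(present_calls, absent_calls):
    """Compute BacSn, BacSp2 and the FP rate from boolean probe calls.

    present_calls : calls for probes in regions present in the sample
    absent_calls  : calls for probes in regions absent from the sample
    """
    if not present_calls or not absent_calls:
        raise ValueError("both probe sets must be non-empty")
    tp = sum(present_calls)            # positive probes in present regions
    fn = len(present_calls) - tp       # negative probes in present regions
    fp = sum(absent_calls)             # positive probes in absent regions
    return {
        "BacSn": tp / (tp + fn),
        # TP/(TP+FP) as defined in the Table S1A footnote; this ratio is
        # elsewhere often called precision or positive predictive value.
        "BacSp2": tp / (tp + fp) if (tp + fp) else float("nan"),
        "FP_rate": fp / len(absent_calls),
    }


if __name__ == "__main__":
    present = [True] * 6 + [False] * 4
    absent = [True] * 1 + [False] * 19
    for name, value in bacterial_metrics(present, absent).items():
        print(f"{name}: {value:.3f}")
```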
S6. Maps with a given target false positive rate were generated by fixing the maxgap, minrun and D values, then adjusting R over the range 1.1 to 1.5 until the target false positive rate was reached in the bacterial controls. If the target rate was not achieved over the specified range of R, the value of R giving the closest rate was used.
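The threshold search in S6 can be sketched as a simple scan over R. The text does not give the step size or the scan direction, so the 0.01 grid and the ascending scan below are assumptions, as are the function name `choose_r` and the callable `fp_rate_at`, which stands in for "generate the map at this R and measure the FP rate in the bacterial controls".

```python
def choose_r(fp_rate_at, target_fp=0.05, r_min=1.1, r_max=1.5, step=0.01):
    """Pick the ratio threshold R for a target false positive rate.

    fp_rate_at : callable mapping R to the FP rate measured in the
                 bacterial controls (hypothetical stand-in)
    Returns the first R on the grid (scanning upward) whose FP rate is at
    or below the target; if none qualifies, the R whose FP rate is
    closest to the target.
    """
    n_steps = int(round((r_max - r_min) / step))
    best_r, best_distance = None, float("inf")
    for i in range(n_steps + 1):
        r = round(r_min + i * step, 10)  # avoid accumulating float error
        fp = fp_rate_at(r)
        if fp <= target_fp:
            return r
        distance = abs(fp - target_fp)
        if distance < best_distance:
            best_r, best_distance = r, distance
    return best_r


if __name__ == "__main__":
    # Toy FP curve for illustration only; not data from the study.
    toy_curve = lambda r: 0.12 if r < 1.245 else 0.04
    print(choose_r(toy_curve))
```

Returning the closest R when the target is never reached mirrors the fallback described in S6, which is why some cell lines in Table S1A are reported at FP rates other than 5%.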
Table S1: Sensitivity and Specificity Estimates
A. DGCR (22q11.2)¹
| Cell Lines | BacSp2² | BacSn³ | HumSn⁴ | pct.Pos⁵ | pct.PosUnq⁶ |
| A-375 | 0.857 | 0.487 | 0.167 | 21.72 | 14.561 |
| CCRF-CEM | 0.817 | 0.613 | 0.221 | 20.642 | 11.077 |
| COLO 205 | 0.820 | 0.652 | 0.185 | 18.772 | 8.279 |
| FHs 738Lu | 0.775 | 0.473 | 0.261 | 22.872 | 14.499 |
| HepG2 | 0.795 | 0.555 | 0.240 | 23.203 | 15.82 |
| Jurkat | 0.783 | 0.542 | 0.153 | 20.064 | 9.876 |
| NCCIT | 0.804 | 0.545 | 0.162 | 21.664 | 9.584 |
| NIH: OVCAR-3 | 0.785 | 0.504 | 0.243 | 20.721 | 10.908 |
| PC3 | 0.792 | 0.559 | 0.161 | 17.35 | 6.765 |
| SK-N-AS | 0.873 | 0.259 | 0.109 | 16.708 | 9.676 |
| U-87 MG | 0.822 | 0.641 | 0.187 | 18.76 | 7.335 |
1. Estimates made at a ~5% FP rate with the exception of A-375 (FP=3%) and SK-N-AS (FP=1.4%). For each cell line D was set to 12Q and R was selected to achieve the target FP; R ranged from 1.17 to 1.47 [S3-S6 (6)]. 2. Bacterial specificity. 3. Bacterial sensitivity. 4. Human sensitivity. 5. Percent positive probes in the entire 360 kb DGCR. 6. Percent positive probes in non-repetitive sequences of the 360 kb DGCR. For the bacterial controls: the FP rate was calculated as the proportion of probes called positive in the regions of the bacterial controls absent from the sample; BacSp2 was calculated from the formula TP/(TP+FP), where TP is the number of positive probes in the present regions of the bacterial controls and FP is the number of positive probes in the deleted regions of the bacterial controls; and BacSn was calculated from TP/(TP+FN), with FN being the number of negative probes in the present regions of the bacterial controls. For the human DGCR region: HumSn is the fraction of probes called positive within the 52 exons or parts of exons corresponding to the known genes (DGCR6, DGCR2 exons 6-10, DGS-I, DGS-H, DGS-A, SLC25A1 exons 1-4 and Clathrin) and one validated locus, RP8, shown to be present in the human cell lines using RT-PCR. The exact coordinates and descriptions of the regions used to calculate the HumSn rate can be found at http://www.netaffx.com/transcriptome/.
B. Chromosomes 21-22¹
| Cell Lines | BacSp2 | BacSn | BacFp | pct. Pos | pct. Pos Exn |
| A-375 | 0.941 | 0.711 | 0.046 | 0.062 | 0.272 |
| CCRF-CEM | 0.88 | 0.861 | 0.121 | 0.115 | 0.44 |
| COLO 205 | 0.858 | 0.864 | 0.148 | 0.121 | 0.445 |
| FHs 738Lu | 0.874 | 0.735 | 0.117 | 0.094 | 0.341 |
| HepG2 | 0.886 | 0.859 | 0.114 | 0.099 | 0.386 |
| Jurkat | 0.926 | 0.742 | 0.061 | 0.073 | 0.335 |
| NCCIT | 0.904 | 0.787 | 0.088 | 0.086 | 0.341 |
| NIH: OVCAR-3 | 0.86 | 0.817 | 0.139 | 0.107 | 0.433 |
| PC3 | 0.853 | 0.829 | 0.151 | 0.145 | 0.447 |
| SK-N-AS | 0.949 | 0.646 | 0.036 | 0.059 | 0.234 |
| U-87 MG | 0.839 | 0.854 | 0.17 | 0.127 | 0.44 |
1. Thresholds were fixed for all cell lines at R=1.3 and D=12Q (17). The BacFP rate varies; see the footnote to Table S1A.
Table S2: RT-PCR Verification of Array Detected Transcripts¹
| Region Number | Name | PCR start² | PCR end² | PCR length | RT-PCR | Library³ | Other Locations⁴ | Accession # |
| 1 | Chr21-1 | 41484371 | 41484656 | 285 | Yes | NIH: OVCAR-3 | Unique on 21 | BM873316 |
| 2 | Chr21-4 | 41539789 | 41540256 | 467 | Yes | N/D | Unique on 21 | BM873318 |
| 3 | Chr21-5-2 | 21333394 | 21334037 | 643 | Yes | HepG2 | Chr.11, 18 | BM873319 |
| 4 | Chr21-6 | 21320916 | 21321771 | 855 | Yes | HepG2 | Chr. 5,14 | BM873320 |
| 5 | Chr21-7 | 21471231 | 21471568 | 337 | Yes | HepG2 | Unique | BM873321 |
| 6 | Chr21-8 | 11773874 | 11774085 | 211 | Yes | HepG2 | Chr. 13, 17, 18 | BM873322 |
| 7 | Chr21-9 | 11604183 | 11604877 | 694 | Yes | HepG2 | Dup. on 21, Chr.18 | BM873323 |
| 8 | Chr21-10 | 11538194 | 11538927 | 733 | Yes | HepG2 | Dup.on 21, Chr.2 | BM873317 |
| 9 | Chr21-11 | 19259989 | 19260457 | 468 | | HepG2 | Unique | BM890561 |
| 10 | Chr21-12-1 | 14788711 | 14789087 | 376 | | HepG2 | Unique | BM890562 |
| | Chr21-12-2 | 14788871 | 14789252 | 381 | | HepG2 | Unique | BM890563 |
| 11 | Chr21-13 | 16969428 | 16969683 | 255 | | HepG2 | Unique | BM890564 |
| 12 | Chr21-14 | 17912096 | 17912477 | 381 | | HepG2 | Multiple, strong similarity to ribosomal L37 gene | BM890565 |
| 13 | Chr21-15 | 39127983 | 39129061 | 1078 | Yes | HepG2 | Unique | BM890566 |
| 14 | Chr21-16 | 13527346 | 13527870 | 524 | Yes | HepG2 | Unique | BM890567 |
| 15 | Chr21-17-1 | 24997648 | 24998018 | 370 | | HepG2 | Unique | BM890568 |
| | Chr21-17-2 | 24997898 | 24998325 | 427 | | HepG2 | Unique | BM890569 |
| 16 | Chr21-18 | 26632856 | 26633045 | 189 | | HepG2 | Unique | BM890570 |
| 17 | Chr21-19 | 23364973 | 23365515 | 542 | Yes | N/D | Unique | BM890571 |
| 18 | Chr21-20 | 21186011 | 21186690 | 679 | Yes | HepG2 | Unique | BM890572 |
| 19 | Chr21-21 | 19331943 | 19332313 | 370 | Yes | N/D | Unique | BM890573 |
| 20 | Chr21-22 | 18482854 | 18483206 | 352 | | HepG2 | Unique | BM890574 |
| 21 | Chr21-23 | 15532081 | 15532672 | 591 | | HepG2 | Unique | BM890575 |
| 22 | Chr21-24 | 17359969 | 17361292 | 1323 | | HepG2 | Unique | BM890576 |
| 23 | Chr21-25 | 17611334 | 17611677 | 343 | | HepG2 | Unique | BM890577 |
| 24 | Chr21-26 | 18143485 | 18143941 | 456 | | HepG2 | Unique | BM890578 |
| 25 | Chr21-27-1 | 18706335 | 18706620 | 285 | | HepG2 | Unique | BM890579 |
| | Chr21-27-2 | 18706600 | 18706858 | 258 | | HepG2 | Multiple | BM890580 |
| 26 | Chr21-28 | 18841437 | 18841903 | 466 | | HepG2 | Unique | BM890581 |
| 27 | Chr21-29 | 19630931 | 19631365 | 434 | | HepG2 | Unique | BM890582 |
| 28 | Chr21-30 | 38469242 | 38469840 | 598 | | HepG2 | Unique | BM890583 |
| 29 | Chr21-31 | 39235731 | 39235996 | 265 | | HepG2 | Unique | BM890584 |
| 30 | Chr22 DGCR-1-1 | 11463 | 11753 | 194 | Yes | N/T | Dup. on 22 | BM873324 |
| | Chr22 DGCR-1-2 | 15486 | 15973 | 487 | Yes | PC-3 | Dup. on 22 | BM873325 |
| | Chr22 DGCR-1-3 | 16627 | 17211 | 584 | Yes | N/D | Dup. on 22 | BM873326 |
| 31 | Chr22 DGCR-2-1 | 164261 | 164831 | 570 | Yes | N/D | Unique on 22 | BM873327 |
| | Chr22 DGCR-2-2 | 162186 | 163222 | 1036 | Yes | N/D | Unique on 22 | BM873328 |
| | Chr22 DGCR-2-3 | 165841 | 166370 | 529 | Yes | N/D | Unique on 22 | BM873329 |
| 32 | Chr22 DGCR-3-2 | 277304 | 277569 | 265 | Yes | NIH: OVCAR-3 and HepG2 | Unique on 22 | BM873330 |
| 33 | Chr22 DGCR-4-1 | 80480 | 80863 | 383 | Yes | N/D | Dup. on 22 | BM873331 |
| 34 | Chr22-5 | 37645595 | 37646222 | 627 | Yes | HepG2 | Unique | BM890585 |
| 35 | Chr22-6 | 37973605 | 37973908 | 303 | | HepG2 | Unique | BM890586 |
| 36 | Chr22-7-1 | 30078531 | 30078780 | 249 | | HepG2 | Unique | BM890587 |
| | Chr22-7-2 | 30078760 | 30079043 | 283 | | HepG2 | Unique | BM890588 |
| | Chr22-7-3 | 30079458 | 30080259 | 801 | | HepG2 | Unique | BM890589 |
| 37 | Chr22-8 | 34042605 | 34043192 | 587 | | HepG2 | Contains Alu and LTR, non-repetitive sequence unique | BM890590 |
| 38 | Chr22-9-1 | 34198948 | 34199302 | 354 | Yes | HepG2 | Unique | BM890591 |
| | Chr22-9-2 | 34199684 | 34200120 | 436 | | HepG2 | Unique | BM890592 |
| 39 | Chr22-10 | 23151780 | 23152082 | 302 | | HepG2 | Unique | BM890593 |
| 40 | Chr22-11 | 31838163 | 31838702 | 539 | | HepG2 | Unique | BM890594 |
| 41 | Chr22-12 | 24084616 | 24084940 | 324 | | HepG2 | Unique | BM890595 |
| 42 | Chr22-13 | 29463017 | 29463153 | 136 | | HepG2 | Unique | BM890596 |
1. Several PCR primer pairs were designed for each selected region (locus) called positive by the chip and used to query cytosolic polyA+ RNA samples from the cell lines used in the mapping experiments or cDNA libraries prepared from cytosolic polyA+ RNA. Primers were typically picked at or near positive probes or contigs (in the case of the DGCR region), with a distance between forward and reverse primers on the order of 200-1000 bp. Typically, 3 to 15 primer pairs were designed for each locus, the size of which averaged ~1.4 kb. For the DGCR region (Chr22 DGCR), the 5% FP maps (see on-line supplemental References and Notes) were used for primer selection. For the Chromosome 21 regions (Chr21-11 through Chr21-23), the HepG2 cell line map with R=1.3 and D=12 was used. For all the remaining regions, a combined map of all eleven cell lines obtained with R=1.3 and D=12 was used. Coordinates of region(s) within each interrogated locus are shown where positive product(s) were detected either in the cytosolic polyA+ samples using RT-PCR or in the cDNA libraries from the indicated cell lines. 2. The start and end of each such region are shown either in the coordinates of the sequence of the DGCR region tiled on the chip (for the Chr22 DGCR loci) or in the coordinates of the October 2000 freeze of the Golden Path sequence (for the Chr21 regions). 3. Positive PCR products were detected in the cDNA libraries made from the indicated cell lines. 4. Additional locations in the genome having sequences similar to the (RT-)PCR products, as shown by a BLAT search (http://genome.cse.ucsc.edu/cgi-bin/hgBlat). In all cases in which a homologue was identified elsewhere in the genome, the (RT-)PCR products specific to the sites interrogated on chromosomes 21 and 22 were recognized by virtue of chromosome 21 or 22 locus-specific SNPs. N/T: not tested; N/D: not detected.
Figure S3: Northern hybridization analyses of polyA+ cytosolic RNA obtained from 7 of the 11 cell lines (1: NIH:OVCAR-3; 2: Jurkat; 3: HepG2; 4: FHs 738Lu; 5: COLO 205; 6: CCRF-CEM; 7: A-375; 8: A-375 treated with DNAse I). 3-5 µg of cytosolic polyA+ RNA from each of the specified cell lines was loaded on the gel. The following DNA probes were used: (A) a cDNA derived from the Chr22 DGCR-3-2 region and represented by bp 277304-277569 of the DGCR sequence; and cDNAs spanning the entire validated regions (B) Chr22 DGCR-2-1; (C) Chr21-8 and (D) Chr22 DGCR-1-2. Each probe was labeled with [α-32P]-dCTP (Amersham) using the random hexamer labeling kit (Roche). Filters were hybridized in 0.5M sodium phosphate buffer pH 7.2, 1% Bovine Serum Albumin, 7% SDS at 65°C overnight. After hybridization, filters were successively washed at 65°C in 2X SSC, 0.1% SDS; 1X SSC, 0.1% SDS and 0.3X SSC, 0.1% SDS, 15 min each wash, and exposed to X-ray film for 3 weeks.
