Summary of Data Analysis Parameters


The data analysis in parts I and II depends on three numerical parameter choices, which we highlight here.  Once these choices are made, the rest of the data analysis algorithm proceeds without further choices.  Consequently, the algorithm can be applied to other array experiments.  For example, in separate work, we have applied it to experiments using glass slide ORF microarrays to study the replication timing profile of the same yeast strain used in the manuscript, as well as of various mutant strains.

Choice #1:

In part I.1, the original raw data were obtained by averaging hybridization signals over 10 kb windows at locations every 0.5 kb across the genome.
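
As an illustration of choice #1, the windowed averaging can be sketched as follows.  This is a minimal sketch and not the code used in part I.1: the function name windowed_average, the inputs positions_kb and signals, and the use of a window centered on each grid location are all assumptions made for the example.  Only the 10 kb window and the 0.5 kb spacing come from the text.

```python
from bisect import bisect_left, bisect_right


def windowed_average(positions_kb, signals, window_kb=10.0, step_kb=0.5):
    """Average signals in a centered window at regularly spaced grid points.

    positions_kb: probe coordinates in kb, sorted in ascending order (assumed).
    signals:      hybridization signal for each probe, same length.
    Returns a list of (grid_position_kb, mean_signal) pairs; mean_signal is
    None when no probe falls inside the window.
    """
    if not positions_kb:
        return []
    half = window_kb / 2.0
    # Prefix sums let each window sum be read off in constant time.
    prefix = [0.0]
    for s in signals:
        prefix.append(prefix[-1] + s)
    result = []
    n_steps = int((positions_kb[-1] - positions_kb[0]) / step_kb) + 1
    for i in range(n_steps):
        x = positions_kb[0] + i * step_kb
        # Probes with x - half <= position <= x + half are in the window.
        lo = bisect_left(positions_kb, x - half)
        hi = bisect_right(positions_kb, x + half)
        count = hi - lo
        mean = (prefix[hi] - prefix[lo]) / count if count else None
        result.append((x, mean))
    return result
```

Calling windowed_average(positions_kb, signals) with the default arguments uses the 10 kb window and 0.5 kb spacing described above; other array platforms may call for different values.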

Choice #2:

The smoothing algorithm in part II.2 involves finding an FCS (Fourier Convolution Smoothing) closest to the 10 kb moving average of the pooled HL curves.  The sliding 10 kb window was chosen for consistency with choice #1.

These two choices determine the set of predicted origin locations (and their attached confidence levels).  If one is also interested in replication timing, one more choice is required.

Choice #3:

As in part II.5, when fitting timing curves at 0.5 kb intervals across the genome, we keep only the "good fits".  A fit is considered good if the asymptotic value "a" (maximal replication) satisfies 0 < a ≤ 80 and a > d, where d is the "escape" replication percentage.  One could impose more stringent criteria for a good fit, but we opted to retain as much data as possible, aiming to optimize the correlation between the "% HL(total) curve" and the "trep curve".
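
The good-fit criterion of choice #3 amounts to a simple filter on the fitted parameters.  The sketch below is illustrative only: the function names and the representation of each fit as a (position_kb, a, d) tuple are assumptions for the example, and the fitting procedure of part II.5 is not reproduced.

```python
def is_good_fit(a, d, a_max=80.0):
    """Return True when 0 < a <= a_max and a > d.

    a: fitted asymptotic value (maximal replication).
    d: fitted "escape" replication percentage.
    """
    return 0 < a <= a_max and a > d


def keep_good_fits(fits, a_max=80.0):
    """Keep only the good fits from (position_kb, a, d) tuples (assumed layout)."""
    return [fit for fit in fits if is_good_fit(fit[1], fit[2], a_max)]


# Made-up values, for illustration only.
example_fits = [
    (0.0, 75.0, 5.0),    # kept
    (0.5, 92.0, 4.0),    # dropped: a exceeds 80
    (1.0, 3.0, 6.0),     # dropped: a does not exceed d
    (1.5, 80.0, 10.0),   # kept: a = 80 is allowed
    (2.0, -1.0, -5.0),   # dropped: a is not positive
]
good = keep_good_fits(example_fits)
```

A more stringent criterion could be tried by lowering a_max or by adding further conditions to is_good_fit.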