Abstract
Introduction
Array normalization, data transformation, and probe-set summarization are the three most common preprocessing steps for microarray data analysis.1,2 In practice, the three steps have been implemented in different orders depending on the specific data preprocessing pipeline. For example, the RMA pipeline follows the order of quantile normalization, log transformation, and probe-set summarization for the analysis of Affymetrix gene expression arrays,3 while AgiMicroRna uses the order of log transformation, probe-set summarization, and quantile normalization for the analysis of Agilent microRNA (miRNA) expression arrays.4 It remains uncertain whether the ordering affects downstream data analysis results.
In our previous work, two datasets were generated on the same set of tumor samples: one had no confounding array effects by experimental design and served as the benchmark, while the other exhibited array effects and served as the test data for evaluating normalization methods.5,6 Using the two datasets, we compared the relative performance of different array normalization methods and showed that quantile normalization outperformed the other normalization methods we examined.6 During those analyses, we preprocessed the test dataset following the order of log transformation, array normalization, and median summarization. In this follow-up paper, we set out to further evaluate the performance of quantile normalization when combined with log transformation and probe-set summarization in different orderings.
Methods
Data Collection
A set of 192 untreated primary gynecologic tumor samples (96 endometrioid endometrial tumors and 96 serous ovarian tumors) were collected at Memorial Sloan Kettering Cancer Center during the period of 2000–2012. The samples were profiled using the Agilent Human miRNA Microarray (Release 16.0), following the manufacturer's protocol. Two datasets were generated from the same set of samples using different array assignments and handling processes. The first dataset was created using blocked randomization to assign arrays to samples and was handled by one technician in one run. The second dataset had arrays assigned in the order of tumor sample collection and was handled by two technicians in multiple runs, imitating a typical laboratory setting. In this study, we refer to the first dataset as the “block randomized” or “benchmark” dataset and to the second as the “test” dataset. More details on data collection can be found in the study by Qin et al.5
Data preprocessing for the test dataset
Data transformation, array normalization, and probe-set summarization are the steps in microarray data preprocessing.6 In this study, our choices of method for these three steps were log2 transformation, quantile normalization, and median summarization, respectively. There are a total of six possible orderings of these three preprocessing steps. Because the logarithm is monotone and hence commutes with the median, applying log transformation before or after median summarization makes no difference mathematically, so the six orderings reduce to four, which we call Orders A–D (Table 1). Orders A and B both applied quantile normalization to the probe-level data; they differ in whether quantile normalization was applied to the raw data or to the logged data. Orders C and D both applied median summarization to replicate probes in each probe set before the normalization step, and quantile normalization was then applied to either the raw or the logged probe-set-level data. Comparing the two ordering pairs, A–B versus C–D, helps answer whether quantile normalization should be applied at the probe level or the probe-set level. As a baseline for comparison, we also applied log transformation and median summarization – without quantile normalization – to the test dataset; we call this the “Reference”. Comparing against the Reference provides context for understanding the magnitude of the differences among Orders A–D.
Orderings of preprocessing steps applied to the test data.
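The four orderings and the Reference can be sketched in a few lines of Python. This is a minimal illustration on toy data, not the authors' code: the NumPy-based `quantile_normalize` and `median_summarize` helpers are simplified implementations (rank ties and even-sized probe sets are not handled specially).

```python
import numpy as np

def quantile_normalize(x):
    """Force all columns (arrays) to share one empirical distribution:
    replace each value by the mean, across arrays, of the values at its rank."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)
    return mean_quantiles[ranks]

def median_summarize(x, probe_set_ids):
    """Collapse replicate probes to one row per probe set via the median."""
    return np.vstack([np.median(x[probe_set_ids == s], axis=0)
                      for s in np.unique(probe_set_ids)])

# Toy probe-level data: 2 probe sets x 3 replicate probes, on 4 arrays.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=5, sigma=1, size=(6, 4))
ids = np.repeat([0, 1], 3)

order_a = median_summarize(np.log2(quantile_normalize(raw)), ids)  # quantile -> log2 -> median
order_b = median_summarize(quantile_normalize(np.log2(raw)), ids)  # log2 -> quantile -> median
order_c = np.log2(quantile_normalize(median_summarize(raw, ids)))  # median -> quantile -> log2
order_d = quantile_normalize(np.log2(median_summarize(raw, ids)))  # median -> log2 -> quantile
reference = median_summarize(np.log2(raw), ids)                    # log2 -> median (no normalization)

# The logarithm is monotone, so it commutes with the median (exactly so here,
# where each probe set has an odd number of replicates) -- which is why the
# six possible orderings collapse to four.
assert np.allclose(reference, np.log2(median_summarize(raw, ids)))
```

After quantile normalization, every array shares exactly the same sorted values, which is why the step is sensitive to whether it sees probe-level or probe-set-level data.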
Data Preprocessing for the Benchmark Dataset
The use of uniform handling and blocked randomization has been shown to effectively control confounding array effects.6 Therefore, the benchmark dataset was preprocessed with only log2 transformation and median summarization, without quantile normalization.
Differential Expression Analysis for the Benchmark Dataset and the Test Dataset
Two-sample
Method Comparison
The differential expression results obtained for the test dataset were tabulated against the results from the benchmark dataset. The true positive rate (TPR), false positive rate (FPR), and false discovery rate (FDR) were computed for each preprocessing ordering.7,8 Their detailed definitions are provided in Table 2. A preprocessing ordering outperforms the others if it results in a higher TPR, a lower FPR, and a lower FDR.
Statistical measures for method comparison.
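Assuming the usual definitions, with the benchmark calls treated as ground truth, the three statistics in Table 2 can be computed as in this Python sketch:

```python
import numpy as np

def confusion_rates(test_calls, benchmark_calls):
    """TPR, FPR, and FDR for differential-expression calls on the test data,
    treating the benchmark calls as the ground truth."""
    test = np.asarray(test_calls, dtype=bool)
    bench = np.asarray(benchmark_calls, dtype=bool)
    tp = np.sum(test & bench)   # called in both test and benchmark
    fp = np.sum(test & ~bench)  # called in test only
    tpr = tp / bench.sum()      # sensitivity
    fpr = fp / (~bench).sum()   # 1 - specificity
    fdr = fp / test.sum()       # fraction of test calls that are false
    return tpr, fpr, fdr
```

For example, the reported FDR for an ordering with 710 test calls of which 328 are also called in the benchmark is (710 − 328) / 710 ≈ 53.8%.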
Simulation
One characteristic of Agilent miRNA array data is a very low level of variation between probe replicates.5 We speculated that this characteristic may influence the difference between applying quantile normalization to the probe-level data and to the probe-set-level data. To test this hypothesis, we conducted a simulation study to further examine the effect of preprocessing-step orderings when the between-probe variation in the test dataset was increased. The simulation followed the steps below.
Generate simulated test dataset: we generated random noise from a Gaussian distribution with mean zero and standard deviation σ and added it to the probe-level data of the test dataset. We considered four possible values for σ: 0.2, 0.4, 0.8, and 2, corresponding to 0.5, 1, 2, and 5 times the estimated standard deviation of probe replicates (0.4) observed in the empirical data.
Repeat and average: for each σ value, we created 100 simulated datasets and applied the preprocessing, differential expression analysis, and method comparison described in the previous sections. TPRs, FPRs, and FDRs were then averaged across the 100 simulation runs.
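The two steps above can be skeletonized as follows. This is a sketch, not the authors' code: `run_pipeline` is a placeholder for the full preprocess/differential-expression/comparison chain, and the scale on which the noise is added (raw versus logged intensities) is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(42)
sigmas = [0.2, 0.4, 0.8, 2.0]  # 0.5x, 1x, 2x, and 5x the empirical probe SD of 0.4
n_runs = 100

def add_probe_noise(probe_data, sigma, rng):
    """Step 1: perturb probe-level intensities with Gaussian noise N(0, sigma^2)."""
    return probe_data + rng.normal(0.0, sigma, size=probe_data.shape)

def run_pipeline(data):
    """Placeholder for preprocessing + differential expression + method comparison."""
    return {"TPR": 0.93, "FPR": 0.12, "FDR": 0.54}

probe_data = rng.normal(8.0, 1.0, size=(50, 10))  # toy probe-level matrix

# Step 2: repeat 100 times per sigma and average each summary statistic.
results = {}
for sigma in sigmas:
    runs = [run_pipeline(add_probe_noise(probe_data, sigma, rng))
            for _ in range(n_runs)]
    results[sigma] = {stat: float(np.mean([r[stat] for r in runs]))
                      for stat in ("TPR", "FPR", "FDR")}
```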
Results
Empirical Evaluation
Table 3A reports the TPRs, FPRs, and FDRs in percentage for each preprocessing ordering when applied to the test dataset and compared against the benchmark. In particular,
A. Results of differential expression analysis for the test dataset.
The result for Order A (quantile–log2–median) identified 710 differentially expressed markers. Within these differentially expressed markers, 328 were identified as differentially expressed in the benchmark (TPR: 93.5%, FPR: 12.0%, FDR: 53.8%).
Order B (log2–quantile–median) showed 708 differentially expressed markers, of which 328 were also differentially expressed in the benchmark (TPR: 93.5%, FPR: 12.0%, FDR: 53.7%).
Order C (median–quantile–log2) indicated that 710 markers were different, among which 326 were also different in the benchmark (TPR: 92.9%, FPR: 12.1%, FDR: 54.1%).
Under Order D (median–log2–quantile), 712 markers were differentially expressed and 326 of them were also differentially expressed in the benchmark (TPR: 92.9%, FPR: 12.2%, FDR: 54.2%).
The test dataset prepared with the Reference (log2–median) yielded 1934 differentially expressed markers, of which 185 were also differentially expressed in the benchmark (TPR: 52.7%, FPR: 55.1%, FDR: 90.4%).
The empirical evidence from the test dataset suggested that Orders A and B slightly outperformed Orders C and D. Orders A and B had identical TPRs, slightly (though not substantially) higher than those of Orders C and D, and their FPRs were slightly lower than those of C and D. Moreover, Order B had the smallest FDR (53.7%) and hence was the most desirable of all. Order A followed closely, with a difference of less than 0.1% in FDR (53.8%), corresponding to two more falsely discovered markers than Order B. Both Orders A and B applied quantile normalization to the probe-level data, while Orders C and D applied it at the probe-set level. Order B differed from Order A in that it applied log transformation before quantile normalization.
To summarize, our results suggest that applying quantile normalization to the probe-level data is preferable to applying it to the probe-set-level data, and that log transformation before quantile normalization is slightly more beneficial. Our results are robust to the

Receiver Operating Characteristic (ROC) curves comparing the differential expression
Simulation Study
We further assessed the robustness of our empirical observation to the level of between-probe variation. The result of this simulation study is displayed in Figure 2. It shows that

Results of the simulation study. Dots represent the means and error bars the standard deviations of each summary statistic (TPR, FPR, and FDR) across the 100 simulated datasets for each simulation setting.
Comparing across the five levels of
As σ increased, the FDRs and FPRs decreased for each of Orders A–D. As shown in our previous work, quantile normalization tended to underestimate the standard deviation in the original test dataset, which consequently inflated
As the value of σ increased from 0 to 2, Orders A and B continued to have higher TPRs and lower FPRs and FDRs than Orders C and D; however, the differences shrank. This was primarily due to a drop in TPR for Orders A and B and an increase in TPR for Orders C and D (e.g., from σ = 0 to σ = 2: Order A TPR: 93.5% to 93.1%; Order B TPR: 93.5% to 93.1%; Order C TPR: 92.9% to 93.0%; Order D TPR: 92.9% to 93.0%; Reference TPR: 52.7% to 52.7%).
In summary, as we hypothesized, the difference between applying quantile normalization to probe-level data and to probe-set-level data depended on the level of between-probe variation; the evidence suggests that smaller between-probe variation leads to a greater advantage for applying quantile normalization at the probe level.
Conclusion and Discussion
Our results showed that the ordering of the three data preprocessing steps had a very small effect on the downstream analysis of differential expression in the context of Agilent miRNA array data. Nevertheless, the ordering of log transformation, quantile normalization on probe-level data, and median summarization slightly outperformed the other three orderings.
Our conclusion eases the concern over the uncertain effect that these orderings could have on the analysis of Agilent miRNA array data. It can potentially be generalized to other types of microarray data for which the three preprocessing steps – log transformation, quantile normalization, and median summarization – are needed and the between-probe variability is small.
Author Contributions
Conceived and designed the experiments: LXQ. Analyzed the data: HCH, QZ. Wrote the first draft of the manuscript: HCH. Contributed to the writing of the manuscript: LXQ, HCH. Agree with manuscript results and conclusions: LXQ, HCH, QZ. Jointly developed the structure and arguments for the paper: LXQ, HCH. Made critical revisions and approved final version: LXQ. All authors reviewed and approved of the final manuscript.
