Abstract
Introduction
A cell responds to environmental and physiological changes through reorganization of genomic expression. This kind of regulation is realized by transcriptional regulatory networks (TRNs), which are mainly controlled by transcription factors (TFs). Therefore, identifying the sophisticated architecture of TRNs would reveal the fundamental aspects of the mechanisms involved in the maintenance of life and adaptation to new environments.1–5
The first step toward reconstructing TRNs is to identify the target genes of known TFs.6–10 Genome-wide transcription factor binding analysis, also called ChIP-chip analysis, was developed to fulfill this goal.11,12 ChIP-chip analysis can be used to identify physical interactions between TFs and the promoter regions to which they bind. Simon et al 13 performed ChIP-chip experiments to identify the binding targets of nine major cell cycle TFs. Lee et al 14 performed ChIP-chip experiments to investigate how 106 yeast TFs bind to promoter sequences across the genome. Harbison et al 15 conducted genome-wide transcription factor binding assays for 203 TFs in yeast to construct an initial map of the yeast transcriptional regulatory code. All three studies are experiment-based and provide direct evidence of TF-promoter binding relationships. However, TF-promoter binding relationships are not equivalent to TF-gene regulatory relationships: a TF may bind to the promoter of a gene yet have no regulatory effect on that gene's expression. Hence, additional information is required to resolve this ambiguity inherent in ChIP-chip data.
Gene expression data have been widely used to address this problem. Exploiting the additional information provided by gene expression data, several algorithms have been developed to identify a TF's regulatory targets from its binding targets (inferred from ChIP-chip analysis). For instance, Garten et al's method 6 used co-expression analysis, MA-Networker 9 used multivariate regression analysis, and TRIA 7 used time-lagged correlation analysis on gene expression data to classify a TF's binding targets into regulatory and non-regulatory targets. In this paper, we develop a new method, called REgulatory Targets Extraction Algorithm (RETEA), which applies partial correlation analysis to a TF and every pair of its binding targets that is highly co-expressed. Partial correlation analysis has been widely used to determine whether the association between two variables is due to the effect of a third variable.16,17 Here, partial correlation measures the residual correlation between two co-expressed binding targets of a TF after the TF's regulatory effect has been removed. A low partial correlation means that the co-expression of the two binding targets is mainly due to that TF's regulatory effect; that is, the co-expressed binding target pair can be regarded as a co-regulated pair of the TF. Therefore, RETEA designates a pair of the TF's binding targets as regulatory targets of the TF if the two binding targets have a high correlation but a low partial correlation. The flowchart of RETEA is shown in Figure 1.

Figure 1. The flowchart of RETEA.
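The selection rule described in the Introduction (high correlation but low partial correlation given the TF) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy expression profiles, the threshold values `corr_min=0.8` and `pcorr_max=0.3`, and the use of the absolute value of the partial correlation are all assumptions made for illustration. The sketch uses the standard first-order partial correlation formula.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y after removing z.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    Undefined (division by zero) if x or y is perfectly correlated with z.
    """
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def coregulated_targets(tf_profile, target_profiles, corr_min=0.8, pcorr_max=0.3):
    """Return binding targets that occur in at least one pair with high
    correlation and low partial correlation given the TF.

    corr_min and pcorr_max are hypothetical thresholds, not the paper's values.
    """
    selected = set()
    for g1, g2 in combinations(sorted(target_profiles), 2):
        x, y = target_profiles[g1], target_profiles[g2]
        if pearson(x, y) >= corr_min and abs(partial_corr(x, y, tf_profile)) <= pcorr_max:
            selected.update((g1, g2))
    return selected


# Toy data (made up): T1 and T2 follow the TF; T3 and T4 are co-expressed
# with each other for a reason unrelated to the TF.
tf = [0, 1, 2, 3, 4, 5, 6, 7]
targets = {
    "T1": [0.5, 0.5, 2.5, 2.5, 4.5, 4.5, 6.5, 6.5],
    "T2": [0.5, 1.5, 1.5, 2.5, 4.5, 5.5, 5.5, 6.5],
    "T3": [5, 1, 4, 2, 5, 1, 4, 2],
    "T4": [5.2, 1.1, 4.1, 1.8, 5.2, 1.1, 4.1, 1.8],
}
print(sorted(coregulated_targets(tf, targets)))
```

In this toy example, the pair (T3, T4) is highly correlated, but its correlation survives after conditioning on the TF, so it is not attributed to the TF's regulatory effect.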
Methods
Datasets
Four data sources were used in this study. First, the ChIP-chip data of the cell cycle TFs in the rich media growth condition were downloaded from Harbison et al's paper. 15 Second, the gene expression data of the yeast cell cycle process were downloaded from Pramila et al's paper. 18 Samples for all genes in the yeast genome were collected every 5 minutes for 25 time points, which cover two cell cycles. Third, the mutant data of the TFs under study were downloaded from Hu et al's paper. 8 They grew each of 263 TF knockout strains in replicate and compared the mRNA expression of each strain with that of a wild-type strain using microarrays, in order to identify the target genes whose expression profiles are affected when a TF is knocked out. Fourth, the genome-wide distribution of the high-confidence TFBSs of many yeast TFs was downloaded from MacIsaac et al's paper. 19 The high-confidence TFBSs were derived using six motif discovery methods, with the requirement of conservation across at least two of four related yeast species.
REgulatory Targets Extraction Algorithm (RETEA)
We first define
The details of RETEA are as follows. Let
Assume that the genes
where
Results
Only a subset of a TF's binding targets are identified as its regulatory targets
Since the cell cycle is one of the best-studied cellular processes in yeast, we applied our method to identify the plausible regulatory targets of known cell cycle TFs (according to the MIPS database).20
Eleven cell cycle TFs whose sizes of
The numbers of genes in
First validation: Enrichment for cell cycle-regulated genes in B+R+ and B+R–
Since the function of a cell cycle TF is to regulate the expression of the cell cycle-regulated genes, the regulatory targets of a cell cycle TF should be enriched in cell cycle-regulated genes. Therefore, our predictions are validated if the cell cycle-regulated genes are more enriched in
The enrichment of the cell cycle-regulated genes in
Second validation: Enrichment for the common cellular processes and common molecular functions in B+R+ and B+R–
Because genes in

Testing for the enrichment for the common cellular processes and common molecular functions in
Taken together, the two validations mentioned above convincingly demonstrate that RETEA is capable of extracting a TF's regulatory targets from its binding targets.
Discussions
Performance comparison with three published methods
To identify the regulatory targets of a TF, Gao et al 9 developed MA-Networker, which uses multivariate regression analysis on gene expression data, and Wu et al 7 developed TRIA, which identifies a temporal relationship between a TF and its target genes. In addition, Garten et al 6 developed a method to identify a TF's regulatory targets by integrating ChIP-chip, promoter sequence, and gene expression data. In their approach, gene
Since our method and the three published methods mentioned above were developed for the same task, a performance comparison is warranted. Because a TF has to bind to its regulatory targets in order to regulate their expression, the enrichment of high-confidence TFBSs among the identified regulatory targets of that TF can be used as a criterion for performance comparison. The high-confidence TFBSs were downloaded from MacIsaac et al's paper, 19 where they were derived using six binding motif discovery methods, with the requirement of conservation across at least two of four related yeast species.
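As an illustration of how TFBS enrichment among a set of predicted regulatory targets might be quantified, the following minimal sketch computes a one-sided hypergeometric p-value. The choice of the hypergeometric test, the function name, and the counts in the example are assumptions made for illustration only; the statistic actually used in this comparison is the one defined in the authors' own description.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_with_tfbs, n_predicted, n_predicted_with_tfbs):
    """P(X >= n_predicted_with_tfbs) when n_predicted genes are drawn at random,
    without replacement, from n_genes genes of which n_with_tfbs carry a TFBS."""
    upper = min(n_with_tfbs, n_predicted)
    total = comb(n_genes, n_predicted)
    # math.comb(n, k) returns 0 when k > n, so impossible terms drop out.
    tail = sum(
        comb(n_with_tfbs, i) * comb(n_genes - n_with_tfbs, n_predicted - i)
        for i in range(n_predicted_with_tfbs, upper + 1)
    )
    return tail / total


# Hypothetical counts: 6000 genes, 200 with a high-confidence TFBS of the TF,
# 50 predicted regulatory targets, 20 of which carry the TFBS.
p_value = hypergeom_enrichment_p(6000, 200, 50, 20)
print(f"enrichment p-value: {p_value:.3e}")
```

A smaller p-value indicates stronger TFBS enrichment, so under this assumed criterion the method whose predicted targets give the smaller p-value for a TF would be considered the better performer for that TF.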
The details of the performance comparison are as follows. Let
Performance comparison of RETEA with MA-Networker using TFBS data.
Performance comparison of RETEA with TRIA using TFBS data.
Performance comparison of RETEA with Garten et al's method using TFBS data.
Determination of the thresholds used in correlation and partial correlation analysis
The threshold
The threshold values are determined by the following procedure. We ran RETEA using 12 different settings of the correlation threshold (
Performance comparison of RETEA using different correlation threshold (
Factors that affect the performance of RETEA
Two kinds of factors can affect the performance of RETEA. The first kind is the threshold values used in RETEA. We tried 12 different settings of the correlation threshold (
Applying RETEA to identify plausible regulatory targets of oxidative stress-response TFs
In this paper, RETEA is applied to identify the regulatory targets of eleven cell cycle TFs. To show the generality of RETEA, we also tested whether it performs well for regulators unrelated to the cell cycle. To this end, we applied RETEA to identify the regulatory targets of TFs that are involved in the oxidative stress response. The genome-wide gene expression and ChIP-chip data under oxidative stress were downloaded from Gasch et al's paper 22 and Harbison et al's paper, 15 respectively.
Using GO term finder in SGD 21 with FDR <0.05, we found that in most cases (8/11), the number of enriched common cellular processes in

Testing for the enrichment for the common cellular processes and common molecular functions in
Conclusions
In this study, an algorithm called RETEA was developed to identify the plausible regulatory targets of a TF from its binding targets. Since the binding of a TF to a gene does not necessarily imply regulation, algorithms like RETEA are needed to resolve this ambiguity. We validated the effectiveness of RETEA by checking the enrichment for cell cycle-regulated genes, common cellular processes, and common molecular functions. In addition, RETEA was shown to perform better than three published methods (MA-Networker, TRIA, and Garten et al's method), and it performed well not only for cell cycle TFs but also for TFs unrelated to the cell cycle. Taken together, these results indicate that RETEA can find biologically relevant results and can be useful in systems biology studies.
Disclosures
This manuscript has been read and approved by all authors. This paper is unique and not under consideration by any other publication and has not been published elsewhere. The authors and peer reviewers report no conflicts of interest. The authors confirm that they have permission to reproduce any copyrighted material.
