Abstract
Introduction
Single-celled organisms such as yeasts constantly face variable or even harsh external environments that threaten their survival or, at least, prevent them from performing optimally. Environmental changes can be of a physical or chemical nature such as temperature, oxidation, osmolarity, acidity and nutrient availability (Hohmann and Mager, 2003). The response mechanisms to stress are highly complex. One aspect of this cellular adaptation is the reorganization of gene expression. The gene expression program required for maintenance of the optimal cell physiology in one environment may be far from optimal in another environment. Thus, when the environmental condition changes abruptly, the cell must rapidly adjust its gene expression program to adapt to the new conditions (Gasch et al. 2000).
Large-scale reprogramming of gene expression can be revealed from genome-wide DNA microarrays, which measure the relative transcription levels of every gene in the yeast genome at any given moment, providing a snapshot of the genomic expression program (Gasch and Werner-Washburne, 2002). Exploring the dynamic nature of the yeast genome through time-course experiments can illuminate yeast stress responses. For example, Gasch et al. (2000) and Causton et al. (2001) used genome-wide expression analysis to explore how gene expression in yeast is remodeled over time as cells respond to heat shock, oxidative shock, osmotic shock, acidic stress, nitrogen depletion, amino acid starvation as well as other environmental stresses. They discovered that more than half of the genome is involved in responding to at least one of the investigated environmental changes. A set of genes (~10% of yeast genes), termed as the environmental stress response (ESR) genes or common environmental response (CER) genes, showed a similar drastic response to almost all of these environmental changes. Other gene expression responses appeared to be specific to particular environmental conditions. However, the regulators of these stress-response genes are not revealed from their studies. The complete network of stress-response regulators and the details of their actions remain to be investigated (Gasch et al. 2000).
Computational and statistical methods have been developed to identify plausible regulators of many cellular processes in yeast (Pilpel et al. 2001; Bar-Joseph et al. 2003; Segal et al. 2003; Middendorf et al. 2004; Yu and Li, 2005; Kim et al. 2006; Wu et al. 2006; Lin et al. 2007; Rokhlenko et al. 2007; Wu et al. 2007a; Wu et al. 2007b). When the time-course data of a system are available, using dynamic system modeling is more appropriate than using statistical approach because it can model the dynamic behavior of the system (Tegner et al. 2003; Chen et al. 2004; Chen et al. 2005; Chang et al. 2005; Chang et al. 2006). Therefore, in this paper we use a dynamic system model of gene regulation to describe the mechanism of how TFs may control a gene's expression. Then, based on the dynamic system model, we develop the Stress Regulator Identification Algorithm (SRIA) to identify stress-response TFs for six different kinds of stresses. Our goal is to reconstruct a network of stress-response regulators and study the details of their actions.
Methods
Data sets and data preprocessing
Two kinds of data sets are used. First, for each gene in the yeast genome, the TFs that may regulate its expression are retrieved from the YEASTRACT database (Teixeira et al. 2006b). The regulatory associations between the target gene and the TFs are included if they are supported by published data showing at least one of the following experimental evidences: i) Change in the expression of the target gene due to a deletion (or mutation) in the gene encoding transcription factor; these evidences may come from detailed gene by gene analysis or genome-wide expression analysis; ii) Binding of the transcription factor to the promoter region of the target gene, as supported by band-shift, foot-printing or Chromatine ImmunoPrecipitation (ChIP) assays. However, the TF-gene regulatory association data are noisy. The genes whose expressions are affected by the mutation of a TF may not be the direct targets of that TF. Moreover, the genes that are bound by a TF identified by ChIP-chip experiments may not be regulated by that TF since TF binding does not necessarily mean regulation. Therefore, other independent data source such as gene expression or proteomic data should be used to filter out the noise inherent in the TF-gene regulatory association data.
In this study, we incorporate the gene expression data into our analysis. The genome-wide gene expression time profiles under various stress conditions such as heat shock, oxidative shock, osmotic shock, acidic stress, nitrogen depletion, and amino acid starvation are from Gasch et al. (2000) and Causton et al. (2001). Under each stress condition, samples for all genes in the yeast genome are collected at multiple time points. In order to reconstruct the missing values and avoid overfitting, the cubic spline method (Faires and Burden, 1998) is applied to interpolate some extra data points. (The missing values of the microarray data are those data points whose values are questionable due to the experimental defects. Therefore, these data points are not reported in the microarray data and regarded as the missing values.) In order to fit the dynamic system model in the linear scale, the microarray data are transformed from the log2 scale to the linear scale.
Dynamic system model of gene regulation
We consider the transcriptional regulatory mechanism of a target gene as a system with the regulatory profiles of several TFs as the inputs and the gene expression profile of the target gene as the output. Owing to random noise and fluctuations at the molecular level, the transcriptional regulation of a target gene is described by the following stochastic dynamic equation
where
It has been shown that TF binding usually affects gene expression in a nonlinear fashion: below some level it has no effect, while above a certain level the effect may become saturated. This type of binding behavior can be modeled using a sigmoid function (Chen et al. 2004; Chen et al. 2005; Chang et al. 2005; Chang et al. 2006; Wu et al. 2006; Wu et al. 2007a). Therefore, we define
where
Estimating the parameters of the dynamic system model
Using the TF-gene regulatory association data from the YEASTRACT database and gene expression data from Gasch et al. (2000) and Caustion et al. (2001), we can estimate the parameters of the dynamic system model in Equation (1). We rewrite Equation (1) to the following regression form:
where ϕ[
From the gene expression data, it is easy to get the values of {
For simplicity, we can further define the notations
The parameter vector
Stress Regulator Identification Algorithm (SRIA)
After constructing the dynamic system model of gene regulation as in Equation (1) and estimate the parameter vector as in Equation (6), we are now ready to identify the stress-response TFs for six kinds of stresses. We develop Stress Regulator Identification Algorithm (SRIA) to do this task. SRIA can be divided into the following three steps:
The first step is to find out all the genes in the yeast genome that respond to a specific stress (e.g. heat shock, oxidative shock, osmotic shock, acidic stress, nitrogen depletion and amino acid starvation). A gene is said to respond to a specific stress if more than two time points of its gene expression profile measured under that specific stress are induced or repressed by more than three folds compared to that of the unstressed condition (see Supplementary Table 1 for details).
The high-confidence TFs in response to each of the six different stresses. The high-confidence TFs in response to heat shock, oxidative shock, osmotic shock, acidic stress, nitrogen depletion and amino acid starvation are shown. The TFs are in red (blue) if there exist solid (partial) experimental evidence showing that they are involved in the same stress as we predicted.
The high-confidence TFs in response to each of the six different stresses. The high-confidence TFs in response to heat shock, oxidative shock, osmotic shock, acidic stress, nitrogen depletion and amino acid starvation are shown. The TFs are in red (blue) if there exist solid (partial) experimental evidence showing that they are involved in the same stress as we predicted.
For each stress-response gene identified in Step 1, we retrieve all TFs that may regulate its expression from the TF-gene regulatory association data. Knowing this information enables us to construct the dynamic system model of the transcriptional regulation of the stress-response gene as in Equation (1). Then we apply the ML method to estimate the parameter vector θ = [
The stress-response TFs identified in Step 2 are ranked according to the number of times that they are identified under different stress-response genes. The first TF in the list is the one that is identified with the largest number of times. The TFs that are at the top 5% of the ranked list are classified as the high-confidence stress-response TFs. Finally, we output a ranked list of the high-confidence stress-response TFs for each of the six different kinds of stresses. The flowchart of SRIA could be seen in Figure 1.

The flowchart of SRIA.
Results
Stress-response TFs
We identified the TFs that respond to heat shock, oxidative shock, osmotic shock, acidic stress, nitrogen depletion, and amino acid starvation, respectively. Table 1 shows the high-confidence stress-response TFs for each of the above six stresses. The identified stress-response TFs can be divided into two categories. The first category is the well-known stress-response TFs with solid literature evidence that directly indicates involvement of these TFs in response to that specific stress. The second category is the novel stress-response TFs that have only partial or no literature support. (The partial literature support means that these TFs are predicted by pervious studies as plausible stress-response TFs but still need further verification.) We found that 38% (13/34 counting multiplicity) of the identified stress-response TFs belongs to the first category, indicating the effectiveness of SRIA. In addition, 52% (11/21 counting multiplicity) of the second category has partial literature support, revealing the predictive power of SRIA. Therefore, the novel stress-response TFs (Arr1, Ifh1, Rpn4 and Sok2) that have no literature evidence yet are worthy of experimental verification.
Biological validation of predictions
Now we discuss in detail our predictions that are supported by experimental evidence in the literature.
Heat shock
The predicted heat shock TFs Msn2, Msn4, Rpn4, Yap1 and Cad1 have solid or partial literature evidence. First, Msn2 and Msn4 bind DNA at stress response element (STRE) and activate many STRE-regulated genes in response to many stresses such as heat shock, oxidative shock and osmotic shock (Cherry et al. 1998). Second, it has been demonstrated that the heat shock TF Hsf1 co-ordinates a feed-forward gene regulatory circuit for
Oxidative shock
The predicted oxidative shock TFs Sfp1, Yap1, Rpn4 and Rap1 have solid or partial literature evidence. First, Sfp1 is known to regulate ribosomal protein (RP) gene expression in response to various stresses such as oxidative shock and osmotic shock (Marion et al. 2004). Second, Yap1 is known to regulate genes that respond to oxidative shock. For example, Yap1 regulates
Osmotic shock
The predicted osmotic shock TFs Msn2, Msn4, Sfp1 and Yap1 have solid or partial literature evidence. First, the msn2-msn4 double deletion mutants exhibit higher sensitivity to severe osmotic stress, indicating Msn2 and Msn4 are involved in response to osmotic stress (Cherry et al. 1998). Second, Sfp1 regulates RP gene expression in response to various stresses such as oxidative shock and osmotic shock (Marion et al. 2004). Third, the
Acidic stress
The predicted acidic stress TFs Msn2, Msn4 and Rpn4 have solid or partial literature evidence. First, it is known that
Nitrogen depletion
The predicted nitrogen depletion TFs Sfp1, Ifh1, Fhl1, Rpn4 and Rap1 have partial literature evidence. First, ribosomal protein (RP) genes in eukaryotes are coordinately regulated in response to growth stimuli and environmental stress, thereby permitting cells to adjust ribosome number and overall protein synthetic capacity to physiological conditions. Sfp1, Fhl1 and Ifh1 are known to regulate RP gene expression in response to nutrient depletion (Cherry et al. 1998; Marion et al. 2004; Güldener et al. 2005). Second, on solid growth media with limiting nitrogen source, diploid budding-yeast cells differentiate from the yeast form to a filamentous, adhesive and invasive form. Both low availability of nitrogen and a solid growth substrate are required to induce diploid filamentous-form growth. It is known that Rpn4 regulates filamentous growth, indicating that Rpn4 is involved in response to nitrogen depletion (Prinz et al. 2004). Third, Rap1 is known to be involved in the regulation of nitrogen, sulfur and selenium metabolism (Güldener et al. 2005).
Amino acid starvation
The predicted amino acid starvation TFs Gcn4, Rap1 and Sfp1 have solid or partial literature evidence. First, Gcn4 is a well-known transcriptional activator of amino acid biosynthetic genes in response to amino acid starvation (Cherry et al. 1998). Second,
Discussions and Conclusions
In this study, we use a dynamic system model of gene regulation to describe the mechanism of how the stress-response TFs may control a stress-response gene's expression. Based on the dynamic system model, we develop the Stress Regulator Identification Algorithm (SRIA) to identify the stress-response TFs for each of six kinds of stresses. Some general stress-response TFs that respond to various stresses and some specific stress-response TFs that respond to a specific stress are identified. For example, we found the well-known general stress-response TFs Msn2, Msn4, Yap1, Rpn4 and Sfp1, consistent with the results of
SRIA identified 12 distinct TFs (Arr1, Cad1, Fhl1, Gcn4, Ifh1, Msn2, Msn4, Rap1, Rpn4, Sfp1, Sok2 and Yap1) to be in response to at least one of the six stresses under study (see Figure 2). This indicates that a small number of TFs may be sufficient to control a wide variety of expression patterns in yeast under different stresses. Two implications can be inferred from this observation. First, the response mechanisms to different stresses may have a bow-tie structure (Csete and Doyle, 2004). As shown in Figure 2, the core stress-response TFs make up the ‘knots’ of a bow tie, facilitating the fan in of a large variety of environmental stresses through signal transduction pathways and fan out of an even larger variety of stress-response proteins through activating stress-response target genes. Actually, approximately two-thirds of the yeast genome (about 3600 genes) is involved in responding to the changes in environment (Causton et al. 2001). Second, there may exist regulatory cross-talks among different stress responses (see Figure 3). We found that heat shock, osmotic shock and acidic stress all can trigger TFs Arr1, Msn2, Msn4 and Rpn4, indicating that these three stresses may share a similar stress response mechanism. In addition, we found that oxidative shock, nitrogen depletion and amino acid starvation all can trigger TFs Arr1, Rap1, Rpn4 and Sfp1, indicating cross-talks among the cellular responses to these three stresses. The fact that different stress response mechanisms share some, but not all, of their regulators suggests a higher level of modularity of the yeast stress response network (Segal et al. 2003).

The network of stress-response regulators in yeast.

Regulatory cross-talks among different stress responses.
In Step 3 of SRIA, only those stress-response TFs that are at the top 5% of the ranked list are classified as the high-confidence stress-response TFs and reported as the final result. The reason for choosing only the top 5% of the ranked list is that when the cutoff threshold equals 5%, SRIA has the best performance in terms of the tradeoff between maximizing the true positive rate and minimizing the false negative rate to find out the known amino acid starvation TFs (see Figure 4 for details).

Statistics of the performance of SRIA using different cutoff thresholds.
The response mechanisms to stress are highly complex. They require a complex network of sensing and signal transduction leading to the adaptation of cell growth and proliferation along with the adjustments of the gene expression program, metabolic activities and other features of the cell (Hohmann and Mager, 2003). This study focused on the regulation of gene expression and proposed a network of stress-response regulators and their details of actions. Thus, it provides a starting point for understanding the adaptation mechanisms that yeast uses to survive some of the environmental stress conditions, experienced in the wild. We believe that as more gene expression data are accumulated, in combination with data from other whole-organism approaches, novel computational algorithms such as SRIA have the potential to construct a dynamic picture of the integrated cellular response of yeast cells to environmental changes.
