Abstract
Introduction
The lifetime risk of developing breast cancer is about one in eight for women, with around 192,370 new invasive cases and 62,280 new in situ cases diagnosed, and 40,170 deaths, in the United States each year. 1 Breast cancer is the second leading cause of cancer death among women in the US. While major inroads have been made in reducing mortality rates through increased screening, digital mammography, specialized care, and the widespread use of therapeutic agents such as aromatase inhibitors, trastuzumab and others, defining the genetic architecture of breast cancer remains an important long-term goal for the development of more effective therapeutic strategies and early interventions. Recent advances in microarray technology and reductions in genotyping costs have made possible genome-wide association studies (GWAS) to identify genetic variants associated with risk for breast cancer.2–6 Although these studies are providing valuable clues about the broad patterns of genetic susceptibility to breast cancer, the ultimate goal of SNP and gene discovery is to identify and characterize the biological pathways and the molecular mechanisms underlying the disease. This is especially important in breast cancer, a group of biologically and genetically heterogeneous diseases with distinct oncogenic pathways and therapeutic targets. To date, there is little information linking GWAS data to known oncogenic pathways involved in breast cancer. This knowledge gap is hindering the translation of GWAS discoveries into clinically useful genetic tests, early therapeutic interventions and new targeted drugs.
The objective of this study was to investigate the association of GWAS information with the Notch signaling pathway. The rationale for choosing the Notch signaling pathway was that, apart from its involvement in breast cancer, it is involved in many other types of cancer, including lung cancer, neuroblastomas, skin cancer, cervical cancer, and prostate cancer. 7 However, it is less well characterized than other biological pathways involved in breast cancer, such as the estrogen, kinase, apoptosis and P53 pathways, which are enriched with SNPs associated with risk for breast cancer. Consequently, association of GWAS information with the Notch signaling pathway may provide proof of concept that this approach could work and provide insights about the putative functional bridges between GWAS information and biological pathways that are less characterized and may not contain genes harboring mutations or SNPs associated with cancer. The Notch signaling pathway is extremely context-dependent, meaning that crosstalk and interaction with other pathways, including those enriched by SNPs associated with risk for breast cancer, would be very important in determining outcomes. With the exception of T-ALL (T-cell acute lymphoblastic leukemia), 7 there are very few instances where mutations in Notch pathway genes have been detected in solid tumors, despite solid evidence that the pathway itself is very important to the biology of tumors.8–13 Thus, it is conceivable that the genes in the Notch signaling pathway may be regulated
Our group 8 and others9,10 have shown that the Notch signaling pathway plays a critical role in the development of breast cancer. Numerous cellular functions and microenvironment cues associated with tumorigenesis are modulated by Notch signaling, including cell fate, proliferation, apoptosis, adhesion, and angiogenesis.11,12 Additionally, Notch signaling plays an important role in the maintenance of breast tumor-initiating cells. 13 Of the four known Notch receptors, three have been implicated in breast oncogenesis (Notch-1, –3, and –4) while one (Notch-2) has been suggested to have the opposite role and to have positive prognostic significance. 14 Notch-2 has recently been associated with ER-positive breast cancer tumors. 15 Both pan-Notch inhibitors and specific monoclonal antibodies (mAb) to individual Notch receptors are being developed for breast cancer. However, the molecular mechanisms underlying dysregulation and aberrant expression of Notch receptors and other genes involved in the Notch signaling pathway leading to breast cancer remain poorly understood. The association between the Notch signaling pathway and genes containing SNPs associated with risk for breast cancer could provide putative functional bridges between GWAS information and an oncogenic pathway that does not harbor mutations, but is involved in cancer development and progression.8–13 Therefore, elucidating the association of GWAS information with the Notch signaling pathway may help to determine which patients may benefit from Notch inhibitors and to explore the role of Notch transmembrane receptors as potential drug targets and predictive markers.
We hypothesized that genes containing SNPs with large (
Methods
Data sources
We mined the literature through PubMed searches and websites containing supplementary data on 41 GWAS to identify SNPs and genes associated with risk for breast cancer. The search included the terms (GWAS, GWA, WGAS, WGA, genome-wide, genomewide, whole genome, all terms + association, or + scan) in combination with breast cancer, covering the primary published reports through July 2010. All the reports were read and information was manually extracted and entered into the database. The inclusion criterion was that the study must include a sample size of ≥500 cases and ≥500 controls. We catalogued SNPs with large (
We used two publicly available gene expression data sets, based on a case-control design as in GWAS, to evaluate and establish the expression levels of candidate genes and genes involved in the Notch signaling pathway. The first data set involved the Caucasian population, and consisted of 143 histologically normal breast tissues derived from patients harboring breast cancer who underwent curative mastectomy and 42 invasive ductal carcinomas of various histological grades obtained from breast cancer patients. The data set has been fully described by the originators. 16 Briefly, this data set consisted of histological data. Histologically normal breast has the potential to harbor pre-malignant changes at the molecular level and thus provides an opportunity for identifying risk markers. We postulated that a histologically normal tissue with tumor-like gene expression patterns might harbor substantial risk for future cancer development. Thus, genes associated with these high-risk tissues would be considered to be malignancy-risk genes. From this assumption, it follows that these genes could serve as potential molecular predictors of breast cancer. Normal breast tissue included histologically normal and benign samples. All samples were assessed for global gene expression profiles using the Affymetrix platform on the U133 Plus 2.0 Array. The tumors were not associated with any known genetic risk factors such as BRCA1 or BRCA2 mutations. The microarray data from these samples, including the raw probe-level hybridization intensities, were downloaded from the NCBI's Gene Expression Omnibus (GEO) database under accession number GSE10780.
Most GWAS have been performed on Caucasian populations. It remains unclear to what extent findings from these studies can be extrapolated to non-Caucasians. To determine whether results found using data from the Caucasian population could be replicated in the Asian population, we used a second gene expression data set. The second data set involved a multi-ethnic Asian population, consisting of Malaysian breast cancer patients (Malays, Chinese and Indian). The data set has been described previously. 17 Briefly, the data set consisted of a total of 43 breast carcinomas and 43 patient-matched normal tissues collected from Kuala Lumpur, UKM and Putrajaya Hospitals in Malaysia. The data set was generated using the Affymetrix platform's U133A Chip and was downloaded from GEO under accession number GSE15852. The two data sets contained similar information; both involved ductal carcinomas with the same tumor grades. The clinical and histological characteristics of the two gene expression data sets used are summarized in Table 1.
Clinical and histological characteristics of Caucasian and Asian patients used in this study to generate gene expression data.
In each of the two microarray data sets described above, entries in the data matrix were expression values generated by Affymetrix's Microarray Analysis Suite 5.0 (MAS5) statistical algorithm. 18 Following normalization and scaling, MAS5 signal values were summarized by Tukey's biweight estimation of the probe-level intensities within each probe set. This was followed by a global normalization (linear scaling) to give all chips the same average intensity. These procedures yield robust weighted means called average-scaled differences that are proportional to the amount of a particular RNA transcript present in the sample after background correction. We used these values as the input in this analysis, after filtering out spiked control genes.
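The summarization and scaling steps described above can be illustrated with a minimal Python sketch. It is not the Affymetrix implementation: the function names, the tuning constant c = 5, the small offset eps and the target intensity of 500 are assumptions made for illustration, and a plain mean is used for the scaling step where production software may use a trimmed mean.

```python
from statistics import median, fmean

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a robust weighted mean that down-weights outliers.

    c and eps are assumed tuning constants, chosen here for illustration.
    """
    values = list(values)
    m = median(values)
    mad = median([abs(v - m) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)           # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def scale_to_target(chip_signals, target=500.0):
    """Global linear scaling: multiply every signal on a chip by one factor
    so that the chip's average intensity equals the (assumed) target."""
    factor = target / fmean(chip_signals)
    return [s * factor for s in chip_signals]

# Hypothetical probe-level intensities for one probe set; 50.0 is an outlier.
probe_values = [10.0, 10.2, 9.8, 10.1, 50.0]
print(tukey_biweight(probe_values), fmean(probe_values))

# Two hypothetical chips end up with the same average intensity after scaling.
chip_a = scale_to_target([100.0, 200.0, 300.0])
chip_b = scale_to_target([10.0, 40.0, 70.0])
print(fmean(chip_a), fmean(chip_b))
```

In this toy input the biweight estimate stays close to 10 because the outlying probe receives zero weight, whereas the ordinary mean is pulled toward the outlier.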
Data analysis
As a first step, we mapped SNPs to the genes by matching gene names, SNP IDs and positions using the information in the database (dbSNP). We then sorted and ranked the genes on the basis of
Briefly, we assumed that the
where,
The challenge was how to represent a gene containing multiple SNPs within the gene and how to account for correlations among those SNPs. Correlations among
Next, we matched the 150 candidate genes containing SNPs along with genes involved in the Notch signaling pathway to probes on the U133 Plus 2.0 Chips and U133A Human Chips, representing gene expression from the Caucasian and Asian populations, respectively. The probes were extracted from the NetAffx Database using the batch query (Affymetrix Inc). We then used probes to extract the gene expression values for the candidate genes and genes involved in the Notch signaling pathway from gene expression data sets on Caucasian and the Asian populations, respectively.
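The probe-matching step can be sketched as a simple lookup. The annotation rows, probe identifiers, gene symbols and expression values below are hypothetical placeholders, not entries from the NetAffx Database, and the function names are introduced only for this illustration.

```python
from collections import defaultdict

def probes_by_gene(annotation_rows):
    """Group probe set IDs by gene symbol; one gene may map to several probe sets."""
    mapping = defaultdict(list)
    for probe_id, gene_symbol in annotation_rows:
        mapping[gene_symbol].append(probe_id)
    return dict(mapping)

def extract_expression(genes, mapping, expression):
    """Return ({gene: {probe_id: values}}, unmatched_genes) for one chip type."""
    matched, unmatched = {}, []
    for gene in genes:
        probes = [p for p in mapping.get(gene, []) if p in expression]
        if probes:
            matched[gene] = {p: expression[p] for p in probes}
        else:
            unmatched.append(gene)  # gene is not represented on this chip
    return matched, unmatched

# Hypothetical annotation table and expression matrix (placeholders only).
annotation = [("probe_1", "GENE_A"), ("probe_2", "GENE_A"), ("probe_3", "GENE_B")]
expression = {"probe_1": [7.1, 6.8], "probe_2": [5.0, 5.4], "probe_3": [9.2, 9.0]}

mapping = probes_by_gene(annotation)
matched, unmatched = extract_expression(
    ["GENE_A", "GENE_B", "GENE_C"], mapping, expression
)
print(sorted(matched), unmatched)
```

Genes that fall into the unmatched list are simply absent from a given chip, which is one way a lower-density chip can yield fewer usable candidate genes than a higher-density one.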
On each data set containing candidate genes and genes involved in the Notch signaling pathway, we performed supervised analysis comparing mean gene expression profiles in cancer patients to mean gene expression profiles in cancer-free controls to identify significantly differentially expressed genes, which distinguished the two groups and were predictors of cancer, as demonstrated in Figures 1 and 2 for the Caucasian and Asian populations, respectively. We used the Benjamini and Hochberg 22 procedure to correct for multiple testing. Genes were then ranked on estimated
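The multiple-testing correction mentioned above can be sketched in a few lines of Python. This is a generic implementation of the Benjamini and Hochberg step-up adjustment, assuming that a raw p-value per gene is already available from the case-control comparison; the gene labels and p-values are hypothetical, and the sketch is not claimed to reproduce the software used in the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes (placeholders only).
raw = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.03, "GENE_D": 0.005}
genes = list(raw)
adj = dict(zip(genes, benjamini_hochberg([raw[g] for g in genes])))

# Rank genes from most to least significant after correction.
ranked = sorted(genes, key=lambda g: (adj[g], raw[g]))
print(ranked)
```

Ties in the adjusted values are broken here by the raw p-value, which is an illustrative choice rather than part of the procedure itself.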

Distribution of mean expression values for candidate genes between breast cancer patients (y-axis) and normal subjects (x-axis) in the Caucasian population. Blue dots significantly deviating from the red line indicate differential expression. The genes, estimates of

Distribution of mean expression values for candidate genes between breast cancer patients (y-axis) and normal subjects (x-axis) in the Asian population. Blue dots significantly deviating from the red line indicate differential expression. The genes, estimates of
where
Finally, we performed pathway prediction and network modeling using the Osprey System 24 to identify candidate genes which interact with genes involved in the Notch signaling pathway and other biological pathways relevant to breast cancer. The Osprey network modeling and visualization system is a dynamic software platform which integrates experimental information from the literature with gene ontology information from the GO database about all the genes. Therefore, it allowed us to capture all the genes that interact with the input genes (ie, candidate genes and genes in the Notch signaling pathway), that have been experimentally confirmed and that are involved in the same biological process, some of which may have been missed during differential expression and co-expression analysis. Thus, it is a suitable tool for pathway prediction, network modeling and in silico validation of predicted pathways and gene networks.
In pathway prediction and network visualization, we first performed pathway prediction using a set of candidate genes containing SNPs associated with risk for breast cancer and members of the Notch signaling pathway, which were differentially expressed between cases and controls in the Caucasian population. We repeated the same analysis for the Asian population. To determine whether genes containing SNPs with larger effects and SNPs replicated in multiple independent studies interact with genes in the Notch signaling pathway, we performed a separate analysis for each set of genes. In pathway prediction, genes were represented by nodes and the interactions by edges. Two genes were considered to share a genetic susceptibility architecture and network properties if they were connected by an edge and were correlated as determined by the correlation coefficient. To determine the functional relationships and biological properties of genes in the networks, we used the biological process category of the Gene Ontology classification built into the Osprey System to color-code the nodes (genes). We imposed level 3 filtering criteria to remove genes with spurious interactions, which could be less informative or could distort the reliability of network modeling. This approach also served as a validation step in that we randomly removed genes with fewer interactions and repeated the analysis.
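The network logic described above (genes as nodes, interactions as connections, retention of connected and correlated pairs, and removal of sparsely connected genes) can be sketched as follows. This is a plain-Python illustration, not the Osprey System: the gene labels, interaction list, expression values, correlation threshold and degree-based filter are all assumptions, and the degree filter is only a stand-in that is not claimed to reproduce Osprey's filtering levels.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def build_network(interactions, expression, min_abs_r=0.5):
    """Keep a connection only if the genes interact AND are correlated."""
    network = {}
    for a, b in interactions:
        r = pearson(expression[a], expression[b])
        if abs(r) >= min_abs_r:           # assumed threshold, for illustration
            network.setdefault(a, {})[b] = r
            network.setdefault(b, {})[a] = r
    return network

def filter_by_degree(network, min_degree):
    """Drop genes with fewer than min_degree neighbours (single pass)."""
    keep = {g for g, nbrs in network.items() if len(nbrs) >= min_degree}
    return {g: {n: r for n, r in nbrs.items() if n in keep}
            for g, nbrs in network.items() if g in keep}

# Hypothetical expression profiles and reported interactions (placeholders).
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.5],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [2.0, 1.0, 2.0, 1.5],
}
interactions = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"),
                ("GENE_B", "GENE_C"), ("GENE_C", "GENE_D")]

network = build_network(interactions, expression)
print(sorted(network))                       # GENE_D is weakly correlated and dropped
print(sorted(filter_by_degree(network, 2)))  # genes with at least two neighbours
```

Repeating the analysis after removing low-degree genes, as in the validation step described above, amounts to calling the filter with different thresholds and comparing the surviving networks.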
Results
We investigated the association of GWAS information with the Notch signaling pathway. GWAS information included a total of 497 SNPs associated with risk for breast cancer. The SNPs were derived from 41 GWAS, totaling more than 250,000 cases and 250,000 controls, mostly (99%) from Caucasian populations. Of the total, 112 SNPs were located in intergenic regions and were not used in the analysis. The remaining 385 SNPs mapped to 150 genes, of which 130 candidate genes matched probes on the U133 Plus 2.0 Chip for data on the Caucasian population and 111 candidate genes matched probes on the U133A Chip for data on the Asian population, and were used in the analysis. The discrepancy in the number of genes in the two data sets is due to differences in chip density (ie, differences in probes and in the number of unique genes represented on the U133 Plus 2.0 and U133A Human Chips). The list of gene symbols, SNPs (rs_IDs) and number of SNPs per gene, along with the primary sources (ie, references), is provided as supplementary material in Table A in the appendix. Genes involved in the Notch signaling pathway included the 4 members of the Notch family of transmembrane receptors,
As a first step, we evaluated the expression of candidate genes and genes involved in the Notch signaling pathway by comparing normal breast to breast tumors in Caucasian and Asian populations using publicly available gene expression data described in the methods section, as demonstrated in Figures 1 and 2 for the Caucasian and Asian populations, respectively. We sought to identify candidate genes and members of the Notch signaling pathway that were significantly differentially expressed between breast cancer and normal tissue. Such genes would serve as molecular predictors of breast cancer. We then used the identified differentially expressed genes as the input for pathway prediction and network modeling.
Using supervised analysis, we identified 71 candidate genes and 12 genes involved in the Notch signaling pathway, with significant differences in expression profiles between the cases and controls in the Caucasian population. The list of significantly differentially expressed genes involved in the Notch signaling pathways included
This suggests that, like GWAS results, gene expression can be heterogeneous among populations, making it difficult to replicate results. The observed differences in expression profiles between cases and controls in the two populations can be attributed to several factors, including the fact that gene expression varies among populations, 25 differences in tissue procurement timing and storage, use of chips with different probe densities, as well as the genetic and phenotypic heterogeneity inherent in the GWAS data used in this study. Co-expression analysis, however, revealed that candidate genes that were not differentially expressed exhibited co-expression patterns with sets of genes distinguishing cancer from normal controls.
To formally test the hypothesis that candidate genes containing SNPs associated with risk for breast cancer interact with genes involved in the Notch signaling pathway, we performed pathway prediction and network modeling. As a first step, we performed pathway prediction and network modeling using the 71 candidate genes confirmed in the Caucasian population and all genes involved in the Notch signaling pathway. In addition, we modeled the biological relationships of the genes in the predicted pathways and networks using Gene Ontology information and experimental information derived from the literature by text mining using the module built into the Osprey System. The key for GO information characterizing genes in the predicted pathways and networks according to the biological process in which they are involved is presented in Figure 3. The results of pathway prediction and network modeling for the Caucasian population are presented in Figure 4. For easy interpretation throughout the figures, names of candidate genes (ie, genes containing SNPs associated with risk for breast cancer) are shown in red. Names of genes involved in the Notch signaling pathway are shown in blue, while names of the new set of genes not identified by GWAS are shown in black. Nodes represent the genes and edges represent the interactions.


Results of pathway prediction and network modeling showing interactions between genes containing SNPs associated with risk for breast cancer and genes involved in the Notch signaling and other biological pathways based on the Caucasian population. Nodes represent genes and edges represent interactions. The color code denotes the biological process in which the genes are involved as defined in Figure 3. The color codes of the edges indicate the experimental techniques, or a combination thereof, used to confirm the relationship between the genes as determined by the experiments reported in the literature. Candidate genes containing SNPs associated with risk for breast cancer are shown in red, genes involved in the Notch signaling pathway are shown in blue and new genes not reported in GWAS are shown in black. For the full names of genes and number of SNPs per gene including GWAS references, please refer to Table A in supplementary data.
Members of the Notch family of transmembrane receptors
To evaluate the strength of association between candidate genes and the Notch signaling pathway in the Caucasian population, we estimated correlations between pairs of genes, focusing on candidate genes with SNPs replicated in multiple GWAS. Significant correlations (
In general, the interactions between candidate genes and Notch signaling appear to be complex and to involve multiple pathways, suggesting that multiple interacting pathways are likely involved in the development and progression of breast cancer. The involvement of multiple pathways also indicates that interactions between the Notch signaling pathway and candidate genes containing SNPs associated with risk for breast cancer may involve multi-pathway crosstalk. A clear example was NUMB, which is involved in the Notch signaling pathway but also controls P53 tumor suppressor activity in breast cancer. 26 NUMB is a cell fate determinant which, by asymmetrically partitioning at mitosis, controls cell fate choices by antagonizing the activity of the plasma membrane receptor of the Notch family. 27
Of particular interest were the three-way interactions among genes containing SNPs with large (
A major concern in genome-wide association studies is that the majority of GWAS, ~99% based on this study, have been conducted on Caucasian populations. To determine whether results of pathway prediction and network modeling observed in the Caucasian population could be replicated in the Asian population, we performed pathway prediction using the 31 differentially expressed candidate genes identified using gene expression data derived from the Asian population and the set of genes involved in the Notch signaling pathway. The results showing pathway prediction and gene interaction networks for genes containing SNPs and members of the Notch signaling pathway based on the Asian population are presented in Figure 5. Genes involved in the Notch signaling pathway (

Results of pathway prediction and network modeling showing interactions between genes containing SNPs associated with risk for breast cancer and genes involved in the Notch signaling and other biological pathways based on the Asian population. Nodes represent genes and edges represent interactions. The color code denotes the biological process in which the genes are involved as defined in Figure 3. The color codes of the edges indicate the experimental techniques, or a combination thereof, used to confirm the relationship between the genes as determined by the experiments reported in the literature. Candidate genes containing SNPs associated with risk for breast cancer are shown in red, genes involved in the Notch signaling pathway are shown in blue and novel genes are shown in black.
As in the Caucasian population, genes containing SNPs associated with risk for breast cancer were also found to be associated with the Notch signaling, P53, apoptosis and MAP kinase pathways. To assess the strength of association between Notch receptors and candidate genes in the Asian population, we estimated correlations. With the exception of
Both the validity and reproducibility of results from GWAS studies particularly the ones with small to moderate effect sizes (

Results of pathway prediction and network modeling showing interactions between genes containing SNPs associated with risk for breast cancer and genes involved in the Notch signaling and other biological pathways. The results are based on genes containing SNPs replicated in multiple independent GWAS. Nodes represent genes and edges represent interactions. The color code denotes the biological process in which the genes are involved as defined in Figure 3. The color codes of the edges indicate the experimental techniques, or a combination thereof, used to confirm the relationship between the genes as determined by the experiments reported in the literature. Candidate genes containing SNPs associated with risk for breast cancer are shown in red, genes involved in the Notch signaling pathway are shown in blue and novel genes are shown in black.
To further address the problem of reliability of GWAS data, we performed additional analyses combining genes containing SNPs replicated in multiple independent studies with genes involved in the Notch signaling pathway. Genes containing SNPs reported in multiple independent studies included

Results of pathway prediction and network modeling showing interactions between genes containing SNPs associated with risk for breast cancer and genes involved in the Notch signaling and other biological pathways. The results are based on genes containing SNPs with the largest effect size (
To address the problem of publication bias and to determine whether members of the Notch signaling pathway interact with candidate genes with small to moderate effects, we performed further analyses combining the genes involved in the Notch signaling pathway and candidate genes containing SNPs with small effects. We found that genes containing SNPs with small effects interact with genes involved in the Notch signaling pathway (results not presented because we captured the same results in the four figures reported above). In addition, we identified novel genes not reported in GWAS, including
Overall, in all the analyses, we confirmed our hypothesis that genes containing SNPs associated with risk for breast cancer (regardless of effect size) interact with genes involved in the Notch signaling pathway as well as other biological pathways known to be relevant to breast cancer. Additionally, we identified novel genes not yet reported by GWAS. These results demonstrate that GWAS information can be leveraged with biological knowledge and gene expression data to infer the association between gene expression and breast cancer. The association of the Notch oncogenic pathway with genes containing SNPs associated with risk for breast cancer demonstrates that integrative analysis combining GWAS information, gene expression data and biological knowledge is a powerful approach to identifying molecular markers underlying GWAS findings.
Discussion
In this study, we provide evidence of association between breast cancer risk candidate genes and the Notch signaling pathway, an important oncogenic pathway involved in many aspects of tumor development, growth and progression, and a potential therapeutic target. Additionally, we identified other biological pathways, including the ESR1 pathway, the IGF pathway, the MAP kinase pathway, the apoptosis pathway and the P53 pathway, enriched by SNPs associated with risk for breast cancer. This suggests that regulation of the Notch pathway by candidate genes from GWAS is complicated and potentially involves multi-dimensional crosstalk between the Notch signaling pathway and other oncogenic pathways. Our results tend to agree with those of a recent association study, Fu et al, 15 which showed the association between
Several studies have now attempted pathway-based approaches to dissect the genetic susceptibility architecture of common diseases, for example, in inflammatory diseases, 29 in bipolar disorder, 30 in multiple sclerosis, 31 in breast cancer,32,33 prostate cancer, 34 and in seven common diseases. 35 To our knowledge, this is the first study to associate GWAS information with the Notch signaling pathway. This is an important finding because although GWAS as demonstrated in this and other studies2–6 can effectively map loci contributing to phenotypes of interest in breast cancer, they offer limited insights about the biological mechanisms by which SNPs confer risk. Of particular interest is the association of Notch signaling with multiple DNA repair genes including the
One possible explanation for these findings is that Notch signaling may be necessary for the survival of cells that are deficient in DNA repair. Also, DNA repair pathways are especially active in cells with stem-like phenotypes, potentially including tumor-initiating cells.44,45 Interestingly, an increase in Notch activity is part of the response to radiation in breast cancer-initiating cells, 46 endothelial cells 47 and glioma stem cells. 48
A number of genes involved in cell proliferation and survival are represented in our analysis. Among them,
The E3 ubiquitin ligase
Thus, an analysis of interactions of genes that contain breast cancer-related SNPs with Notch pathway genes reveals genes and gene products that have been suggested to cross-talk with Notch signaling in other models, including invertebrate models, 64 supporting the validity of our approach. Moreover, this analysis detected additional candidates for functional interactions with Notch, including numerous DNA repair and checkpoint genes, the androgen receptor (
Although the results of this study offer valuable clues about the association of GWAS information with the Notch signaling pathway and other pathways relevant to breast cancer, limitations in interpreting these results must be acknowledged. We have used results of genome-wide association studies and publicly available gene expression data in this analysis. Therefore, interpretation of our results is inherently subject to the constraints of such data. Key limitations include, but are not limited to, the fact that GWAS information was derived from different studies conducted using different platforms, sample sizes and phenotypes, with possible cryptic population stratification and potentially different analysis techniques, all of which could affect our results.
However, the results presented in this study demonstrate conclusively that genes containing SNPs associated with risk for breast cancer interact with genes involved in the Notch signaling pathway and other biological pathways relevant to breast cancer. Important work remains to be done to determine how the SNPs disrupt the genes and pathways, leading to cancer development and progression. Such work is beyond the scope of this paper, but the work reported here is the first step in that direction.
Disclosure
This manuscript has been read and approved by all authors. This paper is unique and is not under consideration by any other publication and has not been published elsewhere. The authors and peer reviewers of this paper report no conflicts of interest. The authors confirm that they have permission to reproduce any copyrighted material.
