Abstract
Background
With overall improvements in the quality of medical services and technologies, the prevention and treatment of cancer have improved greatly over the past decades; however, cancer deaths remain common and continue to increase. In 2012, there were an estimated 8.2 million and 2.2 million cancer deaths worldwide and in China, respectively.1,2 Beyond its high mortality, the pathophysiology of cancer is still not completely understood. The heterogeneity of cancer continues to pose significant challenges for preventing and treating the disease and for gaining a deep understanding of its pathological mechanisms; thus, accelerating the discovery of effective biomarkers is of prime importance. During the past decade, extensive research has been performed to identify molecular biomarkers for presymptomatic diagnosis, stratification of cancer subtypes, assessment of cancer progression, prediction of patient response to therapy, and detection of recurrence.3–5 However, biomarkers of the oncogenic process remain poorly predictive of outcome and are therefore too unreliable for clinical application.
Network biology has been widely used to represent, compute, and model intracellular interactions to gain insights into cellular mechanisms.6 Recent progress in network biology has provided new methods for cancer-related biomarker discovery.7 Network-based holistic analysis integrates multidimensional high-throughput
Here, we give a comprehensive overview of the biological networks that have been used to identify biomarkers at the genomic, transcriptomic, and proteomic levels (Fig. 1). We also summarize network-based tools for biomarker discovery (Table 1). The aim of this review is to show the potential benefits of network analysis of complex systems for biomarker discovery.

Figure 1. Biological networks used for biomarker discovery.
Table 1. Network-based analysis tools for biomarker discovery.
Genomic Level
A variety of genomic alterations, such as point mutations, copy number variations, and gene rearrangements, contribute to tumor formation and development. Genome-scale loss-of-function assays, such as the Achilles Project, which contains genome-scale loss-of-function screens in hundreds of cancer cell lines,10 also provide important resources for biomarker discovery. Several studies have used genomic alteration data from primary tumors directly in biomarker research. Using only cancer gene mutation data, Cui11 built a gene network from the co-occurring and anti-co-occurring relationships between gene mutations (CCA network). The resulting CCA network had two complementary modules with distinct functions and roles in tumorigenesis. Genomic alteration information also plays a complementary role in the construction of molecular networks. Endogenous perturbation analysis of cancer is a causal network model that explains the transcriptional consequences of DNA copy number alterations and has been used to detect survival markers for glioblastoma.12,13 Shi et al.14 also developed a network model that combined copy number alteration and mRNA expression data using a sparse double Laplacian shrinkage (SDLS) method. The advantage of SDLS is that it effectively accommodates correlations on both sides of the regression (gene expression and copy number alteration). Based on gene mutation information, biomolecular interaction networks, and patient clinical information, Leung et al.15 developed a Cytoscape plug-in called HyperModules to identify clinically and phenotypically significant mutated gene modules as potential multivariate biomarkers for cancer. HyperModules can analyze diverse biomolecular interaction networks, including gene regulatory networks (GRNs), protein–protein interaction (PPI) networks, and curated biological pathways. By integrating genome-wide association datasets and PPI data, Chimusa et al.16 presented an algebraic graph-based method (ancGWAS) to identify significant disease-specific sub-networks. ancGWAS can use not only linkage disequilibrium information but also other user-defined values as PPI weights.
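The co-occurrence idea behind a CCA-type network can be illustrated with a minimal sketch, assuming a binary patient-by-gene mutation table. Each gene pair is tested with a one-sided Fisher's exact test in both directions: the upper tail flags co-occurring mutations and the lower tail flags anti-co-occurring (mutually exclusive) ones. The choice of test, the `alpha` cutoff, the function names, and the gene names are illustrative assumptions rather than the statistics used in the cited study, and no multiple-testing correction is applied.

```python
from itertools import combinations
from math import comb

def fisher_tails(a, b, c, d):
    """One-sided Fisher's exact p-values for the 2x2 table [[a, b], [c, d]].

    a: both genes mutated, b: only gene 1, c: only gene 2, d: neither.
    Returns (p_co, p_anti): upper (co-occurrence) and lower
    (anti-co-occurrence) tails of the hypergeometric distribution.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_co = sum(prob(x) for x in range(a, hi + 1))
    p_anti = sum(prob(x) for x in range(lo, a + 1))
    return p_co, p_anti

def cca_network(mutations, alpha=0.05):
    """Signed edges from {gene: [0/1 mutation status per patient]} (no FDR control)."""
    edges = {}
    for g1, g2 in combinations(sorted(mutations), 2):
        m1, m2 = mutations[g1], mutations[g2]
        a = sum(1 for x, y in zip(m1, m2) if x and y)
        b = sum(1 for x, y in zip(m1, m2) if x and not y)
        c = sum(1 for x, y in zip(m1, m2) if y and not x)
        d = len(m1) - a - b - c
        p_co, p_anti = fisher_tails(a, b, c, d)
        if p_co < alpha:
            edges[(g1, g2)] = "co-occurring"
        elif p_anti < alpha:
            edges[(g1, g2)] = "anti-co-occurring"
    return edges
```

With twelve hypothetical patients in which geneA and geneB are mutated in the same six and geneC only in the other six, this sketch links geneA and geneB as co-occurring and links each of them to geneC as anti-co-occurring.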
Transcriptomic Level
At the transcriptomic level, gene coexpression networks (GCNs) and transcriptional regulatory networks are the most widely used for biomarker discovery. In a GCN, two genes (nodes) are connected if there is a significant correlation (eg,
The aim of GRNs is to mathematically capture the dependencies between transcriptional regulators and their downstream target genes.25 Zhang et al.26,27 proposed a local dependency method called differential dependency network (DDN) to find statistically significant topological changes in GRNs between two biological conditions, which provides an alternative means of biomarker prediction. This method can capture topological changes even when the fold change in gene expression is not significant, and it is available both as a Matlab package and as a Cytoscape plug-in (CytoDDN). However, DDN considers only linear relationships. Based on biological knowledge and conditional independence, GRNs can be reconstructed from microarray data and have been employed to predict prostate cancer-related genes and sub-networks.28 Using the GRN inferred by the algorithm for the reconstruction of accurate cellular networks (ARACNe), Remo et al.29 predicted NFAT5 as a novel marker of inflammatory breast cancer. Seifert et al.30 inferred signature-specific GRNs to distinguish different subtypes of astrocytoma, and Akutekwe and Seker31 developed a GRN construction method based on support vector regression and dynamic Bayesian networks, which has been applied to time-course data in ovarian carcinoma.
Owing to the emerging role of noncoding RNAs, networks containing noncoding RNAs have also contributed to biomarker discovery. MicroRNA (miRNA) biomarkers offer a powerful alternative to protein-coding gene signatures and have the flexibility of gene expression signature classifiers.32 MicroRNA regulatory networks (miRNA–mRNA networks) represent the regulatory patterns between microRNAs and genes. Using these networks, Zhang et al.33 showed that hsa-let-7i and its target genes play crucial roles in colorectal cancer metastasis, Canturk et al.34 found that network hub microRNAs and genes might be candidate markers for bladder cancer, Zafari et al.35 determined disease-related and housekeeping microRNAs, and Sehgal et al.36 identified microRNAs that regulate functional pathways in multiple cancers. A combinatorial regulatory network comprising the interactions between microRNAs and genes, transcription factors (TFs) and genes, and TFs and microRNAs was constructed to identify key regulators that contribute to hepatocellular carcinoma metastasis.37 Lee et al.38 developed software called ActMiR, which applies regression models to the expression of microRNAs and their predicted target genes to infer miRNA-mediated regulatory networks and microRNA activities that could serve as potential prognostic biomarkers of cancer. This approach has proven relatively robust for modeling microRNA activity and was applied to multiple breast cancer data sets. More recently, the pipeline of outlier microRNA analysis was constructed to identify candidate microRNAs by exploring the sub-structure of the microRNA regulatory network, which is built by integrating a miRNA–mRNA interaction database with microRNA and mRNA expression data.
The pipeline of outlier microRNA analysis has already been applied to find candidate microRNA biomarkers in prostate cancer, clear cell renal cell carcinoma (ccRCC), and pediatric acute myeloid leukemia.39–41 Currently, most microRNA regulatory networks consider only the relationships between microRNAs and mRNAs and neglect the cooperative or synergistic effects among miRNAs.
Long noncoding RNAs (lncRNAs) are non-protein-coding transcripts longer than 200 nucleotides that can function as scaffolds for chromatin modification and for transcriptional and posttranscriptional regulation42,43 and that exhibit aberrant expression in various human cancers.44 Yang et al.45 built a coexpression network of lncRNAs and protein-coding genes and suggested that lncRNAs and mRNAs may act as biomarkers for nasopharyngeal carcinoma. A coexpression network of lncRNAs and mRNAs differentially expressed between breast cancer patients and a control group was also constructed to identify the network's core genes as biomarkers for the HER-2-enriched subtype of breast cancer.46
Other network-based models for cancer biomarker discovery measure the relationships between transcripts in different ways, such as feature selection-based genetic networks for lung cancer47 and Boolean implication networks for both normal and malignant tissues.48,49
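Boolean implication networks are built from rules such as "if gene A is high, then gene B is high", which hold when one quadrant of the A-versus-B scatter plot is (almost) empty. The minimal sketch below assumes that each gene has already been given a low/high threshold and simply checks every quadrant for sparsity; the thresholds, the `max_frac` tolerance, and the gene names are hypothetical. Published methods use dedicated statistics both for choosing thresholds and for testing sparsity, and they also require enough support so that rarely occurring levels do not yield vacuous rules.

```python
from itertools import permutations

def boolean_implications(expr, thresholds, max_frac=0.05):
    """Return rules (gene_a, level_a, gene_b, level_b): 'if A is level_a, B is level_b'.

    expr: {gene: [expression per sample]}; thresholds: {gene: low/high cut-off}.
    A rule holds when its violating quadrant has at most max_frac of all samples.
    """
    high = {g: [v > thresholds[g] for v in vals] for g, vals in expr.items()}
    name = {False: "low", True: "high"}
    rules = []
    for ga, gb in permutations(sorted(high), 2):
        n = len(high[ga])
        # Count samples in each (A level, B level) quadrant.
        counts = {(x, y): 0 for x in (False, True) for y in (False, True)}
        for x, y in zip(high[ga], high[gb]):
            counts[(x, y)] += 1
        for la in (False, True):
            if counts[(la, False)] + counts[(la, True)] == 0:
                continue  # A never takes this level, so any rule would be vacuous
            for lb in (False, True):
                violations = counts[(la, not lb)]  # samples with A = la but B != lb
                if violations <= max_frac * n:
                    rules.append((ga, name[la], gb, name[lb]))
    return rules
```

Because gene pairs are visited in both orders, a rule and its contrapositive (for example, "A high implies B high" and "B low implies A low") are both reported, which is logically consistent.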
Proteomic Level
Proteomics has been increasingly applied to cancer research, especially for biomarker discovery,50 and quite a few tools for PPI network analysis51 and PPI-based methods for identifying cancer biomarkers from proteomic data have emerged. STRING is a well-known database of known and predicted PPIs52 and has been used to find proteins and modules related to the diethylnitrosamine-induced progression of immune suppression and apoptosis resistance in hepatocellular carcinoma.53 STRING has other applications related to biomarker discovery, eg, revealing the transition from early to late stage in colorectal cancer,54 detecting key genes related to inflammatory responses in bladder cancer,55 finding platelet-derived growth factor receptor beta as a urinary biomarker for recurrence in bladder cancer,56 and identifying noninvasive blood-based diagnostic markers for pancreatic cancer.57 Ding et al.58 developed a web-based PPI network tool named atBioNet, which integrates seven public PPI databases and uses a fast network-clustering algorithm called the structural clustering algorithm for networks (SCAN) to find disease-related functional modules in the PPI network. Because atBioNet constructs functional modules at query time, it allows novel modules to be generated from the input proteins/genes. atBioNet also has powerful network analysis and visualization tools, but its biological annotation is relatively weak, relying only on KEGG. Pradhan et al.59 generated a TF interaction network by text mining to find key TFs in colorectal cancer. Based on the theory of coevolution, Zhang et al.60 constructed PPI networks and identified seven key proteins in non-small cell lung cancer. Using phosphoproteomic data of breast cancer cells treated with transforming growth factor-β (TGF-β), Ahn et al.61 constructed a TGF-β-affected PPI network and detected sub-network markers for TGF-β treatment. Shen et al.62 identified caveolin-1 as a candidate biomarker for gastric cancer through a PPI network built with the Human Interaction Network from proteins differentially expressed between gastric cancer-associated fibroblasts and the corresponding inflammation-associated fibroblasts. Oh and Deasy63 constructed PPI networks for proteins selected by literature-based methods to identify chemoresistance-related genes and pathways in cancer.
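The core idea of SCAN can be sketched as follows. This is a simplified, unweighted version written for illustration, not atBioNet's implementation. Two proteins are structurally similar when their closed neighbourhoods Γ (neighbours plus the node itself) overlap, σ(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)| · |Γ(v)|). A protein with at least `mu` neighbours that are ε-similar to it is a core, and clusters grow from cores through ε-similar links. Hub and outlier labelling is omitted, and the `eps`/`mu` values and toy network are hypothetical.

```python
import math
from collections import defaultdict, deque

def closed_neighbourhoods(edges):
    """Gamma(v): the neighbours of v plus v itself."""
    gamma = defaultdict(set)
    for u, v in edges:
        gamma[u].update((u, v))
        gamma[v].update((u, v))
    return gamma

def scan_clusters(edges, eps=0.7, mu=3):
    """Simplified SCAN; nodes not reachable from a core stay unclustered."""
    gamma = closed_neighbourhoods(edges)

    def sigma(u, v):
        # Structural similarity of two closed neighbourhoods.
        return len(gamma[u] & gamma[v]) / math.sqrt(len(gamma[u]) * len(gamma[v]))

    # eps-neighbourhood: members of Gamma(v) at least eps-similar to v.
    eps_nbrs = {v: {w for w in gamma[v] if sigma(v, w) >= eps} for v in gamma}
    cores = {v for v in gamma if len(eps_nbrs[v]) >= mu}

    label, clusters = {}, []
    for seed in sorted(cores):
        if seed in label:
            continue
        members, queue = [], deque([seed])
        while queue:
            v = queue.popleft()
            if v in label:
                continue
            label[v] = len(clusters)
            members.append(v)
            if v in cores:  # only cores expand a cluster; border nodes join and stop
                queue.extend(w for w in eps_nbrs[v] if w not in label)
        clusters.append(sorted(members))
    return clusters
```

On two densely connected groups of four proteins joined by a single bridging interaction, with one extra protein hanging off the second group, this sketch returns the two groups as separate modules and leaves the dangling protein unclustered.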
New methods that combine PPI networks with other omics data, such as gene expression data, are currently being used to identify biomarkers. Methods combining PPIs and gene expression profiles have been applied to identify proteins associated with liver, lung, and brain metastases in breast cancer,64,65 novel interactions in pancreatic cancer,66 sub-network markers for breast cancer metastasis,67 protein biomarkers for lung cancer diagnosis68 and hepatocellular carcinoma diagnosis,69 proteins related to clinical outcome in patients with metastatic melanoma,70 biomarkers of early-onset colorectal cancer,71 module markers for gastric cancer,72 and core and specific network markers of carcinogenesis in bladder, colorectal, liver, and lung cancers.73
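One common way to combine a PPI network with expression profiles, in the spirit of the sub-network markers mentioned above, is a greedy search. Starting from a seed protein, neighbours are added while they improve how well the sub-network's average expression separates two patient groups. The minimal sketch below assumes comparably scaled expression values and uses the absolute difference in group means of the sub-network activity as a deliberately simple score; published methods often use mutual information or t-statistics instead. The function names, protein names, and data are hypothetical.

```python
from statistics import mean

def activity(genes, expr, n_samples):
    """Per-sample sub-network activity: the mean expression of its member genes."""
    return [mean(expr[g][i] for g in genes) for i in range(n_samples)]

def separation(genes, expr, labels):
    """Absolute difference in mean activity between class 1 and class 0 samples."""
    act = activity(genes, expr, len(labels))
    cls1 = [a for a, y in zip(act, labels) if y == 1]
    cls0 = [a for a, y in zip(act, labels) if y == 0]
    return abs(mean(cls1) - mean(cls0))

def greedy_subnetwork(seed, ppi, expr, labels):
    """Grow a connected sub-network from `seed` while the separation score improves.

    ppi: {protein: iterable of interacting proteins}; expr: {protein: [values]}.
    """
    members = {seed}
    best = separation(members, expr, labels)
    while True:
        frontier = {nb for m in members for nb in ppi.get(m, ())
                    if nb not in members and nb in expr}
        if not frontier:
            break
        score, node = max((separation(members | {nb}, expr, labels), nb)
                          for nb in sorted(frontier))
        if score <= best:
            break  # no neighbour improves the marker; stop growing
        members.add(node)
        best = score
    return sorted(members), best
```

Restricting growth to direct interaction partners keeps each candidate marker a connected sub-network, which is the main difference from ranking genes one at a time.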
Similarly, by taking significantly changed proteins between normal and late-stage colon cancer from two gel-based proteomics experiments as
Recently, Zhang et al.76 combined PPI networks with gene expression data to build dynamic PPI networks and identify informative proteins. He et al.77 used a PPI network assembled from four public PPI databases as a background network, mapped differentially expressed genes onto it, and proposed three hub genes as potential markers for oral squamous cell carcinoma.
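The mapping step described for a background PPI network can be sketched as follows, assuming that each source database is available as a plain list of undirected protein pairs and that the differentially expressed genes (DEGs) are already known. The edge lists are pooled without duplicates or self-loops, the DEGs induce a sub-network, and nodes are ranked by degree within it. The database contents, gene names, and `top_k` are placeholders, and degree is only one of several centrality measures used in practice.

```python
from collections import defaultdict

def merge_edge_lists(*sources):
    """Pool undirected edges from several databases, dropping duplicates and self-loops."""
    edges = set()
    for source in sources:
        for u, v in source:
            if u != v:
                edges.add(tuple(sorted((u, v))))
    return edges

def deg_hubs(edges, degs, top_k=3):
    """Degree of each DEG inside the DEG-induced sub-network, highest first."""
    degs = set(degs)
    degree = defaultdict(int)
    for u, v in edges:
        if u in degs and v in degs:  # keep only edges between two DEGs
            degree[u] += 1
            degree[v] += 1
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Normalizing each pair with `sorted` ensures that the same interaction reported as (A, B) in one database and (B, A) in another is counted once.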
Like GCNs, protein coexpression networks can be constructed from the correlation between protein expression levels. Based on this kind of network, Yang et al.78 predicted specific profiles of inflammatory mediators for non-small cell lung cancer. Analysis of a coexpression network of cytokines revealed biomarkers for HIV/HPV-associated anal cancer.79
Other Network-Based Methods
Recently, there has been a trend toward constructing networks that integrate data from multiple omics levels for biomarker discovery. For example, a biomolecular network was constructed by combining a GCN with the STRING PPI database and was used to identify T-cell homing factors as genes whose expression was significantly associated with disease-free survival in colorectal cancer.80 Butz et al. built a gene interaction network by integrating mRNA, microRNA, and protein expression profiles of ccRCC and normal samples from 28 publications and identified three genes as potential biomarkers for ccRCC.81 Sehgal et al. detected key network modules from pathways enriched in differentially expressed genes for colorectal cancer.82 An SDLS method was used to construct a network that combined gene expression and copy number alteration data to describe the interconnections between genes.14 Xu et al.83 constructed a combinatorial network by integrating PPI networks, microRNA regulatory networks, and GRNs and took the network hubs as candidate biomarkers for hepatocellular carcinoma. High-throughput screens of drug sensitivity patterns are also frequently integrated into biomarker discovery; for example, a drug–drug network was employed in a systematic identification of genomic markers of drug sensitivity.84,85 Moreover, two commercial web-based applications, MetaCore (portal.genego.com) and Ingenuity Pathway Analysis (IPA, www.ingenuity.com), have been used to construct molecular networks that contributed to biomarker discovery. Both tools construct networks based on literature annotations and provide features for identifying promising and relevant biomarker candidates within experimental data.
MetaCore has helped identify radiation-specific biomarkers by constructing gene networks for the top 500 genes predicted by a linear regression model,86 find novel proteins and functional sub-networks with altered expression in prostate cancer,87 and build systems biology-based classifiers for hepatocellular carcinoma.88 Meanwhile, IPA has been used to generate connections between proteins identified by fluorescence two-dimensional difference gel electrophoresis combined with matrix-assisted laser desorption/ionization time-of-flight tandem mass spectrometry (MALDI-TOF-MS/MS),89 to help find differentiation-related biomarkers for head and neck cancer,90 to predict plasma protein biomarkers for cervical cancer,45 and to investigate tumor-specific changes of plasma proteins in hereditary breast cancer.91
Conclusions
The past decade has seen rapid developments in network models for cancer biomarker discovery. Network-based methods have provided a new paradigm and hold great promise for the future study of cancer. However, they also have limitations, which fall mainly into two categories. One is that most current methods lack effective validation, especially in large and multiple datasets, which is also the key problem in identifying efficient and clinically useful biomarkers. The other is that the power to integrate multiple levels of data is still relatively weak, and most methods can integrate data from only two or three levels. Therefore, several challenges remain for network-based biomarker discovery. First, because of the heterogeneity of cancer, individual biomarkers show differential responses, which makes it quite difficult to identify clinically useful and precise biomarkers for cancer diagnosis and for predicting clinical outcomes. Second, the integration of multiscale omics data, cell-level data, tissue-level data, phenotype-level data, and clinical data remains a major challenge in network medicine; continued refinement of network-based algorithms and tools is critically needed and will have a significant impact on the development of personalized biomarkers. Third, robust and standardized methods for assessing molecular biomarkers, especially sub-network biomarkers, will be essential in the future. It is hoped that network-based approaches will guide treatment decisions and accelerate the development of personalized therapeutic regimens for cancer patients.
Author Contributions
Conceived and designed the experiments: WYY. Wrote the first draft of the article: WYY, WJX. Contributed to the writing of the article: WJX, GH. Agreed the article results and conclusions: WYY, WJX, JJC, GH. Jointly developed the structure and arguments of the article: WYY, WJX. Made the critical revisions and approved the final version: WYY, WJX, JJC, GH. All authors reviewed and approved the final article.
