Abstract
Introduction
Network biology has become a vital research concept in revealing the structural features of biological systems, 2 but the study and modeling of advanced biological processes necessitate the creation of highly integrated networks that can handle heterogeneous and complex data. 3 As a result, numerous biologists and bioinformaticians routinely study and elucidate biological networks using interactive graphs, enabling the mapping and classification of signaling pathways, as well as the prediction of the functions of unidentified proteins. 4 Meanwhile, the ubiquitous nature of big data across many fields, including biology, as well as advancements in computation have led to the emergence of many complex networks, the primary goals of which include modeling and understanding real, complex systems.5,6 Special properties, such as being small-world 7 and scale-free, 8 are key indicators of complex networks. 9 A network is small-world when its average path length scales logarithmically with the number of nodes and its clustering coefficient is higher than that of a random network of the same size. 10 Scale-free, conversely, refers to a functional form that remains unchanged, up to a multiplicative factor, when the independent variable is rescaled. 11
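The two quantities behind the small-world property can be illustrated with a minimal sketch. The code below is not taken from the cited works; the function names and the ring-lattice test graph are assumptions chosen for illustration, and only the Python standard library is used.

```python
from collections import deque
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        # Count the links that exist among the neighbours of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def ring_lattice(n, k):
    """Hypothetical test graph: each node is linked to its k nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


lattice = ring_lattice(10, 2)
print(average_clustering(lattice), average_path_length(lattice))
```

To assess the small-world property in practice, both values would be compared with those of a random network with the same number of nodes and edges.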
Visualization is an essential concept used to understand and analyze data. 12 Several visualization methods and tools have been introduced in the literature, but due to the magnitude and intricacy of biological datasets, it is difficult to obtain useful information from interaction networks.13,14 Furthermore, Cromar et al. 15 mentioned that due to this difficulty, knowledge of big molecular assemblies and physiologically active fragments has not been well captured in the research. Therefore, the literature has introduced a wide range of methods and techniques that can be used to develop, represent and evaluate biological networks.16,17 Further, in response to Pavlopoulos et al.’s 12 assertion that advanced methods are required for the visualization of biological networks given their complexity, several network visualization tools have been introduced to assist researchers in studying complex biological networks, including Cytoscape, 18 Gephi, 19 Medusa, 20 Ondex, 21 Osprey, 22 Pajek, 23 and ProViz. 24
Given the large number of network visualization tools made available to biologists, as well as the consequential challenge of reviewing and choosing the right one, a hypothesis was generated to determine which qualities are most critical to the efficient and effective visualization of complex networks. This list of quality factors can facilitate the analysis and comparison of complex network visualizations and enable investigators to grasp each tool’s critical components. Thus, this study focuses on identifying appropriate factors that can be utilized to assist designers and users of existing tools for the visualization of complex biological networks in improving and selecting the most suitable tools for different purposes.
Background
Biological networks
Complex network theory spans many disciplines, from computer science to the biological and molecular sciences, and within these disciplines, many biological networks exist, including protein–protein interaction (PPI) networks, 25 gene-regulatory networks (GRNs), 26 signal transduction or metabolic networks 27 and biomedical networks. 28 For PPI networks, computational biology and bioinformatics rely on such methods as affinity purification, 29 pull-down assays, 30 yeast two-hybrid (Y2H), 31 mass spectrometry, microarrays 32 and phage display 33 to identify protein functions from their relationships and interactions with other biomolecules. 12
Further, concerning GRNs, control over gene expression in cells is assigned to the regulatory network. As such, the study of gene regulatory networks on a large scale is now feasible with the help of data collection, analysis and visualization tools. 34
Signal transduction networks, in turn, use multi-edged directed graphs to visualize and represent interactions among various bioentities (proteins, chemicals or macromolecules).35,36 In addition, signal transmission can be studied either from the outside toward the inside of the cell or within the cell. 12
Metabolic and biochemical networks are powerful tools for investigating and studying metabolism patterns in various organisms. Similar to bacteria in humans, modifications to the biomedical reaction network can be done via modern techniques for sequence operation. 37 Further, biological networks can be described using computer-readable formats, such as Systems Biology Markup Language, 38 Proteomics Standards Initiative Molecular Interaction (PSI-MI), 39 Chemical Markup Language, 40 Cell Markup Language and the Resource Description Framework. 12
Data visualization and visualization tools
The emergence of big data and its associated challenges has recently piqued the interest of researchers across many sectors, including healthcare, academia, information technology (IT) and government.41–43 Other manual operations are now being digitalized, 44 producing another set of data requiring suitable visualization tools. 45 Thus, analyzing, interpreting and presenting the results in a meaningful way also pose serious challenges. 46 One of big data’s greatest challenges is visualization, as the best tool for structured data may be ineffective for unstructured data. 47 For instance, visualizing complex biological data requires advanced techniques to identify patterns in the data structure, which then aid in making decisions that fit the data content.
Data visualization can be defined as a method of unveiling data content by graphically presenting and conveying messages. According to Munzner, 48 data visualization provides visual representations of datasets designed to help people carry out their work more effectively. While visualization has been used for centuries to communicate data, the associated challenges and opportunities have greatly changed with the emergence of big data. Conventional data visualization methods are becoming inefficient and obsolete, considering the rate at which data are generated. 43 Big data have five main characteristics, known as the “5Vs”: huge volume, high velocity, high variety, low veracity and high value. 46 The main problem relates not to the processing of huge amounts of data but to the diversity among the data. 46 For instance, biological data present numerous complexities, and only networking approaches can assist in visualizing such data. Examining the content of a genome goes beyond what can be visualized via bar charts and histograms: first, the data structure must be investigated, and then suitable visualization tools with relevant features must be developed. Hence, choosing the right tools and features is crucial to obtain accurate information from the data. 49
Major network analysis factors include structure and dynamics, which are the by-products of network science. 50 Concerning the network structure, the importance of a node is measured by nodal centrality, 51 and network communities can be used to identify groups of similar nodes. 52 Whether nodal centrality or network communities are used, both approaches endeavor to identify essential nodes in complex networks. 53
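As an illustration of nodal centrality, the following minimal sketch computes degree centrality, which is only one of several centrality measures; the text does not state which measure is meant, so this choice, the function name and the toy edge list are assumptions.

```python
def degree_centrality(edges):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


# Hypothetical star-shaped interaction network: one hub linked to three other nodes.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality)
```

Other measures, such as closeness or betweenness centrality, rank nodes by their position on shortest paths rather than by their number of neighbours, and may therefore identify different nodes as essential.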
Further, the breadth of data visualization has increased significantly with the development of digital technology and the internet, primarily through such visuals as graphs and graphic diagrams. 54 Thus, visualization has become a great tool for information analysis and sharing. 55 It is essential to the scientific process because, no matter the significance of the research, society will not profit from its results if policymakers, other experts or the public cannot grasp the science presented to them. 56 Visualization uses images to represent data and give viewers a clear understanding, 57 and it enables the comprehension of highly complex biological data, 58 making it possible for administrators to understand large amounts of data quickly and easily at first glance. In addition, it offers decision-makers the power to visualize analytics to help them comprehend complex ideas and patterns. 59
Network visualization
Network visualization focuses primarily on interpreting, interacting with, identifying and exploring the patterns within a dataset, 60 for which several tools have been developed. The investigated tools are classified into two major groups: 2D and 3D visualization tools. By investigating them, it is possible to understand their algorithms and the data structures they utilize, as well as to explore their application domains and learn about their capabilities and features. These features are reviewed in this section to enhance our understanding of them.
2D network visualization tools
The 2D visualization tools analyzed include Cytoscape.js, Osprey, Medusa, ProViz, Pajek, ONDEX, Gephi and Tulip; overviews and key feature analyses of the selected tools are provided in Tables 1 and 2, respectively.
Table 1. Review of fundamental features of widely used 2D network visualization tools.
Table 2. Review of functional features of widely used 2D network visualization tools.
Su et al. 61 explored the use of Cytoscape 3 for biological network data, the main advantage of which is that it offers users an interactive and versatile visualization interface with which they can easily navigate available features to explore network data. 61 Their work highlighted other features added to Cytoscape 3, which only advanced users can access. As rendering interactive graphs in a web browser is among its most frequent use cases as a visualization software component, it can be implemented in this capacity easily, and it can also be utilized headlessly, which is helpful for graph operations in a server environment, such as Node.js. 61
Oeltzschner et al. 62 analyzed Osprey as an open-source processing approach toward reconstructing and estimating magnetic resonance spectroscopy (MRS) data, and they used it to load a series of MRS data formats and carry out phased-array coil combinations, as well as to determine the frequency and phase corrections of transients. Osprey also co-registers the MRS voxel to an anatomical image, and it was found to have the capacity to load, process, model and quantify MRS data successfully using different conventional and spectral editing methods.
Meersche et al. 63 explored Medusa’s ability to predict protein flexibility from sequences, having derived homologous protein sequences and amino acid physicochemical features from evolutionary trends to serve as inputs for a convolutional neural network. This was possible using the Medusa tool due to its flexibility, as its output was found to allow users to identify highly deformable protein regions and the general dynamics of protein properties, though it is important to note that the tool is currently inactive.
Jehl et al. 24 discussed ProViz, a web-based visualization tool for investigating the functional and evolutionary features of protein sequences. ProViz was created to streamline the study of these features and to assist biologists in developing concepts and designing experiments. Feature information is derived automatically from resources outlining the modular architecture of proteins, sequence variations, post-translational modifications, structures and experimental characterizations of functional areas. The data are presented via a user-friendly, interactive visualization medium, made available via a straightforward protein search tool, enabling people with modest bioinformatic expertise to obtain appropriate information quickly for their research. User-defined data can also be added to visualizations via manual customization or by using a representational state transfer (REST) application programming interface (API).
The Windows program Pajek evaluates and visualizes huge networks with thousands, and sometimes even millions, of vertices, 64 and its primary objectives are to offer users a powerful visualization tool and to develop various effective algorithms for investigating huge networks. The tool was primarily built on the experiences gained while developing the graph data structure and algorithm libraries Graph and X-graph and a collection of network analysis and visualization programs. Its network operations include transformations, numbering, partitions, maximum flow, random networks, hierarchical components, decompositions, citation weights, k-neighbors, the critical path method (CPM), paths between two vertices, vectors and counts in NET. 64
Concerning heterogeneous biological networks, Taubert et al. 21 presented Ondex Web, an updated version of the Ondex data integration platform that includes new network visualization and exploration features. Users can easily explore and alter the appearance of heterogeneous biological networks thanks to such novel capabilities as context-sensitive menus and annotation tools. Further, the open-source, Java-based Ondex Web can be embedded as an applet into websites, and data can be uploaded onto Ondex Web in a variety of network formats, including Pajek, XGMML, OXL and NWB.
Jayamohan and Chatterjee 65 analyzed Multiviz, a scalable Gephi plugin for visualizing complex multi-layered networks. They found that it offers different settings that can be used to transform extant multi-layered networks, which shows that the Gephi plugin can visualize multi-layered data in complex real-life situations.
The TULIP framework was created to foster extensibility and reusability. 66 Generally, it encourages the implementation of new technologies and scientific collaborations, and it gives users the option to rapidly build and browse cluster trees or graph hierarchies (nested subgraphs). These methods have served as a key visual framework for the research team, as they frequently supply data analysts with the necessary answers.
Allegri et al. 67 designed and developed a new network-based visualization tool called CompositeView, an open-source application developed in Python. It mainly improves the visualization and extraction of complex interactive networks, increasing the chances of obtaining actionable insights. The authors found that although CompositeView was developed to visualize network data using ranking properties, it functions better on non-network datasets.
3D network visualization tools
The 3D visualization tools analyzed include Arena3Dweb, CellNetVis, Graphia and OmicsNet, overviews and key feature analyses of which are provided in Tables 3 and 4, respectively. Arena3Dweb was the first web program to enable the visualization of multi-layered graphs in 3D space, and it is entirely dynamic and independent. 68 Users of Arena3Dweb can combine numerous networks with their intra- and inter-layer connections into a single view, and a wide variety of inter- and intra-layer layouts and network indicators for node scaling is available. It can also be used easily by beginners in a web browser. Moreover, it was created using R, Shiny and JavaScript, and it supports weighted and unweighted undirected graphs.
Table 3. Review of fundamental features of widely used 3D network visualization tools.
Table 4. Review of functional features of widely used 3D network visualization tools.
CellNetVis creates an adaptive network structure where nodes are organized into flexible cellular components using an iterative force-directed process, 1 where a correctly documented network in the XGMML format serves as the tool’s input. It provides some capabilities that are crucial to modern biological network analysis and that are not offered by other tools, including simultaneously being web-based, supporting enormous networks and automatically displaying nodes within their cellular components.
The open-source platform Graphia was developed for the graph-based analysis of the massive volumes of quantitative and qualitative data currently being produced from research on cells, genes, proteins and metabolites. 69 Computing the correlation matrix of any tabular matrix, whether of discrete or continuous values, is at the heart of Graphia’s capabilities, and the program is built to render swiftly, in 2D or 3D space, the frequently enormous graphs that emerge.
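The general idea of turning a tabular matrix into a correlation graph can be sketched as follows. This is not Graphia's implementation; it is a minimal illustration that assumes Pearson correlation between rows and a hypothetical cut-off of 0.9, and the profile names and values are invented for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant row has no defined correlation; treated as 0 here
    return cov / (sd_x * sd_y)


def correlation_edges(profiles, threshold=0.9):
    """Return (row_a, row_b, r) for every pair of rows with r >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical measurement profiles (rows of a tabular matrix).
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
}
for a, b, r in correlation_edges(profiles):
    print(a, b, round(r, 3))
```

The threshold controls graph density: a lower value yields more edges and a larger graph to render, which is where the layout and rendering performance of a tool become decisive.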
Another web-based application, called OmicsNet, was designed to simplify for users the creation, visualization and analysis of multi-omics networks for the exploration of intricate correlations between lists of relevant ‘omics traits. 70 Some highlights include a new 3D module layout and improved network visual analytics, with 11 2D graph layout options. It also includes steps to enhance study reproducibility by introducing the companion OmicsNetR package, linking R command history and creating persistent links for the exchange of interactive network views.
Comparison of 2D and 3D network visualization tools
From the in-depth analyses presented in Tables 2 and 4, it is clear that 3D visualization tools provide an enhanced user experience, offering interactive features and making it easy to explore a 3D space. Conversely, 2D network visualization tools provide several layout algorithms that can be used to produce a wide range of visualizations. Therefore, depending on the requirements, the analyst must select the proper tool to maximize their output.
Factors for evaluating visualization tools
The factors that can be adopted in the evaluation and selection of visualization tools for different purposes are classified into generic and heuristic. This section will describe these factors, including how they were derived and their use among the reviewed network visualization tools.
Generic factors
There are 15 generic factors derived from the literature (see Table 5). “Factors in evaluating visualization tools” was used as a keyword to search for scientific publications in such online databases as Google Scholar, ResearchGate, JSTOR, IEEE and ScienceDirect, among others. A number of publications were discovered and narrowed down using the keyword “generic factors,” and the different factors in each publication were identified and ranked based on the number of publications in which they appeared. Each of the 15 factors appeared in more than three journals; hence, they were included in the study.
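The ranking step described above can be sketched as a simple count of the publications in which each factor appears. The publication identifiers and factor lists below are hypothetical placeholders, not the study's data; only the inclusion rule (more than three publications) is taken from the text.

```python
from collections import Counter


def rank_factors(publications, min_count):
    """Rank factors by the number of publications mentioning them.

    Only factors that appear in more than `min_count` publications are kept.
    """
    counts = Counter()
    for factors in publications.values():
        counts.update(set(factors))  # count each factor once per publication
    return [(factor, n) for factor, n in counts.most_common() if n > min_count]


# Hypothetical literature sample: publication id -> factors it discusses.
publications = {
    "pub1": ["scalability", "filtering tools"],
    "pub2": ["scalability", "filtering tools", "visual style"],
    "pub3": ["scalability", "filtering tools"],
    "pub4": ["scalability", "filtering tools", "visual style"],
    "pub5": ["scalability"],
}
print(rank_factors(publications, min_count=3))
```

Counting each factor once per publication prevents a single paper that mentions a factor repeatedly from inflating its rank.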
Table 5. Generic factors.
A brief discussion of the generic factors is included below.
User input is critical for visualization tools because it is essential to the identification and comprehension of the functional and technical requirements a product must meet. This information also guides less obvious but often equally significant qualities, such as fulfilment, acceptance or esthetics. Customization allows users to choose what they want to see or to set preferences for how information is arranged or presented. Because it gives users control over their interactions, it can improve the user experience. Suderman and Hallett 95 stated that while most tools support improved functionality in graphical user interfaces (UIs), their functionality is often insufficient for specified tasks.
Heuristic factors
A brief discussion of heuristic factors 10 is provided here (see Table 6), where “Factors in evaluating visualization tools” was the keyword used to search for scientific publications in such online databases as Google Scholar, ResearchGate, JSTOR, IEEE and ScienceDirect. Several journals were identified, and they were narrowed down using “heuristic factors” as keywords. The different factors in each journal were identified and ranked based on the number of journals in which they appeared. Each of the 10 factors appeared in more than two journals, so they were included in the study.
Table 6. Heuristic factors.
Methodology
This study gathered information through a mixed methodology, using a combination of quantitative and qualitative research (Figure 1) to study the subject matter, 108 which allows the researcher to avoid the constraints associated with employing a single approach and enhances the knowledge gained in relation to the stated research problems. 109 Further, a mixed methodology can leverage the benefits and offset the limitations of both strategies, and it is especially helpful when dealing with complicated, multidimensional challenges. 110

Figure 1. Method adopted.
Data collection
Qualitative research involves gathering descriptive opinions and experiences using various methods, including interviews, observations, focus groups and case studies; 111 of these, interviews were utilized in this study. Conversely, quantitative research produces numerical data or data that can be converted into useful statistics to measure a specific concept using questionnaires, surveys, etc.; 112 of these, a survey was adopted in this study. The interviews consisted of open-ended questions, whereas Likert-scale questions were utilized in the survey, 5 which was created and distributed via the Newcastle University Online Surveys tool to gather responses. The survey and interviews were carried out independently, and the interview participants were not required to complete the survey before participation. The survey captured the participants’ demographic information and asked them to rate five general statements related to each factor identified in the literature. The interview questions, in contrast, were more dynamic and related to the factors, including participants’ opinions about them, particularly their importance when adopting a specific network visualization tool.
Data analysis
The qualitative data were analyzed using thematic analysis, an approach that focuses on the discovery, description and rationalization of themes, as well as the interconnections between them. 113
The steps utilized for the thematic analysis were derived from Dawadi. 114
Participants’ demographic information
Interview responses were gathered from five participants and survey responses from 98. Even though the number of participants gathered for the study is relatively low, the participants are broadly balanced in terms of age, job position, domain, experience with visualization tools, etc.
Age
It is important to note that no interview participants were from the 18 to 24-year age group; however, this group is represented in the survey, where 34% of participants were aged 18–24 years. The statistics in Table 7 show that most survey and interview participants were within the age range of 25–34 years (at 37% and 40%, respectively), while participants aged 45–54 years comprised the fewest survey participants (7%) and the majority of interview participants.
Table 7. Age statistics.
Job position
The various job positions held by the participants are outlined in Table 8. The specialist category includes computer specialists, customer service agents, data analysts, economists, writers and engineers, constituting 7% of participants, according to the table, while 1% were biomedical specialists from Newcastle University. Meanwhile, lecturers comprised most of the survey participants, while postdocs comprised most of the interview participants.
Table 8. Job position statistics.
Experience
For the survey, participant selection focused on individuals with varying years of experience, but for the interviews, selection focused on individuals with relatively more experience. Table 9 shows the experience statistics of the participants, where those with 1–3 years’ experience comprised most survey participants (51%) and those with 6–10 years comprised the majority of interview participants.
Table 9. Experience statistics.
Knowledge and experience with visualization tools
Apart from the tools specified in Table 10, additional visualization tools include MATLAB, Spike, QuPath, PowerBI, Tableau and NetworkX, among others. In the survey, Pajek tops the list, followed by Cytoscape, even though Cytoscape was discussed more widely than Pajek during the interviews. Table 10 shows the order, from top to bottom, in which the respondents rated the visualization tools.
Table 10. Visualization tools.
Findings
The factors identified from the survey and interviews with the participants were divided into two major categories: generic and heuristic.
Generic factors
The survey respondents’ ratings of the importance of the generic factors are provided in Figure 2. The overall score indicates the level of importance of each factor for network visualization tools in general, according to all participants. Figure 2 shows that the respondents consider filtering tools to be the most important factor, at 73%, followed by user input and customization, at 58%. Graph analysis and the benefit of the tool to its users follow closely at 57%, and user-friendliness comes next at 56%. Scalability scored 55%, while an efficient layout, plugin availability and runtime performance each scored 54%, followed closely by visual style, text mining and user feedback, at 53%. Different file formats scored 51%, advanced search scored 48%, and open source (free) had the lowest percentage (44%). Thus, of the 15 generic factors, the majority (12) are considered of moderate importance by the survey participants, as their scores range between 50% and 60%. The factors “advanced search” and “open source (free)” were considered less important (below average) than the other factors, whereas “filtering tools” was considered key (73%) in assessing and selecting a visualization tool for complex biological networks.

Figure 2. Importance of the generic factors.
From Table 11, it is clear that the mean values of the generic factors range between 3.41 and 3.88, indicating that the ratings received from participants are mostly moderately positive. Moreover, the standard deviation ranges from 0.52 to 0.65, which indicates that the variation between ratings was relatively low. Therefore, it is possible to assume that the opinions of the participants were broadly similar.
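The descriptive statistics reported for each factor, the mean and the standard deviation, can be computed as in the following minimal sketch. The ratings are hypothetical and do not reproduce the survey data, and the use of the sample standard deviation is an assumption, as the text does not state which form was used.

```python
import statistics


def describe(ratings):
    """Return the count, mean and sample standard deviation of Likert ratings."""
    return {
        "n": len(ratings),
        "mean": statistics.mean(ratings),
        "sd": statistics.stdev(ratings),  # sample SD (n - 1 denominator), an assumption
    }


# Hypothetical ratings for one factor on a 1-5 Likert scale.
ratings = [3, 4, 4, 3, 4, 5, 3, 4]
summary = describe(ratings)
print(summary["n"], round(summary["mean"], 2), round(summary["sd"], 2))
```

A small standard deviation relative to the scale width, as reported in the study, means that individual ratings cluster close to the mean.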
Table 11. Descriptive analysis of the generic factors.
The interview responses substantiated the survey responses, indicating that open-source or free licensing is not compulsory or preferred if the visualization tools are suitable for analyzing a complex biological network. The interviewees also indicated that analysts are ready to pay whatever the license costs for a visualization tool that can analyze complex biological networks. Moreover, the interview participants stated that the basic search feature is sufficient and that there is no need to perform an advanced search. Filter tools are also considered essential for visualizing complex biological networks, as they assist the analyst in extracting the relevant segment from a large volume of data.
Heuristic factors
The importance of the heuristic factors identified by the respondents is outlined in Figure 3.

Figure 3. Importance of the heuristic factors.
The survey responses indicate that all 10 heuristic factors are of moderate importance, as the scores range between 50% and 60%. The interviewees specified that all heuristic factors should be considered when developing a good graph visualization tool for a complex biological network. Figure 3 shows that the heuristic factors with the highest percentage (59%) are information coding, flexibility and consistency, closely followed by prompting and recognition rather than recall, at 57%. Orientation and help, minimal actions, spatial organization and dataset reduction each scored 54%, while the heuristic factor with the lowest percentage is removing the extraneous, at 53%. Consequently, it is safe to assume that even though all heuristic factors are of moderate importance, information coding, flexibility and consistency are considered the most crucial.
From Table 12, it is clear that the mean values of the heuristic factors range between 3.49 and 3.61, indicating that the ratings received from the participants are mostly moderately positive. Moreover, the standard deviation ranges from 0.55 to 0.68, which indicates that the variation among the ratings was relatively low but slightly higher than for the generic factors. Therefore, it is possible to assume that the opinions of the participants were broadly similar.
Table 12. Descriptive analysis of the heuristic factors.
Notably, the survey and interview participants were conversant with the graph visualization tools and with the generic and heuristic factors. For example, the interview participants mentioned that Cytoscape is one of the most widely used tools, and several issues related to Cytoscape were discussed, among which was the tool’s filter feature, which the survey responses also confirmed as the most important of the generic factors. Moreover, all participants confirmed that it is difficult to identify a solution that adequately handles large graphs.
Conclusion
This research studied essential factors for evaluating and selecting a visualization tool for complex biological networks. It employed a mixed research approach to gather responses from participants having a wide range of backgrounds and experience using graph-based visualization tools. In total, 98 participants responded to the survey questions, and five were interviewed to obtain detailed responses.
Importantly, the responses received from the survey and interviews corresponded with each other. From the interviews, it is clear that the users prefer 3D to 2D visualization, and that some network visualization tools, such as Cytoscape.js, Gephi and others, which are primarily known for 2D visualization support, offer 3D visualization via plugins. In addition, the interviews provided detailed opinions about and justifications for the factors. This study divided the 25 factors identified into two major categories: generic and heuristic. The 15 generic factors are efficient layout, advanced search, plugin availability, graph analysis, user-friendliness, runtime performance, visual style, text mining, different file formats, filtering tools, benefits of the tool to its users, user feedback, user input and customization, scalability and open source (free). The 10 heuristic factors are information coding, flexibility, orientation and help, minimal actions, prompting, consistency, spatial organization, recognition rather than recall, removing the extraneous and dataset reduction. The findings indicate that all generic factors except advanced search, open source (free) and filtering tools are moderately important. Furthermore, the advanced search and open source (free) factors are less important than the others, whereas filtering tools are a key consideration when selecting network visualization tools.
The findings indicate that all heuristic factors were essential, and the interview respondents added that they should be considered when developing a visualization tool for a complex biological network to increase the tool’s user-friendliness. Future studies should assess the different factors of visualization tools to rate their applicability to complex biological networks.
Supplemental Material
sj-pdf-1-ivi-10.1177_14738716231181545 – Supplemental material for An investigation into various visualization tools for complex biological networks
Supplemental material, sj-pdf-1-ivi-10.1177_14738716231181545 for An investigation into various visualization tools for complex biological networks by Hanin Alzahrani and Sara Fernstad in Information Visualization
