Abstract
Introduction
Machine Learning (ML) and, specifically, Artificial Neural Network (NN) models have become increasingly important and are widely used in various contexts, for example, image classification,1 speech processing,2 and natural language processing.3 Typically, NN models outperform traditional ML methods in terms of classification accuracy. A NN learns a non-linear relation between the input features and the output, and is mainly used for classification. Training is performed by tuning the weights of a layered network using back-propagation.4 However, the models are generally used as black boxes; only the input and output are observed. This is acceptable if the model’s outcome is deemed less important or is used to support human decisions rather than as the final decision maker. However, in general, end users and even model developers do not fully understand the internal decision mechanisms of a trained neural network. Understanding these mechanisms helps users better comprehend the model’s limitations and strengths, enabling them to use these to their advantage.5 For example, users need to understand why a certain class is predicted, whether this prediction can be trusted,6 and to what extent. Several visualization and computational methods have been developed that provide insight into the internal mechanisms of NN models for data with an intrinsic (visual) representation such as images, text, and speech.7–9
For interpretable data, the trend is to scale up network architectures and introduce specialized layers and neurons. However, for tabular data, it was recently shown that small-scale, well-regularized NN models, with as few as 3–9 layers, significantly outperform specialized NN architectures.10,11 Still, understanding what internal decisions the NN makes is challenging, as current methods rely on human interpretation of the generated, annotated, or transformed image and text data, typically focusing on the NN filters. However, the NN filters for generic multivariate data are difficult to interpret. Therefore, we propose an instance-based approach that focuses on the propagation of items through the network and shows their contribution to the predictions. Our main contributions are:
a method to compute the classification contribution of an instance propagated through the network in relation to all selected instances.
a network visualization to show the instance contributions with color-banded Sankey edges, enabling users to analyze (small-scale) neural networks, not limited to data with an intrinsic representation, supported by interaction mechanisms to select instance groups of interest on the network, and,
direct manipulation to enable tracing paths using an importance-scoring measure to explore feature importance.
The paper is organized as follows. First, user tasks and requirements are discussed. Next, related work is discussed in Section 3. Our solution is presented in Section 4, applied to example use cases, and evaluated through a qualitative user study in Section 5. Limitations of the method are discussed in Section 6. Finally, in Section 7, we present conclusions and directions for future work.
Background and user tasks
As we focus on a post-hoc analysis12,13 (i.e. analysis after training), we first discuss the elements involved with a trained NN model. Next, we identify user tasks supporting the exploration and analysis of the models. Finally, we link the elements of trained models to the defined tasks and provide visualizations and interaction techniques in Section 4.
Neural networks
A neural network is an ML model used to make predictions from data. The model consists of layers of neurons (nodes) connected by weighted edges (links). Important elements of a NN are the input, hidden, and output layers (see Figure 1). The input layer receives the (pre-processed and normalized) data, typically with one node for each data feature.

Elements that define a neural network.
Neurons & activations
Neurons and activations in each layer are the building blocks of a NN. Each neuron receives input from neurons in the previous layer, performs an operation (activation function) on that input, and then sends the output (activation) to neurons in the next layer:

a_j = φ(Σ_i w_ij a_i + b_j),  (1)

where a_j is the activation of neuron j, φ is the activation function, w_ij is the weight of the edge connecting neuron i in the previous layer to neuron j, a_i is the activation of neuron i in the previous layer, and b_j is the bias of neuron j.
Edge weights
Edge weights, the input parameters of a neuron, are used to adjust the importance of each input. Each neuron input (output of the previous layer) is multiplied by a weight, which determines the relative importance of that input. In other words, the weights determine how much influence each input has on the neuron output. This property inspired us to develop an instance-based visualization, rather than showing the raw final learned weights. Weights are updated during the training phase of the network by minimizing the error between the predicted and actual output using back-propagation.4 The combination of the input, weights, and the activation function of the neuron is used to compute the output of the neuron that is passed as input to the next layer.
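The weighted-sum-plus-activation computation described above can be illustrated in a few lines of NumPy (a generic sketch; tanh is chosen here only as an example activation function):

```python
import numpy as np

def neuron_output(inputs, weights, bias, activation=np.tanh):
    """Multiply each input by its edge weight, sum, add the bias,
    and pass the result through the activation function."""
    return activation(np.dot(weights, inputs) + bias)

x = np.array([0.2, -0.5, 1.0])   # outputs of the previous layer
w = np.array([0.8, -0.3, 0.5])   # learned edge weights
b = 0.1                           # learned bias
a = neuron_output(x, w, b)        # activation passed to the next layer
```

The sign and magnitude of each weight directly scale how strongly the corresponding input influences the neuron output, which is exactly the property the instance-based visualization exposes.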
User tasks
At the highest level, the goal of our visualization is to provide users (model developers) insight into model behavior. More precisely, our aim is to show how the model responds to (groups of) instances. Some example questions we would like our visualization and analysis method to answer are: are there neurons that are not activated and thus can be removed from the network? Are there neurons that show similar behavior, or classes that the model has trouble classifying, or that are easy to classify? Model understanding can be achieved through global (top-down) or local (bottom-up) exploration. With a global analysis method, we want to understand the model behavior as a whole. As related work on eXplainable Artificial Intelligence (XAI) techniques is plentiful, we scanned the literature for the most common tasks on model understanding and categorized these into global and local understanding tasks. Note that these tasks are not exhaustive and others might exist; however, we believe these tasks to be representative for model understanding. Some tasks involved with global understanding, summarized and collected from literature,9,17–23 are:
G1. understand how well the model classifies each class (to understand whether there is bias for a class and thus how well it can be trusted for a given prediction).
G2. understand which neurons are responsible for classifying a class (to understand the prediction certainty).
G3. understand which network parts are underutilized or are not activated (to identify optimizations in network architecture).
G4. identify parts of the network showing similar behavior (to understand network redundancy).
For a bottom-up or local exploration, users focus on a single instance prediction (or small group of instance predictions) and need to understand the model classification. Typical tasks, identified in earlier work,9,17,18,22–25 involve:
L1. understanding feature importances for a classification (to explain predictions in terms of influential features).
L2. understanding classification certainty and the model decisions involved (to judge trustworthiness).
L3. understanding instance misclassifications.
This paper aims to support a hybrid approach, enabling a mixed local and global understanding of the NN model through observing behavior for various instances. The analysis starts from a selection of instances: a single selection for a focused local exploration, or all instances for a global understanding. The instances are mapped onto the network architecture and visualized through Sankey edges (see Section 4.1) to reveal model behavior. Additionally, we show node activations for the selected instances (see Section 4.2). The model can then be explored and analyzed with direct manipulation and trace-back and -forward interaction mechanisms (see Section 4.3). The instance-based hybrid exploration process supported by our solution is shown schematically in Figure 2.

Our hybrid exploration process with a selection of instances as the central element. The instances of interest are propagated through the neural network and visualized using color-banded Sankey edges showing how neurons transform the high-dimensional space. Local and global exploration is supported by direct manipulation, clustering, and importance-scored paths.
Related work
Key to our method to support both global and local model understanding is the mapping and visualization of instances on the network architecture.
Visualizing neural network architecture
For understanding the network, a direct visualization of the architecture is typically used. The network is often presented using a node-link diagram for global,7,26–30 local,31–34 and combined 9 understanding. Adapted Sankey diagrams are also used for global, 30 local,8,35–38 and combined17,39,40 understanding. The advantage of direct visualization of the architecture is the familiarity of the neural network structure to developers. Typically, the learned weights are visualized using the edges, either by width28,29 or color.17,27 Next to showing the weights, the signs of the weights (positive or negative) are also important for the interpretation of the network. If width is used to show the weight of an edge, then frequently, color is used to convey whether the weight is positive or negative17,30,41 (see Figure 3(a)). For example, De Vries et al. 41 and Ming et al. 30 (focusing on text data) use red and blue for positive and negative weights, respectively. Liu et al. 17 also use colored edges to denote positive and negative weights; in addition, they aggregate edges between layers and show an aggregation rectangle to indicate the proportion of negative and positive edges in the formed bi-cluster. Visually closest to our work is TensorFlow Playground, 29 which provides non-expert users with a direct node-link visualization of a NN model for (simplified) tabular data. The edges’ width is used to convey the learned weights and color to distinguish positive and negative weights (see Figure 3(a) and (c)). However, we differ in the information shown – the instance contributions, rather than the learned weights – and scale beyond the limited two feature inputs. Visualizing the final learned weights only shows stronger and weaker connections in the network (see Figure 3(a)), but does not reveal how the model behaves under different instance inputs (see Figure 3(e)).
Therefore, in contrast to existing works, we propagate the instances through the network and show the contribution of each instance to all connections (see Figure 3(d)–(f)). This reveals how the model behaves overall, but also shows differences between and within instances of the same class. A slight variation on the direct visualization of the architecture using node-link diagrams is introduced by Wongsuphasawat et al. 7 where, instead of the weights, the (TensorFlow) computational graph is shown in TensorBoard. Hohman et al. 9 compute an attribution graph by combining and aggregating neuron activations and influences to summarize the learned image features of the network (see Figure 3(b)); the architecture is reflected with a node-link diagram showing weighted edges and example images with a high activation at the nodes. The focus here is on the learned filters as they have an inherent visual representation. The edges show the learned weights but do not differentiate between instances and do not show their individual contribution.

(a) Typical NN visualization where trained weights are visualized as width-encoded edges. Positive weights are blue, negative weights orange. Neuron activations are shown using filters for image data (b) for example, in the Summit approach 9 or two-dimensional scatterplot (c) for example, in Tensorflow Playground. 29 Note that (b) only works for data with an inherent representation and (c) does not scale beyond two feature inputs. (d) Our proposal to show for the selected instances the class contributions using color-banded edges rather than showing learned weights and activation histograms for each neuron. (e) Edges with negative weight can be drawn semi-transparent and node clustering (f) reveals groups of similar and non-activated nodes (g). By selecting nodes and computing importance-scored paths, features contributing to the final classification are revealed (h–j). Note that a classical weight visualization (a) does not reveal this, but relies on filter visualizations (b) for the interpretation.
Visualizing instances
Next to visualizing the architecture, other techniques focus on visualizing inputs and outputs of the model, but rarely their propagation through the model. The work by Halnaut et al. 42 is an exception to this by showing the progressive classification of all instances on a trained NN. Users can quickly identify fast-recognized, unstable, or late-recognized instances using a Sankey diagram based on clustered classifications per layer. However, our goal is different, as we want to analyze model behavior under different (user-selectable) instance cohorts. In contrast to clustering the instance classifications, we show the instance contributions on the neural network architecture and their feature importance through importance-scored paths. In other works, derived properties are computed and visualized, typically by producing a 2D embedding of either the data, classes, 9 the hidden layer activations, 43 a projection of the latent representation space, 44 or model outputs. 3 The embeddings are constructed using non-linear dimensionality reduction techniques, typically visualized using scatterplots. In contrast to these post-hoc analysis methods, DeepEyes 45 assists in model building by combining embeddings of multiple layers, activation heatmaps, and filter maps. In our method, we also use a two-dimensional scatterplot embedding of the instances, not as an end result but as a means to select instances of interest to compute and visualize their relative contribution to the classification, revealing model behavior under different instance inputs.
Visualizing neurons
Several works focus on the computations of the filters at each neuron. The majority of visualization methods are applied to CNNs and images 46 as they have an inherent visual representation, which makes it easy to interpret and understand filters. Zeiler et al. 1 were among the first to create visualization techniques to explain convolution filters using deconvolution. At each layer, reconstructed images are shown to reveal patterns of what the network learned (see Figure 3(b)). This technique was optimized by Yosinski et al. 47 to produce more recognizable images using regularization methods. Here too, the focus is on post-hoc understanding of the layers of a CNN applied to image and video data. Analysis of the training process using similar techniques, focusing on image data, is presented in DGMTracker. 32 LSTMVis 48 focuses on understanding the internals of Recurrent Neural Networks (RNNs) 49 with heatmaps. Most techniques focus on directly interpretable data such as images and text; however, how to extend these filter visualizations to generic multivariate data is unclear. Therefore, we show the neuron activations of selected instances using histograms, and focus on the propagation of instances and their relative contribution instead (see Figure 3(e)).
Global and local analysis
We aim for visualization and analysis methods on the instance level to support both local and global model exploration and understanding. Most methods focus purely on global understanding7,26–30 or do not support switching between global and local analysis easily. An exception is ActiVis, 31 which enables users to explore deep NNs at the instance and subset level using a combination of coordinated views showing the instances, the computation graph, and neuron activations in a matrix. In contrast, we focus on the visualization of instances mapped onto the neural network architecture by using color-banded Sankey edges to show how they are processed by the model. Finally, several explanation methods are model-agnostic, not necessarily focusing on an interactive visual representation of the elements involved but instead relying on computational methods for local and global analysis. Most methods perturb the input to train surrogate models like LIME 50 (local analysis) and SHAP 51 (global analysis). However, here we focus on model-specific explanations for NN, exploiting the model’s unique properties and decision mechanisms. In Section 5, we evaluate how our method compares to these computational methods.
We discussed the main approaches for NN visualization closest to our work. However, XAI techniques are plentiful, and for a more detailed discussion, we refer to recent surveys.21–54 In summary, most current techniques focus on the visualization of filters for data with an inherent representation (CNN for images, RNN for text). However, no good solutions exist for generic multivariate data, as filters are not directly interpretable, and visualization research on NN models for multivariate data is lagging behind, while their development and usage are ubiquitous.10,11,55–60 Therefore, we propose to add an instance-based visualization method, showing model behavior under different instance properties, to the existing toolbox of NN visualization and understanding.
Instance-based network visualization
To enable the exploration of NN models on a global and local level, we exploit the familiarity of the network architecture and use it as context. The network architecture is rendered using a network visualization where neurons are nodes and the weights are links. The exploration starts by selecting instances of interest (train and/or test). This can be a single instance, groups of instances (e.g. all instances from one class), or all instances to explore global model behavior. We provide users with three ways to select instances, using:
a sortable data table,
scented histogram widgets for attribute filtering, and
an instance embedding (a 2D scatterplot based on the input features).
This enables selecting instances based on feature values using the table and scented widgets or selecting structurally similar instances using the embedding. This flexible setup allows for easy grouping, slicing and dicing of data, to observe model behavior under different input data. Next, the selected instances are propagated through the trained NN to compute their relative contribution and visualized using color-banded Sankey edges (see Section 4.1) and node activations (see Section 4.2).
Edge visualization
Recall that after training, each NN model edge is assigned a weight w_ij (see equation (1)).
Given an instance
where
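The paper's exact contribution formula is elided in this excerpt. As an illustrative stand-in (not the authors' definition), one plausible computation takes, for each selected instance and each edge, the signal the edge carries forward (source activation times edge weight) and normalizes the absolute contributions per edge, yielding the relative band widths of the color-banded Sankey edges:

```python
import numpy as np

def edge_contributions(activations, weights):
    """Per-instance signal carried by each edge (i, j): the source
    activation times the edge weight.

    activations: (n_instances, n_src) outputs of the source layer
    weights:     (n_src, n_dst) learned weight matrix
    returns:     (n_instances, n_src, n_dst) per-instance edge signals
    """
    return activations[:, :, None] * weights[None, :, :]

def band_widths(contrib):
    """Normalize absolute contributions so the bands on each edge sum
    to 1, giving each selected instance's relative share of the edge."""
    totals = np.abs(contrib).sum(axis=0, keepdims=True)
    return np.abs(contrib) / np.where(totals == 0, 1, totals)
```

Coloring each band by the class of its instance then produces the class-contribution view described in Section 4.1.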

Neural network with three inputs, one hidden layer, and two output classes. The trained weights are depicted on the edges (left). One instance (

Graphical user interface with coordinated components: instance selection via (a) attribute filtering; (b) 2D embedding of the instances, based on input features; and/or (c) a sortable table. The instances are visualized on the network architecture using Sankey edges and neuron activation histograms (d). Neurons can be clustered based on similar activations, and model behavior can be explored via importance-scored paths (e). Insights from the visualization (see also Section 5) are: (1) connections become stronger with each layer; (2) neurons without activations can potentially be removed; (3) clusters of similarly behaving neurons, both activating highly on the Gentoo class; (4) neurons that activate on Chinstrap are under-represented; (5) importance-scored path exploration; (6) bundled positive and negative edges (inspired by Liu et al. 17 ).
Node visualization
With the color-banded edges, the importance of each input to a node is shown. In addition, we also want to see the activations and how well a node is able to separate the instances of different classes. The activation strength represents the node’s response to an instance and, therefore, its contribution to that instance’s overall classification. The inputs to a node are multiplied by the weights on the edges and then summed. This represents a linear (regression) function through the input space (see equation (1) and Figure 6, step 1). This linear function is used as an input to the activation function (see Figure 6, steps 2–3). The activation function induces a non-linear distortion of the space in order to separate the instances. The resulting values (typically ranging from −1 or 0 to 1) are the instance activations for this node; instances with a high value activate the node, and instances with a low value do not cause node activation. The resulting activation value is passed forward through the network. The activations, presented as raw values (filters) in previous work, do not directly convey how well the node separates the instances of different classes, but rather what a combination of nodes has collectively learned (see e.g. Figure 3(b)). They do not give insight into how the neuron distorts the space to separate the instances of different classes. This works for instance data with an intrinsic representation, but provides no insight for generic multivariate data. Many different activation functions are possible, and they are difficult to understand without mathematical reasoning. We aim for a generic visualization solution that shows the activations and how well the instances are separated by the deformation of the input space. Therefore, we create stacked histograms of the instance activations (see Figure 6, step 4). The colors of the histogram bars denote the class ratios of the involved instances.
With this histogram, the activations for the different instances are conveyed, as well as the ability of the node to separate the different classes. We create histograms for each node of the network’s hidden layers. Users can normalize each histogram individually or set a global normalization. The number of bins and associated bin ranges are initially computed using the Freedman–Diaconis rule 64 as it is robust against outliers. Users can change the number of bins manually or according to other rules (e.g. square root, Scott’s rule, and Sturges’ rule). No activations exist for the network’s input and output layers. Therefore, we show the feature distribution of each input (again colored according to the involved classes of the selected instances). The network’s last layer shows histograms of the predicted class probabilities.
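The binning step can be reproduced with NumPy, which implements the Freedman–Diaconis rule directly (a sketch of the computation; the prototype itself is Qt/C++):

```python
import numpy as np

def activation_histogram(activations, labels, n_classes):
    """Stacked histogram of one node's activations, with one stack
    layer per class. Bin edges follow the Freedman-Diaconis rule,
    which is robust against outliers."""
    edges = np.histogram_bin_edges(activations, bins="fd")
    counts = np.stack([
        np.histogram(activations[labels == c], bins=edges)[0]
        for c in range(n_classes)
    ])
    return counts, edges  # counts: (n_classes, n_bins)
```

Sharing the same `edges` across classes is what allows the per-bin class counts to be stacked, so the bar colors convey the class ratios per activation range.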

Steps to create the proposed activation histograms that show how well instances of different classes are separated by each activation function. Here we show a simplified example with two inputs
Interaction & analysis
To enable the exploration and analysis of NN models, we show the instance-based network visualization on a zoomable canvas. Instances can be selected from a (combination of a) table, 2D scatterplot embedding, and scented histogram widgets. Through direct manipulation, users are enabled to select and manually reposition network nodes, for example, to group similar nodes. While the visualization of the NN model is similar for both global and local analysis, the interaction mechanisms involved are different, as described below. We refer the reader to the video in the Supplemental Material for a quick demonstration of the interaction methods.
Understanding NN behavior typically starts by selecting all data instances. One of the tasks (G3) is to understand if the chosen number of layers and neurons is appropriate (see Section 2.4). Through the activation histograms, we can easily identify whether nodes are minimally or not activated. This indicates that these nodes could be removed when creating a final model or that the architecture could be reduced overall. In addition to inspecting the histograms, nodes can be sized according to the average activation of the selected instances for quick identification. Next to identifying non-activating nodes, another task (G4) is to discover if nodes have learned similar behavior (and thus are redundant and could be removed for a final model) or if the number of neurons in that layer could be reduced to prevent learning similar behavior. To support this task, we provide users with node clustering capabilities. For each node, the instance activation values and associated class values are input to a density-based clustering algorithm (here we use DBSCAN 65 ). We found
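The clustering step can be sketched with scikit-learn's DBSCAN, treating each node's activation profile over the selected instances as a feature vector (an illustration only: the paper also feeds the associated class values, which are omitted here, and the `eps`/`min_samples` values are placeholders):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_layer_nodes(layer_activations, eps=0.5, min_samples=2):
    """Group nodes of one layer whose activation profiles over the
    selected instances are similar.

    layer_activations: (n_instances, n_nodes) activations of one layer
    returns: one cluster label per node (-1 = unique behavior / noise)
    """
    profiles = layer_activations.T  # one profile vector per node
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(profiles)
```

Nodes sharing a cluster label have learned (near-)identical functions and are candidates for removal when reducing the architecture, supporting task G4.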
The activation histograms show how the learned spaces, or non-linear mappings, separate the instances of different classes (task G2). This can be linked to trust in the classification capabilities for the involved classes. In combination with the Sankey edges, we can also observe the influence of the input spaces on this learned space; a thick incoming line means a bigger influence of the connected neuron (or higher importance) compared to connected neurons with thinner incoming lines. Another important task in understanding model behavior is identifying the certainty of classifying the different classes (G1). One way to determine this is by analyzing the Sankey edges, which are colored according to their class contributions and show the flow of instances through the network. The Sankey edge colors show whether one class is over-represented by the network, under-represented, or well-balanced between the classes. This can be linked to class certainty and trust in the model; if a given class is not represented well, the uncertainty of a classification result for this class is higher, and the output should be carefully considered. The opposite also applies; if a class is over-represented (the network activates highly for this class), the certainty of the model for this classification is higher. Naturally, the output should still be handled with care on a case-by-case basis. Next to exploring edge colors, the overall strength of the connections between the nodes can be explored by filtering on weights; users can apply this as a filter for hiding or showing connections, for example, to visualize and focus on only the strongest connections.
Importance-scored paths
Users need to understand why an instance of interest is classified (task L2) and what features determine the output (task L1). To support these tasks, users are enabled to select a node in the output layer (see Figure 3(h)–(j)) and for the involved instances we then trace back from this node all paths
where
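The path-scoring formula is likewise elided in this excerpt. A hypothetical scoring consistent with the description (not the authors' exact measure) multiplies the absolute mean activation times the edge weight along each path and ranks paths by the product; for the small-scale networks targeted here, exhaustive enumeration of paths is feasible:

```python
from itertools import product
import numpy as np

def path_scores(mean_acts, weight_mats, out_node):
    """Score every input-to-output path ending at `out_node` as the
    product of |mean activation x weight| over its edges.

    mean_acts:   per layer, mean activation of the selected instances
                 (one entry per source layer of weight_mats)
    weight_mats: per layer, (n_src, n_dst) weight matrix
    returns:     paths sorted by descending importance score
    """
    sizes = [w.shape[0] for w in weight_mats]
    scored = {}
    for path in product(*[range(s) for s in sizes]):
        nodes = list(path) + [out_node]
        score = 1.0
        for l, (i, j) in enumerate(zip(nodes[:-1], nodes[1:])):
            score *= abs(mean_acts[l][i] * weight_mats[l][i, j])
        scored[path] = score
    return sorted(scored.items(), key=lambda kv: -kv[1])
```

The first element of a top-ranked path is an input node, that is, a feature, so the ranking directly surfaces feature importance for the selected output class (tasks L1 and L2).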
Implementation details
We implemented the visualization methods in a desktop prototype using Qt/C++ (see Figure 5 and Supplemental Material). The prototype needs an input file containing the data, the NN architecture, and the weights after training. As NN models are typically trained using Python, we provide a script to extract the needed information into such an input file.
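The prototype's actual file schema is not specified in this excerpt; a minimal export script along the lines described, assuming a Keras-style model (anything exposing `model.layers` with `get_weights()`) and an illustrative JSON layout, could look like:

```python
import json

def export_network(model, X, y, path):
    """Serialize data, architecture, and trained weights to JSON.
    Field names here are illustrative, not the prototype's schema."""
    layers = []
    for layer in model.layers:
        w, b = layer.get_weights()
        layers.append({
            "weights": [list(map(float, row)) for row in w],
            "biases": list(map(float, b)),
            "activation": getattr(layer.activation, "__name__",
                                  str(layer.activation)),
        })
    spec = {"data": X, "labels": y, "layers": layers}
    with open(path, "w") as f:
        json.dump(spec, f)
```

Keeping the (pre-processed) data alongside the weights in one file lets the prototype propagate any selected instance subset without re-touching the training environment.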
Examples, use cases, and user study
For the evaluation of NN models using data that has an intrinsic representation, several standardized datasets are typically used (e.g. MNIST 66 and ImageNet 67 ). However, for NN models for generic multivariate data, no such prominent datasets exist. We aim to demonstrate the most important features using a combination of real-world datasets from different domains. Furthermore, we compare our insights with insights from the computational methods SHAP 51 (global analysis) and LIME 50 (local analysis). Finally, we perform a qualitative user study with a heuristic-based evaluation methodology (ICE-T 68 ) and report on the results.
Global model understanding
The first use case is centered around predicting penguin species from the Palmer Penguins dataset. 69 The dataset consists of 345 instances, each with six multivariate attributes (two categorical and four quantitative measures), and the goal is to predict the penguin species from one of three classes. We build a high-accuracy (0.985) NN model of four fully connected layers having 10, 10, 5, and 10 neurons, respectively. We one-hot encode the two categorical attributes, for a total of nine input values.
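As a rough sketch of such a setup, using synthetic stand-in data and scikit-learn's MLPClassifier rather than the authors' actual training code (so the reported 0.985 accuracy will not reproduce here):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Stand-in for the Palmer Penguins features: two categorical columns
# and four quantitative measurements, with three species as target.
rng = np.random.default_rng(7)
cat = rng.integers(0, 3, size=(200, 1))   # categorical, 3 levels
sex = rng.integers(0, 2, size=(200, 1))   # categorical, 2 levels
num = rng.normal(size=(200, 4))           # quantitative measures
y = rng.integers(0, 3, size=200)          # three classes

# One-hot encode the categoricals (3 + 2 columns) and scale the
# measurements, giving nine input values as in the use case.
X = np.hstack([
    OneHotEncoder().fit_transform(np.hstack([cat, sex])).toarray(),
    StandardScaler().fit_transform(num),
])

# Four fully connected hidden layers with 10, 10, 5, and 10 neurons.
clf = MLPClassifier(hidden_layer_sizes=(10, 10, 5, 10),
                    max_iter=500, random_state=0)
clf.fit(X, y)
```

After training, `clf.coefs_` holds the five weight matrices (input through output) that the visualization maps to Sankey edges.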
To understand global model behavior, we select all data instances and see, looking at the output layer, that the classes are well separated by the model (task G1). The neurons responsible for the final classification all have a high probability for the associated class and a low probability for the others; see Figure 5(1–4). Also, the width of the edges increases with each layer of the network (1). Next, we cluster neurons in each layer on similar behavior (task G4). Here we see that there is not much clustering in the first layer, and all neurons learned different functions to transform the input space. In the second layer, there are five clusters, layer three again learned different functions, and the final layer again shows several neurons that learned similar functions. We also see that there are several neurons in the layers that do not activate (task G3) and essentially do nothing (2). In combination with the clusters, we conclude that the architecture can be smaller, and the number of neurons can be reduced while preserving function (3) – indeed, after training a smaller network with the number of neurons according to our clustering and removing non-activating neurons – 8, 4, 5, and 7 neurons in each layer, respectively – we achieve the same accuracy (0.985). Another important insight is that the connections of the classes Gentoo (pink) and Adelie (green) are well represented, especially in the final layers (task G2). This means that the network is certain about predictions for these classes, while for Chinstrap (blue), users should be more careful in trusting results (4). These insights cannot be acquired with computational methods such as SHAP, as they only consider input and output and ignore internal model decision mechanisms. However, for comparison, we retrieve each class’s feature value impacts with SHAP and compare these to our visual insights by tracing importance-scored paths.
When selecting the Adelie class, the highest scoring paths lead to a combination of low

SHAP values for the Palmer Penguins dataset.
We conclude that
An example NN model where classification results should be much less trusted is shown in Figure 8. A NN trained on the Pima Indians Diabetes dataset 70 aims to diagnostically predict whether subjects have diabetes based on measurements. For this NN, the output layer is fuzzier and confused between the two classes (1). Also, there is no real clustering of the neurons; in addition, the neurons are not good at separating the classes, as the histograms show mixed class colors (2). There are no neurons that activate highly for one class. Upon tracing back the highest-scoring paths, both classes consider similar features, and there is no clear distinction between classes. Predictions are made mainly based on

NN model for the Pima Indians Diabetes dataset. 70 (1) The final output (class probabilities) is fuzzy and (2) there are no clear clusters of neurons.
Local analysis
Continuing with the Palmer Penguins dataset, from the data embedding we see a cluster of Adelie structurally similar to a cluster of Chinstrap (see Figure 9-1) and wonder how the model distinguishes between the two (task L1). We select both clusters from the embedding and cluster the neurons per layer to reveal similarly behaving neurons. From the input layer, we conclude that both clusters are structurally similar because all points involved are from

Local model behavior for two structurally similar classes (1). After selecting the highest scoring paths for both classes (2), we see that
To compare these insights with a local computational method, we take representative instances of each cluster (the centroids) and apply LIME to both. The Adelie instance is classified because it is not from

LIME 50 values for two instances of the Palmer Penguins dataset. Culmen length is the differentiating feature between the classes, in agreement with our visual findings.
Another local analysis form is understanding misclassifications (task L3). In Figure 11, a three-layer fully connected network is shown that classifies wines grown in the same region in Italy but derived from three different cultivars. 70
The network is well-trained (thicker edges as layers increase), and the neurons have different functions and are able to activate on different classes (well-separated colors in the histograms). From the table view, we identify several misclassifications and would like to understand why they are misclassified. For one of the items, with actual label 3 and classification label 2, we show the exact activation values as orange triangles underneath the histograms by activation the associated option from the menu. We select the highest scoring paths for class 2 and see that for most involved neurons, the activation value of this instance is on a boundary of class 2 and class 3. Also, for the involved features, except for

NN trained on wine dataset showing a misclassified instance’s activation values (orange triangles). All values are on the boundary of classes 2 and 3, explaining the misclassification.
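The path tracing used in the analyses above can be sketched as follows. The scoring rule here (summing |weight × source activation| over the edges of a path) and the helper `score_paths` are illustrative assumptions, not necessarily the exact formula used by the system; the 2-3-2 toy network is likewise hypothetical.

```python
import numpy as np

def score_paths(weights, activations, start, top=3):
    """Trace back all paths from output neuron `start` to the input layer and
    rank them. `weights[l]` is layer l's (n_in, n_out) matrix; `activations[l]`
    holds the mean activations of layer l's input neurons for the selected
    instances. Path score: sum of |weight * source activation| per edge
    (an assumed scoring rule for illustration)."""
    def trace(layer, neuron):
        if layer < 0:                            # reached the input layer
            return [([neuron], 0.0)]
        paths = []
        for src in range(weights[layer].shape[0]):
            edge = abs(weights[layer][src, neuron] * activations[layer][src])
            for path, s in trace(layer - 1, src):
                paths.append((path + [neuron], s + edge))
        return paths
    return sorted(trace(len(weights) - 1, start), key=lambda p: -p[1])[:top]

# Toy 2-3-2 network: two weight matrices, plus input and hidden mean activations.
W = [np.array([[1.0, -0.2, 0.1], [0.1, 0.8, -0.3]]),
     np.array([[0.9, 0.0], [0.1, 0.7], [0.2, 0.1]])]
A = [np.array([0.5, 0.4]), np.array([0.6, 0.3, 0.2])]
for path, s in score_paths(W, A, start=0):
    print(path, round(s, 3))   # highest-scoring path: [0, 0, 0] with score 1.04
```

Selecting the "highest scoring paths" in the interface corresponds to keeping only the top-ranked entries of such a list and highlighting their edges.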
User Study
To evaluate our visualization approach regarding effectiveness, strengths, weaknesses, and value, we use a combination of different evaluation methods in a broader user study:
observational
The evaluation, which took about 45 min per participant, consisted of four subsequent parts. (1) An
As recommended by Wall et al.,68 five participants are required for the heuristic-based evaluation method. We selected participants with a background and domain knowledge in machine learning (all have at least a master's degree in computer science or data science with a specialization in machine learning). The evaluation was conducted in person, using a laptop with an external 24-inch Full HD monitor.
Task performance analysis
Users had no problems executing the tasks successfully. Tasks on both global and local understanding of the NN were all answered correctly and quickly.
Heuristic evaluation
After concluding the tasks, the participants were asked to fill in the ice-t questionnaire. They rated 21 statements on a 7-point Likert scale from 1 (strongly disagree) to 7 (strongly agree), or n/a if it was unclear how to rate a statement. Only P2 scored statement 21 (“If there were data issues like unexpected, duplicate, missing, or invalid data, the visualization would highlight those issues”) as n/a; no other statements were rated n/a by any participant. The 21 statements were categorized into four components,
User study results of the ice-t evaluation. Average scores for each participant (P1–5) categorized by the four components of insight, time, essence, and confidence. As stated by Wall et al.,68 an average score of 5 or higher is considered a success.
Average scores (1–7) are colored using a red-yellow-green colormap.
Qualitative interview
In the interview round it was mentioned that the system was
Summary
Overall, the user study has shown that the proposed visualizations and analysis methods are useful for analyzing NN models for multivariate data. Users performed the tasks effectively and were able to understand the network at both a global and a local level.
Discussion and limitations
Visualizations of NN mainly focus on inherently interpretable data (e.g. images, text, speech). However, abstract multivariate data is (still) the most commonly used form of data. Therefore, in this work, we aimed to create visualizations and interaction techniques for NN models that are not limited to human-interpretable data. Note, however, that our instance-based method also works for data with an intrinsic representation, but more sophisticated methods exist for such data.9 Typically, networks trained on abstract data perform worse than networks trained on interpretable data.53 The instance-based visualization and analysis methods proposed in this paper may help bridge this gap by identifying and understanding the characteristics that cause the weaker performance.
Modern NN trained on intrinsically interpretable data (i.e. so-called foundation models72) consist of many layers and neurons73; however, since NN models for multivariate data are under-explored, their size is typically much smaller.10,11 Therefore, we believe our current visualization method, in combination with a semantically zoomable canvas, is appropriate for our purpose: the semantically zoomable canvas allows users to navigate between coarse aggregate network overviews and fine-grained neuron-level details, mitigating visual clutter by revealing detail only when needed. Still, visual scalability is a concern, as network sizes may grow in the future. A possible (partial) solution is to further exploit automated methods, such as clustering and aggregating similarly behaving nodes. Clustering can be performed based on activation patterns, as discussed in Section 4.3, weight similarity, mutual information with class labels, or even temporal dynamics if the network includes recurrent components. Visual aggregation of such clusters, representing them as single composite nodes or bundled edges, dramatically reduces the number of visual primitives, allowing users to understand overarching patterns without being overwhelmed by low-level details. Importantly, such abstractions need to be reversible through interaction: users must be able to expand aggregated nodes to inspect individual contributions.
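The activation-pattern clustering mentioned above can be sketched as follows. This is a minimal k-means over per-neuron activation vectors in plain NumPy (the system's actual clustering algorithm may differ); the synthetic activation matrix is a hypothetical example.

```python
import numpy as np

def cluster_neurons(activations, k=3, iters=50):
    """Group similarly behaving neurons within a layer.
    `activations` has shape (n_instances, n_neurons); each neuron is
    described by its activation vector across all selected instances."""
    X = activations.T                                   # one row per neuron
    # Deterministic farthest-point initialization of the k centers.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each neuron to the nearest center, then recompute centers.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):                     # skip emptied clusters
                centers[j] = X[labels == j].mean(0)
    return labels

# Two synthetic behavior groups: neurons 0-2 fire for one class, 3-5 for the other.
rng = np.random.default_rng(1)
act = np.hstack([rng.normal(1.0, 0.1, (100, 3)), rng.normal(-1.0, 0.1, (100, 3))])
labels = cluster_neurons(act, k=2)
```

Each resulting cluster could then be drawn as one composite node, with its histogram computed over the pooled activations of its members.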
Also, filtering data and edges based on weight helps focus on the elements of interest and increases clarity without compromising interpretability. For instance, only edges whose weights exceed a user-defined threshold are shown at higher zoom levels, while minor edges can be suppressed, bundled, or rendered translucently. This approach leverages the fact that many NN connections contribute only marginally to the model’s decision boundary. This dynamic filtering approach also enables scenario-specific analyses; for example, examining only the edges relevant for a particular class, instance subset, or concept activation.
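Such threshold-based edge filtering amounts to a simple mask over each layer's weight matrix. A minimal sketch (function name and threshold value are illustrative):

```python
import numpy as np

def filter_edges(weights, threshold=0.5):
    """Keep only edges whose absolute weight exceeds `threshold`.
    `weights` is a layer's (n_in, n_out) weight matrix; returns the list of
    (source, target, weight) edges that remain visible at this zoom level."""
    src, dst = np.nonzero(np.abs(weights) > threshold)
    return [(int(i), int(j), float(weights[i, j])) for i, j in zip(src, dst)]

W = np.array([[0.9, -0.1], [0.05, -0.7]])
print(filter_edges(W, threshold=0.5))  # → [(0, 0, 0.9), (1, 1, -0.7)]
```

Raising the threshold interactively then trades completeness for clarity, which is exactly the lever exposed to the user.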
Potentially, the number of classes is limited, as our visualizations depend on color as the visual channel to discern the classes for the Sankey edges and the histograms. However, the number of classes is typically low (
One advantage of the current visual encoding is that it remains independent of the number of input instances. Since both the Sankey edges and histograms operate on aggregated statistics (ratios, counts, and distribution summaries), large datasets do not introduce additional visual clutter. This makes the approach scalable with respect to data size, allowing the visualization to support interactive exploration even when millions of instances are involved. The computational aggregation of these measures can be outsourced to pre-processing, GPU acceleration, or incremental update strategies if needed.
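The incremental-update strategy mentioned above can be sketched as a streaming per-class histogram: memory stays proportional to the number of bins and classes, never to the number of instances. The class and API names below are hypothetical, not the paper's implementation.

```python
import numpy as np

class ClassHistogram:
    """Per-class activation histogram for one neuron, updated incrementally.
    Memory is O(n_classes * n_bins), independent of the number of instances."""
    def __init__(self, n_classes, n_bins=20, lo=0.0, hi=1.0):
        self.edges = np.linspace(lo, hi, n_bins + 1)
        self.counts = np.zeros((n_classes, n_bins), dtype=np.int64)

    def update(self, activations, labels):
        """Fold a batch of activations (and their class labels) into the bins."""
        for c in range(self.counts.shape[0]):
            self.counts[c] += np.histogram(activations[labels == c],
                                           bins=self.edges)[0]

    def ratios(self):
        """Per-bin class ratios, the quantity the histogram glyphs display."""
        total = self.counts.sum(0, keepdims=True)
        return np.divide(self.counts, total, where=total > 0,
                         out=np.zeros_like(self.counts, dtype=float))

h = ClassHistogram(n_classes=2, n_bins=4)
h.update(np.array([0.1, 0.9, 0.85]), np.array([0, 1, 1]))
h.update(np.array([0.2, 0.95]), np.array([0, 1]))
print(h.counts.sum())  # → 5 instances binned, at constant memory
```

Batches can arrive from disk, a GPU pipeline, or a pre-processing job; the drawn histogram only ever reads the small `counts` array.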
The node visualization with activation histograms aims to show and explain the mapping process of the neurons. This method generalizes previous work29 because it can deal with arbitrary activation functions, does not depend on function specifics, and scales beyond two input features. However, we did not create visualizations for other layer types commonly used for interpretable data, such as convolution, pooling, recurrence, or transformers. Our method is potentially also useful for exploring NN models for such data. However, suitable visualizations should be researched, or a mix of our visualization with existing techniques74 could be a solution. Extending the approach to convolutional networks is conceptually feasible but non-trivial. Convolutional layers operate on spatial feature maps rather than scalar neurons, so simple histograms would miss spatial structure. Aggregating activations per filter, channel, or region is possible, but determining an appropriate spatial abstraction remains an open challenge. For recurrent networks, temporal dynamics introduce additional complexity. Neurons exhibit activation distributions across time steps rather than a single static distribution, requiring temporal aggregation or small-multiple summaries. Moreover, recurrence introduces cycles, complicating the acyclic layout assumptions of our Sankey edges. Adapting the method to transformers presents both opportunities and constraints. Attention mechanisms align well with our flow-based metaphors, yet activation tensors in multi-head attention are high-dimensional, requiring aggregation or projection beyond our node histogram summaries. Overall, while the core principles of our approach generalize in theory, each architecture type introduces structural properties (spatial, temporal, or attentional) that demand dedicated visual abstractions.
Our focus is on NN models that target classification, but the same techniques can likely also be applied to regression problems. However, since we rely on qualitative colors for the classes, these should be replaced with sequential colormaps to represent regression outputs. Finally, we focus on post-hoc analysis, but the same techniques could also be used to visualize the model during training, for example, using (controllable) animation or exploration based on small multiples.75
Conclusions and future work
In this paper, we presented novel interactive instance-based visualization techniques that enable both local and global analysis of neural networks. The visualization techniques are based on selecting instances of interest. The instance-based visualization shows the flow of items through the network using color-banded Sankey edges and activation histograms. Using Sankey edges and histograms is not new, but showing the associated instance-based contributions is. We aim to provide a global understanding of the model behavior through node clustering, interaction, and various instance selection mechanisms. Instances can be selected with a sortable table, through scented widgets for feature filtering, and using a dimensionality reduction projection to identify instance similarities. Local analysis is supported by back- and forward-tracing of paths starting from a node of interest (i.e. output, input, or any node in between). We compute an importance score for each connected path and enable users to explore these paths with direct manipulation interaction. We have shown the effectiveness of our method using examples, use cases from different real-world datasets, and a qualitative user study. Participants were able to gain insights into model behavior beyond computational methods. With our method, we aim to help machine learning practitioners visualize and analyze model behavior, and we hope it can contribute to further development and adoption of neural networks. While we aim to support model developers and end-users, non-experts who want to learn how neural networks work may also benefit from the developed techniques.
We identified several directions for future work. First, for (very) large network architectures, visual scalability should be improved beyond a semantically zoomable canvas. A promising direction is using automated methods to aggregate layers and neurons and designing suitable visualizations for these aggregates. Second, methods to visualize the training process and interaction methods to perform (real-time) construction and adaptation of the model might aid understanding and lead to better models. Finally, the visualization might benefit from additional views showing NN properties, for example, accuracy, training loss, gradient descent, and roc curves.
Acknowledgements
We thank Jarke J. van Wijk and Fernando V. Paulovich for valuable feedback on earlier versions of the manuscript, which greatly improved the quality. We are also grateful to all user study participants for their time, engagement, and thoughtful contributions.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
