Introduction
Association rule mining is one of data mining’s greatest success stories, 37 with proven utility in market basket analysis: finding associations among items, identifying which items are commonly bought together, and offering retailers insight into optimal product placement and which products to bundle together on offer. These algorithms have subsequently found applications across a wide variety of sectors, including health and social care. However, with thousands or millions of rules generated by association rule mining, the processing of those rules becomes a data analytics challenge in and of itself. 1 To enable an intuitive exploration of the structure of large rule databases, we have turned to the Unity game engine 2 to produce 3D interactive force-directed graphs 3 of the association rule set for a health and social care dataset. The ‘playable’ mechanics include exploring the data in either a ‘whole systems’ view or by building up a picture of needs and concerns focused on key input attributes. Additionally, the selection mechanics are underpinned by Boolean operations that allow selection, pruning and expansion of the dataset.
The use of game technologies to aid planning and promote an understanding of complex systems is gaining momentum, based on key benefits such as customization, interactivity, aesthetics and optimization. 4 These benefits lead to interactive visualizations of playable simulations, affording users the opportunity to explore, engage and extract meaning from the data set. 5 Game engines allow rapid prototyping and offer a powerful way to customize data presentation via programming of shaders. 6 Furthermore, portability and streamlined deployment across many platforms are key aspects of newer game engines.
Related work
The problem of visualizing association rules has been considered in a variety of ways in the literature. One family of approaches uses matrix visualization, examples of which are found in the Hofmann and Wilhelm representation, 7 the Quest system, 8 DBMiner 9 and MineSet. 10 A typical approach is to have the rows in the matrix represent antecedent items and the columns represent consequent items of the association rules. Rules between items are indicated by symbols at the corresponding element row-column position in the matrix. Graphical characteristics such as size or colour represent rule interestingness metrics.
Other techniques use a directed graph, usually with the items and rules represented as nodes and edges respectively. In these cases, the rule interestingness metrics are shown by the edge properties like thickness or colour.11,12 Such two-dimensional (2D) graph visualizations are provided by DBMiner as well. 9 The Directed Association Visualization (DAV) system performs similar graph visualizations in three dimensions (3D) to handle larger datasets with a mass-spring engine for graph self-organization.13,14
Furthermore, parallel coordinates may be used with one item on each parallel coordinate and continuous polynomial curves connecting items in a rule in 2D. 15 On the other hand, the TwoKey plot adopts a different paradigm, plotting the rule interestingness measures on the horizontal and vertical axes of a 2D plot, in which each point on the plot represents an association rule. 16 This provides a global overview of the distribution of rules. ARVis adopts a similar spatial interestingness mapping but in 3D, coupling a visualizer with an interface that allows the user to guide the rule generation process through a specific constraint-based rule mining algorithm. 14
Another system uses genetic algorithms (GA) to find association rules that are visualized interactively in 3D. 17 In this GA system, rules are represented by large spheres and data are drawn as smaller spheres, with connecting lines coloured according to the validity of the rules.
In addition to association rule visualization, there is the related topic of visualizing frequent itemsets (groups of items that often occur together). The PowerSetViewer 18 groups frequent itemsets by cardinality on a 2D grid. However, if the number of itemsets exceeds the number of available grid squares, multiple itemsets will be mapped to the same grid square. FIsViz 19 displays frequent k-itemsets as polylines connecting k nodes in 2D space, where each node represents one item in the set. These polylines could cross over each other, creating a possibly confusing visualization. To overcome this, WiFIsViz 20 used 2D orthogonal graphs to minimize polyline crossings and improve scalability for visualizing a large number of rules. Alternative 2D designs that have been proposed for itemset visualization include a pyramid (PyramidViz), 21 hierarchical blocks (FpMapViz) 22 and radial charts (RadialViz 23 and its variants 24 including a hue-saturation-value colour model HSVis 25 ).
In summary, most of the published visualization systems are 2D and not 3D, and hence they do not benefit from the additional spatial/depth perception cues that are provided by a 3D system. Such 3D graphs can provide improved intelligibility over their 2D equivalents. 26 Line crossings and overlap of points at the same 2D location may also occur. 19 These can be disentangled and disambiguated more effectively in 3D. Among the 3D systems available, they are often not supported by real-time, performance-optimized physics models of the same capability as the Unity game engine, which add another dimension of realism to our force-directed graphs. For example, the 3D DAV system depicts the directed graph mapped to a visual spherical surface. Our system does not use such topological assumptions and the graph can occupy the 3D space in any topology.
Moreover, in the 3D GA system, 17 the rules and items are both represented as spheres (of different sizes). The 3D ARVis system uses each sphere or cone to represent one rule. 14 Furthermore, a comprehensive review of visualization methods for association rule mining was published fairly recently. 1 This review identified seven main visualization methods for association rules: scatter plot, two-key plot, graph-based, matrix-based, grouped matrix-based, mosaic plot and double decker plot. In addition, several new visualization methods were covered, namely the Ishikawa diagram, molecular representation, concept lattice, metro map, Sankey diagram, glyph-based and ribbon plot. The reviewed graph-based techniques used one set of vertices to represent items, and another set of vertices to represent association rules. In this regard, our approach of using 3D graph vertices exclusively to represent items, and graph edges to represent rules, differs from these published techniques.
Methods
Dataset
In an effort to support people affected by cancer, a large charity with support from the government has created one of the best examples of integrated health and social care for cancer patients. Shortly after diagnosis, people with cancer are invited to take part in a person-centred conversation, where they will complete a Holistic Needs Assessment (HNA) with a support worker to identify their concerns, which may include physical, emotional, social, financial, family, spiritual and practical aspects. Through supportive conversation, and depending on the level of concern, appropriate action is agreed including discussing, providing information, advising, signposting or referral onwards to relevant agencies. This HNA and care planning process has resulted in a clinically meaningful and statistically significant improvement in health-related quality of life, as evidenced by EQ-5D scores. 31
Since the launch of the service, a unique and comprehensive individual-based dataset has been amassed incorporating both health and social care data including demographic information, primary cancers and comorbidities, results from the HNA and quality of life scores. There is also data on the number and type of referrals made or actions taken and feedback from service users on the overall service. In tandem, there is now a need to perform more advanced data analysis in order to derive value and extract actionable insights from this existing dataset. 32 This analysis has primarily been done using association rule mining of anonymized data to form association rule sets revealing broad trends in the dataset. There are 19 columns/features we are considering in our data, covering Assessment ID, Organization, Organization type, Region, Action name, Action type, Concern name, Concern/Information, Patient/Clinician, Age, Sex, Setting, Assessment type, Language, Condition, Diagnosis, Pathway stage, Stratified follow up and Condition management. The only modification of the raw data was the bucketization/binning of the age of individuals.
Data mining
In our association rule mining, we utilize the Apriori algorithm33,34 implemented in the open-source library MLxtend 35 (including its transaction encoder) with support functions from scikit-learn. 36 We only use data of assessments for which complete information is available in all 18 columns/features under consideration, which amounts to 73,959 assessments. (We use the term ‘assessments’ because the dataset has been anonymized. In most cases there is a unique relationship between each HNA assessment and an individual, but there are a small number of individuals who have undergone more than one HNA assessment.)
We consider each assessment as a ‘basket’ of ‘items’ (in our context, these ‘items’ include demographic traits, medical diagnosis codes, needs and actions). The ‘support’ of an item is the proportion of baskets that contain that item (the number of baskets containing that item divided by the total number of baskets). The Apriori algorithm discovers association rules by solving a two-step problem decomposition. It first finds all sets of items (itemsets) that have support above a minimum threshold. These itemsets with sufficient support are called ‘large’, and all other itemsets are ‘small’. For every large itemset L, all non-empty subsets are found. For every such subset A, an association rule of the form A→(L-A) is output if support(L)/support(A) exceeds a minimum ‘confidence’ threshold.
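As a toy illustration of this two-step decomposition, the sketch below uses hypothetical baskets rather than the eHNA data, and is a didactic re-implementation rather than the MLxtend code used in our pipeline:

```python
from itertools import combinations

# Hypothetical baskets of items (illustrative, not real eHNA assessments)
baskets = [
    {"Disability", "Mobility issues", "Female"},
    {"Disability", "Mobility issues"},
    {"Female", "Mobility issues"},
    {"Disability", "Female"},
]

def support(itemset, baskets):
    """Proportion of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def apriori(baskets, min_support):
    """Step 1: find all 'large' itemsets, growing candidates level by level."""
    level = [frozenset([i]) for i in set().union(*baskets)]
    large = []
    while level:
        level = [s for s in level if support(s, baskets) >= min_support]
        large += level
        # Candidate (k+1)-itemsets are unions of surviving k-itemsets
        level = list({a | b for a in level for b in level
                      if len(a | b) == len(a) + 1})
    return large

def rules(baskets, min_support, min_confidence):
    """Step 2: for each large itemset L, output A -> (L - A) when
    support(L)/support(A) exceeds the confidence threshold."""
    out = []
    for L in apriori(baskets, min_support):
        for k in range(1, len(L)):
            for A in map(frozenset, combinations(L, k)):
                conf = support(L, baskets) / support(A, baskets)
                if conf >= min_confidence:
                    out.append((set(A), set(L - A), conf))
    return out
```

With `min_support=0.5` and `min_confidence=0.6`, the three pairwise itemsets survive and each yields two rules with confidence 2/3.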
By association rule mining, we are able to identify items that correlate with each other. As a simple illustration of an association rule found by the algorithm, in an [antecedent → consequent] format, [Disability → Mobility issues], [25 to 49 years (age) →Computer literate], [Loneliness or isolation → Sadness or depression]. Generally, for a rule A → B, the main metrics that are used are ‘support’, ‘confidence’ and ‘lift’, which estimate the probabilities Pr(A and B), Pr(B|A) and Pr(A and B)/[Pr(A)Pr(B)] respectively. 37
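These three probability estimates can be computed directly from basket counts. The following sketch uses made-up baskets with illustrative item names:

```python
# Hypothetical baskets (illustrative, not real eHNA assessments)
baskets = [
    {"Loneliness or isolation", "Sadness or depression"},
    {"Loneliness or isolation", "Sadness or depression"},
    {"Loneliness or isolation"},
    {"Sadness or depression"},
    {"Mobility issues"},
]
A, B = {"Loneliness or isolation"}, {"Sadness or depression"}

n = len(baskets)
pr_a = sum(A <= t for t in baskets) / n        # Pr(A)
pr_b = sum(B <= t for t in baskets) / n        # Pr(B)
pr_ab = sum(A | B <= t for t in baskets) / n   # Pr(A and B)

support = pr_ab               # Pr(A and B)
confidence = pr_ab / pr_a     # Pr(B|A)
lift = pr_ab / (pr_a * pr_b)  # > 1 means A and B co-occur more than by chance
```

Here support is 2/5, confidence is 2/3 and lift is 10/9, indicating a mild positive association.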
In this application, the algorithm has been able to uncover nuanced and at times surprising associations, which are covered in more detail in the Results and discussion section. Nevertheless, it is important to keep in mind that some of the resulting association rules can be spurious. As an additional assessment of the validity of the association rules, statistical significance testing was carried out to find the rules that are statistically significant.
Graph query mechanics and visualization
As mentioned previously, association rule sets can consist of a large number of detailed and often overlapping rules which are difficult to summarize or communicate. Flexible interactive data visualization, with real-time exploration, has been shown to be effective in facilitating communication around complex data to a diverse range of stakeholders38,39 and so we draw on existing graph visualization work for this. 40 We begin by explaining the application we created for visualization. Thereafter, a series of case studies are used to illustrate system functioning, and thematic analysis of user testing workshops is performed for system evaluation.
In our association rule visualization (Figure 1), antecedents and consequents are represented as nodes on the graph, and the nodes are linked by edges coloured according to the strength of the association rule (red is strong and blue is weak). This enables the user to both obtain a general understanding of the overall structure of the rules and focus in on regions of interest. The user is given the option of colouring the edges based on support, confidence, lift or leverage. Furthermore, the graph can be pruned dynamically (using a user-controlled slider) to only display nodes within a certain range of support/confidence/lift/leverage or within a certain degree of connectivity with other nodes. In addition, the app gives users the option of visualizing multiple association rules with logical AND/OR/NOT operations on their antecedents/consequents.
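The edge colouring and slider-based pruning reduce to simple mapping and filtering logic. The sketch below is illustrative Python, not the Unity shader or C# code of the actual tool:

```python
def edge_colour(value, lo, hi):
    """Map a rule metric to an RGB triple: blue (weak) through to red (strong)."""
    t = 0.0 if hi == lo else max(0.0, min(1.0, (value - lo) / (hi - lo)))
    return (t, 0.0, 1.0 - t)  # (r, g, b), each in [0, 1]

def prune(edges, metric, low, high):
    """Slider-style pruning: keep only edges whose metric lies in [low, high]."""
    return [e for e in edges if low <= e[metric] <= high]
```

For example, the strongest edge in the displayed range maps to pure red `(1, 0, 0)` and the weakest to pure blue `(0, 0, 1)`, with intermediate values interpolated linearly.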

This figure is a screenshot of the association rule mining visualization tool, built with the Unity game engine, with PABC (Persons Affected by Cancer) mode active and centred around the ‘Clinic’ node in this example.
The OR function visualizes all nodes with a first-degree connection to at least one of the selected nodes. The AND function only visualizes nodes that are connected to each and every one of the selected nodes. The NOT function shows which nodes are not connected to any of the selected nodes.
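These three operations amount to set algebra over the first-degree neighbourhoods of the selected nodes. A sketch, using a hypothetical adjacency mapping:

```python
# Hypothetical first-degree connections for each node (illustrative data)
neighbours = {
    "Sleep problems": {"Female", "Fatigue", "Anxiety"},
    "25 to 49 Years Old": {"Female", "Anxiety"},
}

def or_select(selected):
    """Nodes connected to at least one selected node (union)."""
    return set().union(*(neighbours[s] for s in selected))

def and_select(selected):
    """Nodes connected to every selected node (intersection)."""
    return set.intersection(*(neighbours[s] for s in selected))

def not_select(selected, all_nodes):
    """Nodes connected to none of the selected nodes (complement);
    the selected nodes themselves are also excluded here."""
    return all_nodes - or_select(selected) - set(selected)
```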
In the specific examples given in this paper, some initial filtering of the association rules was done before visualization, to focus on one-to-one rules (one antecedent and one consequent) with support ≥0.05. These thresholds are of course flexible and can be adjusted to suit the dataset and user requirements, subject to computational resources available.
There are two visualization modes available to the user: the Systems view, which starts with a full view of all the nodes and connections, and the ‘PABC’ (Person Affected by Cancer) mode, which only displays nodes that have been directly selected and hides all background information. An example of the PABC mode is shown in Figure 1. In this mode, the user starts from a blank canvas and can search for and add nodes to build up a visualization of the desired association rules. On the other hand, the Systems view in Figure 2 comes from the opposite perspective, starting with a visualization of everything and giving the user the chance to strip away and prune the displayed graph as required.

This figure is a typical output from the systems view of the overall graph.
The overall operation of the visualization tool is structured into the underlying graph computation; the Barnes-Hut algorithm for automated graph layout; and the visualization options offered for the graph nodes and edges. For full details of the application design and graph computation, please refer to Appendix E of the Supplemental Material.
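For readers who want the intuition without consulting the appendix, a single naive force-directed iteration can be sketched as follows. This all-pairs version is O(n²); the Barnes-Hut algorithm approximates the repulsion term with an octree in O(n log n), a detail omitted here:

```python
def layout_step(pos, edges, repulsion=1.0, spring=0.05, dt=0.1):
    """One naive force-directed iteration in 3D.
    pos: node -> (x, y, z); edges: list of (node, node) pairs."""
    forces = {v: [0.0, 0.0, 0.0] for v in pos}
    nodes = list(pos)
    for i, u in enumerate(nodes):            # pairwise repulsion (O(n^2))
        for v in nodes[i + 1:]:
            d = [a - b for a, b in zip(pos[u], pos[v])]
            r2 = sum(x * x for x in d) or 1e-9
            for k in range(3):
                f = repulsion * d[k] / r2
                forces[u][k] += f
                forces[v][k] -= f
    for u, v in edges:                       # spring attraction along edges
        d = [a - b for a, b in zip(pos[u], pos[v])]
        for k in range(3):
            forces[u][k] -= spring * d[k]
            forces[v][k] += spring * d[k]
    # Euler integration step
    return {v: tuple(p + dt * f for p, f in zip(pos[v], forces[v])) for v in pos}
```

Connected nodes that are far apart are pulled together by the spring term, while unconnected nodes that are close are pushed apart by the repulsion term; iterating this step lets the graph self-organize.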
Evaluation
In addition to playing a role in the requirements capture phase of the project, a ‘design driven’ approach to the evaluation of the tool was used to determine the extent to which functionality and system behaviour was meeting the needs and expectations of users. The evaluation took the form of in-person and online workshops designed to be participatory, which is an effective approach to facilitate productive creative collaboration in many contexts, including health and care practice and policy innovation. 41 The engagements involved a combination of structured, goal focused interaction with the tool and group discussion relating to aspects of functionality and future opportunities.
The user experience workshops (three online workshops between July and August 2022 and one in-person workshop in September 2023) were devised to introduce and demonstrate the system and to gather feedback. The first three workshops were conducted online (using Microsoft Teams as the communication tool and Miro boards as the collaborative workshop tool, accessible to all participants) with between 4 and 6 participants per workshop, to curate responses while keeping to a time limit of an hour each. In addition to helping gain access to the participants, Miro has proven to be an effective and easy-to-use tool in related contexts where there was a need to facilitate online participation, engagement and ‘storytelling’ 42 and was equally useful in this case. The first two workshops had participants from similar backgrounds as health and social care professionals, and these were considered as ‘Group 1’ and ‘Group 2’. These professionals conduct HNAs and formulate care plans for persons affected by cancer as part of their day-to-day work, or line manage the staff who do so. As a result, these staff are collectively responsible for producing the nationwide dataset used in the data mining. By enabling them to visualize and interrogate their own data, they are empowered to better understand the performance of their care services and make improvements. The third workshop had staff from organizational information technology (IT) departments, denoted as ‘Group 3’. It is these IT staff who would be responsible for the regular maintenance of the system (and tech support) if it were deployed in routine service, hence their input and understanding of the system was equally important. The format was online as team members would be calling from all over the country, and some Covid pandemic-related constraints were still in force.
Miro was used because it is an online whiteboard well suited to conveying and collecting information, views and ideas in a collaborative setting, where each user can add feedback interactively.
The workshop format was to create an online Miro board and to explain and demonstrate three examples of the tool’s usage, allowing for both immediate verbal and written feedback. The following examples of tool use were presented:
Query 1 – Selecting a cancer type node
The first example (Figure 3) shows the working of the dropdown systems with the example of selecting the ‘Prostate Cancer’ node to see the effect that has on the full graph and then again with the graph in PABC mode. This example uses the main dropdown that contains all Node names.

Selecting a cancer type node (prostate cancer).
Query 2 – Selecting more than one node
This example (Figure 4) shows users one of the ways to select multiple nodes and how that can be useful. Here the selections are first the Concern ‘Sleep problems’ and then the Demographic ‘25 to 49 Years Old’. They have 11 and 12 first-degree connections respectively and when compared using the AND function they have 11 in common. One of those is the ‘Female’ demographic node.

Multiple node selection.
Query 3 – Filtering edges by association strength
Figure 5 shows the ability to filter the graph using the filter sliders of the adjustment sidebar to change the rendering of a selected section of the graph. This filtering allows the user to find edges that meet the criteria set (confidence, support or lift).

Filtering edges by association strength (denoted by colours of the graph edges).
The three examples were shown through videos recorded earlier to help explain tool operation and the resulting visualizations. After each video was shown, each participant was asked how they felt about what they had seen in the context of providing value and usefulness for the organization.
At the end of all three examples, participants were invited to add any further feedback they could think of to the Miro board, which would remain live and accessible after the event.
An example of a Miro board from one of the workshops is shown in Appendix C of the Supplemental Material. Appendix C also includes four videos of the tool corresponding to workshop Query 1 – Full Graph, Query 1 – PABC mode, Query 2 – Multiple selection and Query 3 – filtering slider.
Thematic analysis and usability study
To gain further understanding of what the participants of the workshops want to see in a data visualization programme, a thematic analysis of the transcripts of the workshops was undertaken using a standard procedure. 43 These steps allow for qualitative analysis of the comments made by the participants of the workshops. As the workshops were done through Microsoft Teams, recordings were made of each one, with participant consent. Then using the built-in tools for video management, a transcript was auto-generated by speech-to-text software before being manually checked for errors. The transcript was used along with the videos to review the comments made. Time codes were used to point to areas where the participants were speaking, discussing their sticky note responses on the board and giving feedback.
Towards the conclusion of this phase of the project, the opportunity arose for a final in-person workshop (‘Group 4’), taking advantage of the lifting of almost all Covid-related restrictions in 2023. This workshop was conducted in person with five health and social care professionals, using the same three query examples as the online workshops. However, instead of the query examples just being shown to the participants as in the online workshops, they now had the opportunity to try them for themselves on their own laptops. Moreover, time and space were given for freeform use of the tool by participants, along with goal-driven activity – using the tool to explore areas relevant to their domain. This conversation was not recorded, but the Miro board was used as a common ‘whiteboard’ screen at the workshop, with virtual post-it notes of comments. A System Usability Scale (SUS) survey 44 was also carried out at the end of the event. The SUS is a widely used methodology consisting of 10 questions, with participants ticking one option on a scale of 1 to 5 for each question. As an additional goal, this workshop served as a validation of the compatibility of the web-based interactive visualization, which ran smoothly on all the participants’ employer-issued machines without any intervention, modifications or admin rights required.
Results and discussion
Thematic analysis of online workshops
Using the overarching research question: ‘Is the developed information visualization tool of value to and usable by workers?’, the thematic analysis followed a standard published approach 43 in which the transcripts and videos of the workshops are reviewed, and codes are identified and refined to compile and form common themes and subthemes. The results are presented in Table 1 with a treemap summary in Figure 6 and detailed codes in Appendix A of the Supplemental Material.
Core themes and subthemes after thematic analysis of the workshops.

Treemap summary of thematic analysis themes and subthemes. The area of the rectangles is proportional to the frequency of codes under each subtheme.
The three major themes that are apparent are Patient Experience, Visualization and Usability. These make sense given the questions asked at the workshops, where the focus during the body of each workshop was on Observations, Insight and Actions (that is, what participants would want next).
Patient experience
Current practices
These codes refer to the existing ways that the participants use systems to help patients. There were clear thoughts that the current systems could be improved after seeing the new data visualization options. ‘Identification of need in improving current practice’, ‘Outdated information’ and ‘Time consuming, frustrating’ all reinforce this. The more patient-focused participants referred to a specific service that needed improvement, captured as ‘Struggle with arranging transport for patients’; this issue is not one that this project could help alleviate. The most mentioned code in this subtheme was ‘Patient concern’, which was raised in all three workshops. Given that the main task of the tool is to support the identification of patient concerns and associated actions, it was encouraging that the tool prompted discussion around patient concerns and linked it to person-guided support.
Complementary to person-guided support
The codes under this subtheme all relate to the idea of the tool being a useful addition and adding value to the organization without being seen as a replacement for contact with patients. The tool was never conceived with the idea of replacing any contact between patients and carers, but more as an additional way to explore the data set and surface connections.
Risks
This subtheme is focused on the problems and potential risks that leaning too far into abstract software could introduce when looking at cancer data. All three groups of participants reinforced the danger of ‘Risk of reinforcing bias’. Part of this has more to do with the data set than the actual tool, since the tool can only display the data it is given. The workshop facilitators did try to give some context about the different metrics for displaying the data, such as confidence and lift, but there was only so much time for that in each 1-hour workshop. Other risk-themed codes were ‘Assumptions disconnected from patient reality’ and ‘Dehumanisation due to digitisation’. These both appeared in the health/social care-focused groups, who have a much closer relationship with cancer patients than the tech team. It was very important for them to make sure that this tool did not try to replace the human element of their jobs, and that is not our intention.
Visualization
The theme of visualization is important as that is exactly what the graph shows to the user and is the main thing that is being evaluated by the participants.
Challenges with processing the graph information
Like the code ‘Specialist needed to interpret and draw insights from data’, the subtheme of ‘Challenges with processing the graph information’ shows the difficulty that first-time users face with this system. All the codes from this subtheme were mentioned by the care worker groups, who have less technical prowess with information display programmes. ‘Difficult to process’ is the most common code with 10 mentions, with ‘Overwhelming’ and ‘Outside their technical skill set’ behind that.
Fresh perspective
A positive subtheme is the affirmation that this visualization offered a fresh perspective on the data. The codes ‘Novel way of displaying data’, ‘Novel insight’ and ‘Intriguing concept’ all point towards a positive impression of the visualization tool. In particular, ‘Novel way of displaying data’ was mentioned by all groups and nine times in total across all workshops. The rest of the fresh perspective codes, however, only came from workshops 1 and 2. The codes ‘Clearer understanding’ and ‘New consideration for current practices’ point to the graph visualization being useful for the staff in giving them fresh ideas about the associations between pieces of data that they have been working with for a long time. This particular subtheme was the motivation for the project from the outset.
Usability
Usability is a theme with both positive and negative subthemes throughout. Accessibility of data and potential value had some very positive subthemes, while the ‘concerns’ subtheme collected a lot of the issues raised by workshop participants.
Accessibility of data
The subtheme ‘Accessibility of data’ contained very positive codes. The most mentioned code was ‘shared resources’, with nine mentions by members of workshops 1 and 2. This referred to internal systems that the participants of those workshops had problems with, and hopes that technologies like the visualization would be able to help. ‘Efficient access to information’ was mentioned five times, again by the participants from workshops 1 and 2, suggesting that the tool was useful and worked well at displaying information. The code ‘Filter Information’ was mentioned by both demographic groups and was the only code under this subtheme mentioned by the participants of workshop group 3. As a group with a more technical focus, it was good to see them pick up on that aspect and ask questions about the different ways the tool offered to filter information. The remaining codes are all positive ideas mentioned by the participants: ‘Adjustable specificity of query’ and ‘Filling gaps in information’ show further ideas the participants had about where the tool could be upgraded, and development paths that they recognized as useful for them.
Potential value
The codes within ‘Potential value’ refer to areas where the tool could add value currently, or in the future with additional changes. The first code, ‘See potential for further applications’, is a very positive response; it was mentioned six times across workshops 1 and 2. It is encouraging that the less tech-focused teams were able to see the potential value of the tool. ‘Area tailored support’ was mentioned, and this would potentially be a key change to make the tool more useful at a local level, rather than with a nationwide data set that loses some of the ground-level connections. ‘Uncertain of value in current limited form’ was a code mentioned four times by Groups 1 and 2 and once by Group 3. This code showed that the participants recognized the tool’s potential but did not think it was providing full value to their current roles. A code that gave insight was ‘Time needed to consider further use’, with participants wanting more time to get to grips with the ideas that the tool was presenting, to further understand what this technology could do. The final two codes, ‘Automation: self-service booking systems for patients’ and ‘Support financial applications’, refer to features that the participants of workshops 1 and 2 wanted in their workplaces, but not ones that would have implementation space in this version of the tool.
Concerns
The subtheme of ‘Concerns’ gave some interesting insights into the problems that participants immediately noticed when looking at the tool. ‘Concern regarding data relevance’ was mentioned four times, all by workshop 3; as a team with experience of looking at the organization’s data sets, it was interesting to hear that they questioned the relevance of the data and saw flaws in it. These thoughts are possibly connected to the other codes mentioned by that group, ‘Uncertainty of practical application’ and ‘Purpose unclear’: they did not see the tool as a practical way to look at the data and thought that its purpose was unclear. From the other workshop groups, the codes ‘Uncertain of personal usability’ and ‘More context needed’ reveal that they did not think the tool would currently be a good fit for their roles, and that they needed more information on what they were looking at and how it fits together.
System Usability Scale (SUS) survey from in-person workshop
The System Usability Scale (SUS) survey 44 of the in-person workshop (Group 4) yielded the results shown in Table 2. In this table, the SUS scores are expressed in terms of their quartile ranking. 45 It can be seen that the mean SUS score is in the top (best) quartile, Q4. This is corroborated by the many positive participant comments that while the visualization tool was initially daunting, after trying it out, it was actually easy to use. Participants also saw the potential of the tool to allow for service planning, to complement their existing eHNA dashboards which are very outcomes focused.
System Usability Scale (SUS) survey results from the in-person workshop (Group 4).
Association rule mining as a recommendation engine
The input datasets had significant class imbalances that made it challenging to train traditional artificial intelligence algorithms on them. Although specialized data processing and AI algorithms exist for such imbalanced data, we found that association rule mining provided a fairly simple way to understand and deal with these imbalances. For example, if the ‘confidence’ metric was used to find actions with the highest confidence of association with a given concern, ‘Discussed concern, advice given’ would be the only answer – a trivial result. On the other hand, by finding the associated actions with the highest ‘lift’, meaningful actions are obtained for each concern, as shown in Table 3. Lift corrects for the relative abundance of a consequent in the data, and we found this to be a powerful tool for prioritizing the most meaningful association rules. From a practical point of view, these actions with the highest lift can form the basis of a personalized recommendation to staff or persons affected by cancer in response to their specific concerns. A specific example with more in-depth calculations is provided in Appendix D of the Supplemental Material.
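The imbalance effect can be reproduced with made-up counts (illustrative figures, not the eHNA data): a near-universal action dominates on confidence, while lift surfaces the action specific to the concern.

```python
# Hypothetical counts over n assessments (not the real eHNA figures)
n = 1000
count = {"advice": 900, "referral": 60}    # overall action frequencies
concern = 100                              # assessments with a given concern
both = {"advice": 90, "referral": 50}      # concern co-occurring with each action

def confidence(action):
    return both[action] / concern                    # Pr(action | concern)

def lift(action):
    return confidence(action) / (count[action] / n)  # corrects for base rate

# 'advice' wins on confidence (0.9 vs 0.5) simply because it is near-universal,
# but 'referral' wins on lift (8.33 vs 1.0), revealing the concern-specific action.
best_conf = max(count, key=confidence)   # -> "advice"
best_lift = max(count, key=lift)         # -> "referral"
```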
The most frequent concerns with actions in the eHNA database, together with their associated action with the highest confidence and lift.
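To make the confidence/lift distinction concrete, consider a minimal sketch with toy transactions (invented for illustration; not the eHNA data):

```python
# Toy transactions: a dominant, near-universal action ("advice") and a
# rarer, more specific one ("referral").
transactions = [
    {"pain", "advice"},
    {"pain", "advice", "referral"},
    {"pain", "referral"},
    {"fatigue", "advice"},
    {"fatigue", "advice"},
    {"fatigue", "advice"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y):
    """P(Y | X): support of the joint itemset over support of the antecedent."""
    return support(x | y) / support(x)

def lift(x, y):
    """Confidence corrected for the base rate of the consequent."""
    return confidence(x, y) / support(y)

# 'advice' dominates (support 5/6), so confidence cannot separate the rules:
# confidence(pain -> advice) == confidence(pain -> referral) == 2/3.
# Lift corrects for that abundance and ranks the specific action higher:
# lift(pain -> advice) = 0.8, but lift(pain -> referral) = 2.0.
```

Here the ubiquitous ‘advice’ consequent ties on confidence but falls below 1.0 on lift, while the specific ‘referral’ action stands out – mirroring how lift surfaced meaningful actions in Table 3.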
To check the statistical significance of our association rules, we generated contingency tables based on the rules and used Fisher’s Exact Test to find the p-value for each rule X → Y, where:
a: number of HNA assessments with both X and Y.
b: number of HNA assessments with X and without Y.
c: number of HNA assessments without X and with Y.
d: number of HNA assessments without X and without Y.
The results on statistical testing of the association rules are reported in Appendix B of the Supplemental Material. A Bonferroni correction 46 is applied to the raw 0.05 significance level to account for the size of the search space. In this case, the rules are of the form concern → action. There are 135 possible concerns and 193 possible actions, giving a search space of 135 × 193 = 26,055. Therefore, the corrected significance level is 0.05/26,055 ≈ 1.919E-6, and rules with p-values below this corrected threshold are deemed statistically significant.
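A minimal sketch of this test using only the Python standard library (the function name and the example counts are ours, for illustration; a library routine such as SciPy’s would normally be used):

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for a 2x2 table:
    P(co-occurrence count >= a) under the hypergeometric null of
    independence between antecedent X and consequent Y."""
    n = a + b + c + d
    row1, col1 = a + b, a + c           # totals with X, totals with Y
    denom = comb(n, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1)) / denom

# Bonferroni-corrected threshold over the concern -> action search space.
alpha = 0.05 / (135 * 193)              # ~1.919E-6

# Illustrative counts (not from the eHNA data): X and Y co-occur far
# more often than independence would predict.
p = fisher_one_sided(40, 10, 60, 890)
print(p < alpha)
```

With these counts the expected co-occurrence under independence is only 5 out of 1000 assessments, so observing 40 yields a vanishingly small p-value, well below the corrected threshold.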
Conclusion
The overall ambition of this work was to determine the utility of a games engine to support decision making through ‘playable mechanics’ and visual analytics, providing relevant data to health and care professionals. Towards this, an information visualization tool that interactively and dynamically presents association rule mined data was co-designed, developed using Unity and subsequently evaluated.
The objectives of the research were:
To develop a visualization prototype able to depict electronic holistic needs assessment (eHNA) data in the form of an interactive 3D graph.
To work with stakeholders to design and implement key interactions to filter and navigate the data.
To determine whether the interactive visualization, developed using a games engine, is of value to and usable by health and social care professionals.
Objective 1
A 3D Force Directed Node Graph (FNG) was created using the Unity game engine. It used the mined eHNA dataset to create a graph showing the connections between pairs of nodes and allowing users to interact with it using a mouse and/or keyboard.
The Unity game engine allowed for multiple deployment methods with a shift from a Windows executable format to a web-hosted system using Microsoft Azure. This allowed for an online version of the tool to be hosted on servers and accessible using a browser link.
The performance of the tool was enhanced algorithmically, with the Barnes-Hut approximation used to approximate the physics of the force-directed network graph and achieve much better performance in terms of both frame rate and CPU usage.
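The idea behind the Barnes-Hut approximation can be sketched in a few lines (a simplified 2D Python version for illustration; the tool itself is implemented in C# within Unity):

```python
import math

# Simplified 2D Barnes-Hut repulsion for a force-directed layout.
# Distant quadtree cells are approximated by their centre of mass when
# (cell width / distance) < theta, reducing O(n^2) pairwise forces to
# roughly O(n log n).

class Cell:
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # centre, half-width
        self.n = 0                                  # bodies in this cell
        self.sx = self.sy = 0.0                     # position sums
        self.body = None                            # sole body, if leaf
        self.children = None

    def _quadrant(self, x, y):
        return self.children[(x >= self.cx) * 2 + (y >= self.cy)]

    def insert(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        if self.n == 1:                  # first body: store directly
            self.body = (x, y)
            return
        if self.children is None:        # split and push the old body down
            q = self.half / 2
            self.children = [Cell(self.cx + dx * q, self.cy + dy * q, q)
                             for dx in (-1, 1) for dy in (-1, 1)]
            ox, oy = self.body
            self.body = None
            self._quadrant(ox, oy).insert(ox, oy)
        self._quadrant(x, y).insert(x, y)

    def repulsion(self, x, y, theta=0.5, k=1.0):
        if self.n == 0 or (self.n == 1 and self.body == (x, y)):
            return 0.0, 0.0              # empty cell, or the query body itself
        comx, comy = self.sx / self.n, self.sy / self.n
        dx, dy = x - comx, y - comy
        d = math.hypot(dx, dy) or 1e-9
        if self.children is None or (2 * self.half) / d < theta:
            f = k * self.n / (d * d)     # treat the cell as one point mass
            return f * dx / d, f * dy / d
        fx = fy = 0.0
        for c in self.children:
            cfx, cfy = c.repulsion(x, y, theta, k)
            fx, fy = fx + cfx, fy + cfy
        return fx, fy

# demo: build a tree over a few nodes and query the force on one of them
pts = [(1, 2), (-3, 4), (5, -6), (-7, -8), (2, 2), (0.5, -1.5)]
root = Cell(0.0, 0.0, 16.0)              # square region [-16, 16]^2
for p in pts:
    root.insert(*p)
approx_fx, approx_fy = root.repulsion(1, 2, theta=0.5)
```

Setting theta to 0 disables the approximation and recovers the exact pairwise sum, which is a convenient correctness check; larger theta trades accuracy for speed, which is what recovers frame rate on large graphs.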
Objective 2
Multiple sessions with stakeholders allowed for a range of features to be designed and added for filtering and exploring the graph in multiple ways. These stakeholders included frontline health and social care professionals whose services directly feed into the source datasets (because they and their patients fill in the forms that are captured in the HNA database), along with their managers and IT department staff.
A user interface (UI) layer with sidebars allowed screen space to be maximized while hiding multiple features when not in use. Features like category selection through dropdown menus were implemented to make the node selection easier. Multiple display filters were created for focusing on data by connection degree number, or by the colour mapping system. A grammar filter was added to perform AND/OR/NOT selection operations on subsets of the graph. Finally, an alternate ‘PABC’ rendering mode was developed that shifts focus away from the full graph to make evaluating local graph regions clear.
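Conceptually, the grammar filter reduces to set algebra over node identifiers. A toy sketch (the node names are invented for illustration):

```python
# Hypothetical node-selection sets over node identifiers.
concerns  = {"pain", "fatigue", "mobility", "finance"}
high_lift = {"pain", "mobility", "money_worries"}
excluded  = {"mobility"}

# AND narrows the selection, OR widens it, NOT prunes it.
and_sel = concerns & high_lift             # intersection (AND)
or_sel  = concerns | high_lift             # union (OR)
not_sel = (concerns & high_lift) - excluded  # AND then NOT

print(sorted(not_sel))                     # → ['pain']
```

Chaining these three operations is enough to express the selection, pruning and expansion behaviours described above.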
Objective 3
To evaluate this tool, online user workshops were repeated with three groups of participants from different departments who were able to see guided demos of the tool’s function and report thoughts and feedback verbally and in written form on an interactive Miro web board. A thematic analysis of the conversations during the workshop was undertaken to reaffirm the ideas and thoughts of the participants. This was followed up by an in-person workshop with the same overall structure as the online workshops but giving participants the opportunity to try the tool for themselves on their own laptops. A System Usability Scale (SUS) survey was conducted at its conclusion, with an excellent mean score of 84, placing it in the top quartile of studies conducted with the same scale. 45
One possible use case is for finding or recommending actions to take in response to particular concerns expressed by persons affected by cancer. In this regard, sorting by the ‘lift’ metric was found to be more useful than ‘confidence’ because it corrects for class imbalances in the dataset. The association rules can also be tested for statistical significance by building contingency tables and applying Fisher’s Exact Test. The rules with the highest lift often have direct practical value, forming the basis of personalized recommendations in response to specific concerns.
Future work
We will seek to expand the web and cloud service deployment of our tools, moving from a prototype application towards a system that is fully embedded in the operations and services of our collaborators.
Very early in the project we experimented with a Virtual Reality (VR) interface for the tool, using the Oculus Rift headset and handheld controllers to explore the graph in a VR environment. This highlighted another advantage of the Unity game engine: its support for a huge range of control systems. However, the VR version required a whole new interface for interacting with the different filtering systems; instead of a mouse cursor, the handheld controllers let the user point at nodes in the graph with their hands to reveal information. VR, especially with the advent of passthrough, enables new interaction modes with the 3D dataset, and this is a potential direction for future work that can build on the powerful capabilities of the underlying game engine. 47
In tandem with this push towards 3D and VR technologies, the computational cost of the system also tends to increase – part of the overall trade-off between rich graphics and performance. To mitigate the computational burden, various algorithmic approaches can be implemented, including clustered force-directed graphs. 48–50 These could have the added benefit of an improved 3D layout: by sorting the nodes into clusters, a distinct large-scale structure could be imposed on the graph, aiding its overall interpretability. Furthermore, in future work, we would like to leverage Unity’s new data-oriented technology stack (DOTS) to scale processing efficiently. For example, LLVM and Burst compiler technology can be used to achieve near-native code performance from C#.
This work contributes to the fields of data visualization and visual analytics and shows that games engines can be used effectively, and with much customization and flexibility, for 3D visualization of network/graph data. The innovations presented in the paper translated typical graph visualization and interaction modes – selection, exploration, pruning, connecting and filtering 51 – into the 3D game engine space. Further, the real-time interaction aids the interpretability of complex graphs. The raw statistical output of association rule mining was transformed into interactive representations that can be filtered based on confidence, lift, etc., and this offered great flexibility with respect to data visualization and interaction modes. This highlights the potential of game engines as a key tool for emerging visual analytic approaches for exploring association relationships in graph data, 52 as well as being able to offload computation to the GPU via hardware acceleration and compute shaders.
Finally, the visualization techniques pioneered here can be translated to almost any graph or network data, such as node-based programming. We intend to explore more opportunities for expanding this technology to new fields with similar visualization challenges.
Supplemental Material
Supplemental material, sj-docx-1-ivi-10.1177_14738716241309243 for Creating informative experiences through a visual and interactive representation of health and social care data by Kean Lee Kang, Adam Hastings, Alex Danielle Hughes, Karolina Myszkowska, Margaret Greer, Janice Preston, Don McIntyre, Janette Hughes, Kara Mackenzie, James Bown and Ruth Falconer in Information Visualization
