Introduction
Association rule mining is one of data mining’s greatest success stories, 37 with proven utility in market basket analysis: finding associations among items, identifying which items are commonly bought together, and offering retailers insight into optimal product placement and which products to bundle together on offer. These algorithms have subsequently found applications across a wide variety of sectors, including health and social care. However, with thousands or millions of rules generated by association rule mining, the processing of those rules becomes a data analytics challenge in and of itself. 1 To enable an intuitive exploration of the structure of large rule databases, we have turned to the Unity game engine 2 to produce 3D interactive force-directed graphs 3 of the association rule set for a health and social care dataset. The ‘playable’ mechanics include exploring the data in either a ‘whole systems’ view or by building up a picture of needs and concerns focused on key input attributes. Additionally, the selection mechanics are underpinned by Boolean operations that allow selection, pruning and expansion of the dataset.
The use of game technologies to aid planning and promote an understanding of complex systems is gaining momentum, based on key benefits such as customization, interactivity, aesthetics and optimization. 4 These benefits lead to interactive visualizations of playable simulations, affording users the opportunity to explore, engage and extract meaning from the data set. 5 Game engines allow rapid prototyping and offer a powerful way to customize data presentation via programming of shaders. 6 Furthermore, portability and streamlined deployment across many platforms are key aspects of newer game engines.
Related work
The problem of visualizing association rules has been considered in a variety of ways in the literature. One family of approaches uses matrix visualization, examples of which are found in the Hofmann and Wilhelm representation, 7 the Quest system, 8 DBMiner 9 and MineSet. 10 A typical approach is to have the rows in the matrix represent antecedent items and the columns represent consequent items of the association rules. Rules between items are indicated by symbols at the corresponding element row-column position in the matrix. Graphical characteristics such as size or colour represent rule interestingness metrics.
Other techniques use a directed graph, usually with the items and rules represented as nodes and edges respectively. In these cases, the rule interestingness metrics are shown by the edge properties like thickness or colour.11,12 Such two-dimensional (2D) graph visualizations are provided by DBMiner as well. 9 The Directed Association Visualization (DAV) system performs similar graph visualizations in three dimensions (3D) to handle larger datasets with a mass-spring engine for graph self-organization.13,14
Furthermore, parallel coordinates may be used with one item on each parallel coordinate and continuous polynomial curves connecting items in a rule in 2D. 15 On the other hand, the TwoKey plot adopts a different paradigm, plotting the rule interestingness measures on the horizontal and vertical axes of a 2D plot, in which each point on the plot represents an association rule. 16 This provides a global overview of the distribution of rules. ARVis adopts a similar spatial interestingness mapping but in 3D, coupling a visualizer with an interface that allows the user to guide the rule generation process through a specific constraint-based rule mining algorithm. 14
Another system uses genetic algorithms (GA) to find association rules that are visualized interactively in 3D. 17 In this GA system, rules are represented by large spheres and data are drawn as smaller spheres, with connecting lines coloured according to the validity of the rules.
In addition to association rule visualization, there is the related topic of visualizing frequent itemsets (groups of items that often occur together). The PowerSetViewer 18 groups frequent itemsets by cardinality on a 2D grid. However, if the number of itemsets exceeds the number of available grid squares, multiple itemsets will be mapped to the same grid square. FIsViz 19 displays frequent k-itemsets as polylines connecting k nodes in 2D space, where each node represents one item in the set. These polylines could cross over each other, creating a possibly confusing visualization. To overcome this, WiFIsViz 20 used 2D orthogonal graphs to minimize polyline crossings and improve scalability for visualizing a large number of rules. Alternative 2D designs that have been proposed for itemset visualization include a pyramid (PyramidViz), 21 hierarchical blocks (FpMapViz) 22 and radial charts (RadialViz 23 and its variants 24 including a hue-saturation-value colour model HSVis 25 ).
In summary, most of the published visualization systems are 2D and not 3D, and hence they do not benefit from the additional spatial/depth perception cues that are provided by a 3D system. Such 3D graphs can provide improved intelligibility over their 2D equivalents. 26 Line crossings and overlap of points at the same 2D location may also occur. 19 These can be disentangled and disambiguated more effectively in 3D. Among the 3D systems available, they are often not supported by real-time, performance-optimized physics models of the same capability as the Unity game engine, which add another dimension of realism to our force-directed graphs. For example, the 3D DAV system depicts the directed graph mapped to a visual spherical surface. Our system does not use such topological assumptions and the graph can occupy the 3D space in any topology.
Moreover, in the 3D GA system, 17 the rules and items are both represented as spheres (of different sizes). The 3D ARVis system uses each sphere or cone to represent one rule. 14 Furthermore, a comprehensive review of visualization methods for association rule mining was published fairly recently. 1 This review identified seven main visualization methods for association rules: scatter plot, two-key plot, graph-based, matrix-based, grouped matrix-based, mosaic plot and double decker plot. In addition, several new visualization methods were covered, namely the Ishikawa diagram, molecular representation, concept lattice, metro map, Sankey diagram, glyph-based and ribbon plot. The reviewed graph-based techniques used one set of vertices to represent items, and another set of vertices to represent association rules. In this regard, our approach of using 3D graph vertices exclusively to represent items, and graph edges to represent rules, differs from these published techniques.
Methods
Dataset
In an effort to support people affected by cancer, a large charity with support from the government has created one of the best examples of integrated health and social care for cancer patients. Shortly after diagnosis, people with cancer are invited to take part in a person-centred conversation, where they will complete a Holistic Needs Assessment (HNA) with a support worker to identify their concerns, which may include physical, emotional, social, financial, family, spiritual and practical aspects. Through supportive conversation, and depending on the level of concern, appropriate action is agreed including discussing, providing information, advising, signposting or referral onwards to relevant agencies. This HNA and care planning process has resulted in a clinically meaningful and statistically significant improvement in health-related quality of life, as evidenced by EQ-5D scores. 31
Since the launch of the service, a unique and comprehensive individual-based dataset has been amassed incorporating both health and social care data including demographic information, primary cancers and comorbidities, results from the HNA and quality of life scores. There is also data on the number and type of referrals made or actions taken and feedback from service users on the overall service. In tandem, there is now a need to perform more advanced data analysis in order to derive value and extract actionable insights from this existing dataset. 32 This analysis has primarily been done using association rule mining of anonymized data to form association rule sets revealing broad trends in the dataset. There are 19 columns/features we are considering in our data, covering Assessment ID, Organization, Organization type, Region, Action name, Action type, Concern name, Concern/Information, Patient/Clinician, Age, Sex, Setting, Assessment type, Language, Condition, Diagnosis, Pathway stage, Stratified follow up and Condition management. The only modification of the raw data was the bucketization/binning of the age of individuals.
Data mining
In our association rule mining, we utilize the Apriori algorithm33,34 implemented in the open-source library MLxtend 35 (including its transaction encoder) with support functions from scikit-learn. 36 We only use data of assessments for which complete information is available in all 18 columns/features under consideration, which amounts to 73,959 assessments. (We use the term ‘assessments’ because the dataset has been anonymized. In most cases there is a unique relationship between each HNA assessment and an individual, but there are a small number of individuals who have undergone more than one HNA assessment.)
We consider each assessment as a ‘basket’ of ‘items’ (in our context, these ‘items’ include demographic traits, medical diagnosis codes, needs and actions). The ‘support’ of an item is the proportion of baskets that contain that item (the number of baskets containing that item divided by the total number of baskets). The Apriori algorithm discovers association rules by solving a two-step problem decomposition. It first finds all sets of items (itemsets) that have support above a minimum threshold. These itemsets with sufficient support are called ‘large’, and all other itemsets are ‘small’. For every large itemset L, all non-empty subsets are found. For every such subset A, an association rule of the form A→(L-A) is output if support(L)/support(A) exceeds a minimum ‘confidence’ threshold.
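As a toy illustration of this two-step decomposition, the sketch below uses hypothetical baskets rather than the eHNA data, and is a didactic re-implementation rather than the MLxtend code used in our pipeline:

```python
from itertools import combinations

# Hypothetical baskets of items (illustrative, not real eHNA assessments)
baskets = [
    {"Disability", "Mobility issues", "Female"},
    {"Disability", "Mobility issues"},
    {"Female", "Mobility issues"},
    {"Disability", "Female"},
]

def support(itemset, baskets):
    """Proportion of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def apriori(baskets, min_support):
    """Step 1: find all 'large' itemsets, growing candidates level by level."""
    level = [frozenset([i]) for i in set().union(*baskets)]
    large = []
    while level:
        level = [s for s in level if support(s, baskets) >= min_support]
        large += level
        # Candidate (k+1)-itemsets are unions of surviving k-itemsets
        level = list({a | b for a in level for b in level
                      if len(a | b) == len(a) + 1})
    return large

def rules(baskets, min_support, min_confidence):
    """Step 2: for each large itemset L, output A -> (L - A) when
    support(L)/support(A) exceeds the confidence threshold."""
    out = []
    for L in apriori(baskets, min_support):
        for k in range(1, len(L)):
            for A in map(frozenset, combinations(L, k)):
                conf = support(L, baskets) / support(A, baskets)
                if conf >= min_confidence:
                    out.append((set(A), set(L - A), conf))
    return out
```

With `min_support=0.5` and `min_confidence=0.6`, the three pairwise itemsets survive and each yields two rules with confidence 2/3.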
By association rule mining, we are able to identify items that correlate with each other. As a simple illustration of an association rule found by the algorithm, in an [antecedent → consequent] format, [Disability → Mobility issues], [25 to 49 years (age) →Computer literate], [Loneliness or isolation → Sadness or depression]. Generally, for a rule A → B, the main metrics that are used are ‘support’, ‘confidence’ and ‘lift’, which estimate the probabilities Pr(A and B), Pr(B|A) and Pr(A and B)/[Pr(A)Pr(B)] respectively. 37
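These three probability estimates can be computed directly from basket counts. The following sketch uses made-up baskets with illustrative item names:

```python
# Hypothetical baskets (illustrative, not real eHNA assessments)
baskets = [
    {"Loneliness or isolation", "Sadness or depression"},
    {"Loneliness or isolation", "Sadness or depression"},
    {"Loneliness or isolation"},
    {"Sadness or depression"},
    {"Mobility issues"},
]
A, B = {"Loneliness or isolation"}, {"Sadness or depression"}

n = len(baskets)
pr_a = sum(A <= t for t in baskets) / n        # Pr(A)
pr_b = sum(B <= t for t in baskets) / n        # Pr(B)
pr_ab = sum(A | B <= t for t in baskets) / n   # Pr(A and B)

support = pr_ab               # Pr(A and B)
confidence = pr_ab / pr_a     # Pr(B|A)
lift = pr_ab / (pr_a * pr_b)  # > 1 means A and B co-occur more than by chance
```

Here support is 2/5, confidence is 2/3 and lift is 10/9, indicating a mild positive association.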
In this application, the algorithm has been able to uncover nuanced and at times surprising associations, which are covered in more detail in the Results and discussion section. Nevertheless, it is important to keep in mind that some of the resulting association rules can be spurious. As an additional assessment of the validity of the association rules, statistical significance testing was carried out to find the rules that are statistically significant.
Graph query mechanics and visualization
As mentioned previously, association rule sets can consist of a large number of detailed and often overlapping rules which are difficult to summarize or communicate. Flexible interactive data visualization, with real-time exploration, has been shown to be effective in facilitating communication around complex data to a diverse range of stakeholders38,39 and so we draw on existing graph visualization work for this. 40 We begin by explaining the application we created for visualization. Thereafter, a series of case studies are used to illustrate system functioning, and thematic analysis of user testing workshops is performed for system evaluation.
In our association rule visualization (Figure 1), antecedents and consequents are represented as nodes on the graph, and the nodes are linked by edges coloured according to the strength of the association rule (red is strong and blue is weak). This enables the user to both obtain a general understanding of the overall structure of the rules and focus in on regions of interest. The user is given the option of colouring the edges based on support, confidence, lift or leverage. Furthermore, the graph can be pruned dynamically (using a user-controlled slider) to only display nodes within a certain range of support/confidence/lift/leverage or within a certain degree of connectivity with other nodes. In addition, the app gives users the option of visualizing multiple association rules with logical AND/OR/NOT operations on their antecedents/consequents.
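The edge colouring and slider-based pruning reduce to simple mapping and filtering logic. The sketch below is illustrative Python, not the Unity shader or C# code of the actual tool:

```python
def edge_colour(value, lo, hi):
    """Map a rule metric to an RGB triple: blue (weak) through to red (strong)."""
    t = 0.0 if hi == lo else max(0.0, min(1.0, (value - lo) / (hi - lo)))
    return (t, 0.0, 1.0 - t)  # (r, g, b), each in [0, 1]

def prune(edges, metric, low, high):
    """Slider-style pruning: keep only edges whose metric lies in [low, high]."""
    return [e for e in edges if low <= e[metric] <= high]
```

For example, the strongest edge in the displayed range maps to pure red `(1, 0, 0)` and the weakest to pure blue `(0, 0, 1)`, with intermediate values interpolated linearly.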

This figure is a screenshot of the association rule mining visualization tool, built with the Unity game engine, with PABC (Persons Affected by Cancer) mode active and centred around the ‘Clinic’ node in this example.
The OR function visualizes all nodes with a first-degree connection to at least one of the selected nodes. The AND function only visualizes nodes that are connected to each and every one of the selected nodes. The NOT function shows which nodes are not connected to any of the selected nodes.
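These three operations amount to set algebra over the first-degree neighbourhoods of the selected nodes. A sketch, using a hypothetical adjacency mapping:

```python
# Hypothetical first-degree connections for each node (illustrative data)
neighbours = {
    "Sleep problems": {"Female", "Fatigue", "Anxiety"},
    "25 to 49 Years Old": {"Female", "Anxiety"},
}

def or_select(selected):
    """Nodes connected to at least one selected node (union)."""
    return set().union(*(neighbours[s] for s in selected))

def and_select(selected):
    """Nodes connected to every selected node (intersection)."""
    return set.intersection(*(neighbours[s] for s in selected))

def not_select(selected, all_nodes):
    """Nodes connected to none of the selected nodes (complement);
    the selected nodes themselves are also excluded here."""
    return all_nodes - or_select(selected) - set(selected)
```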
In the specific examples given in this paper, some initial filtering of the association rules was done before visualization, to focus on one-to-one rules (one antecedent and one consequent) with support ≥0.05. These thresholds are of course flexible and can be adjusted to suit the dataset and user requirements, subject to computational resources available.
There are two visualization modes available to the user: the Systems view, which starts with a full view of all the nodes and connections, and the ‘PABC’ (Person Affected by Cancer) mode, which only displays nodes that have been directly selected and hides all background information. An example of the PABC mode is shown in Figure 1. In this mode, the user starts from a blank canvas and can search for and add nodes to build up a visualization of the desired association rules. On the other hand, the Systems view in Figure 2 comes from the opposite perspective, starting with a visualization of everything and giving the user the chance to strip away and prune the displayed graph as required.

This figure is a typical output from the systems view of the overall graph.
The overall operation of the visualization tool is structured into the underlying graph computation; the Barnes-Hut algorithm for automated graph layout; and the visualization options offered for the graph nodes and edges. For full details of the application design and graph computation, please refer to Appendix E of the Supplemental Material.
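For readers who want the intuition without consulting the appendix, a single naive force-directed iteration can be sketched as follows. This all-pairs version is O(n²); the Barnes-Hut algorithm approximates the repulsion term with an octree in O(n log n), a detail omitted here:

```python
def layout_step(pos, edges, repulsion=1.0, spring=0.05, dt=0.1):
    """One naive force-directed iteration in 3D.
    pos: node -> (x, y, z); edges: list of (node, node) pairs."""
    forces = {v: [0.0, 0.0, 0.0] for v in pos}
    nodes = list(pos)
    for i, u in enumerate(nodes):            # pairwise repulsion (O(n^2))
        for v in nodes[i + 1:]:
            d = [a - b for a, b in zip(pos[u], pos[v])]
            r2 = sum(x * x for x in d) or 1e-9
            for k in range(3):
                f = repulsion * d[k] / r2
                forces[u][k] += f
                forces[v][k] -= f
    for u, v in edges:                       # spring attraction along edges
        d = [a - b for a, b in zip(pos[u], pos[v])]
        for k in range(3):
            forces[u][k] -= spring * d[k]
            forces[v][k] += spring * d[k]
    # Euler integration step
    return {v: tuple(p + dt * f for p, f in zip(pos[v], forces[v])) for v in pos}
```

Connected nodes that are far apart are pulled together by the spring term, while unconnected nodes that are close are pushed apart by the repulsion term; iterating this step lets the graph self-organize.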
Evaluation
In addition to playing a role in the requirements capture phase of the project, a ‘design driven’ approach to the evaluation of the tool was used to determine the extent to which functionality and system behaviour was meeting the needs and expectations of users. The evaluation took the form of in-person and online workshops designed to be participatory, which is an effective approach to facilitate productive creative collaboration in many contexts, including health and care practice and policy innovation. 41 The engagements involved a combination of structured, goal focused interaction with the tool and group discussion relating to aspects of functionality and future opportunities.
The user experience workshops (three online workshops between July and August 2022 and one in-person workshop in September 2023) were devised to introduce and demonstrate the system and to gather feedback. The first three workshops were conducted online (using Microsoft Teams as the communication tool and Miro boards as the collaborative workshop tool, accessible to all participants) with between 4 and 6 participants per workshop, to curate responses while keeping to a time limit of an hour each. In addition to helping gain access to the participants, Miro has proven to be an effective and easy-to-use tool in related contexts where there was a need to facilitate online participation, engagement and ‘storytelling’ 42 and was equally useful in this case. The first two workshops had participants from similar backgrounds as health and social care professionals, and these were considered as ‘Group 1’ and ‘Group 2’. These professionals conduct HNAs and formulate care plans for persons affected by cancer as part of their day-to-day work, or line manage the staff who do so. As a result, these staff are collectively responsible for producing the nationwide dataset used in the data mining. By enabling them to visualize and interrogate their own data, they are empowered to better understand the performance of their care services and make improvements. The third workshop had staff from organizational information technology (IT) departments, denoted as ‘Group 3’. It is these IT staff who would be responsible for the regular maintenance of the system (and tech support) if it were deployed in routine service, hence their input and understanding of the system was equally important. The format was online as team members would be calling from all over the country, and some Covid pandemic-related constraints were still in force.
Miro was used because it is an online whiteboard well suited to conveying and collecting information, views and ideas in a collaborative setting, where each user can add feedback interactively.
The workshop format was to create an online Miro board and to explain and demonstrate three examples of the tool’s usage, allowing for both immediate verbal and written feedback. The following examples of tool use were presented:
Query 1 – Selecting a cancer type node
The first example (Figure 3) shows the working of the dropdown systems with the example of selecting the ‘Prostate Cancer’ node to see the effect that has on the full graph and then again with the graph in PABC mode. This example uses the main dropdown that contains all Node names.

Selecting a cancer type node (prostate cancer).
Query 2 – Selecting more than one node
This example (Figure 4) shows users one of the ways to select multiple nodes and how that can be useful. Here the selections are first the Concern ‘Sleep problems’ and then the Demographic ‘25 to 49 Years Old’. They have 11 and 12 first-degree connections respectively and when compared using the AND function they have 11 in common. One of those is the ‘Female’ demographic node.

Multiple node selection.
Query 3 – Filtering edges by association strength
Figure 5 shows the ability to filter the graph using the filter sliders of the adjustment sidebar to change the rendering of a selected section of the graph. This filtering allows the user to find edges that meet the criteria set (confidence, support or lift).

Filtering edges by association strength (denoted by colours of the graph edges).
The three examples were shown through videos recorded earlier to help explain tool operation and the resulting visualizations. After each video was shown, each participant was asked how they felt about what they had seen in the context of providing value and usefulness for the organization.
At the end of all three examples, participants were invited to add any further feedback they could think of to the Miro board, which would remain live and accessible after the event.
An example of a Miro board from one of the workshops is shown in Appendix C of the Supplemental Material. Appendix C also includes four videos of the tool corresponding to workshop Query 1 – Full Graph, Query 1 – PABC mode, Query 2 – Multiple selection and Query 3 – filtering slider.
Thematic analysis and usability study
To gain further understanding of what the participants of the workshops want to see in a data visualization programme, a thematic analysis of the transcripts of the workshops was undertaken using a standard procedure. 43 These steps allow for qualitative analysis of the comments made by the participants of the workshops. As the workshops were done through Microsoft Teams, recordings were made of each one, with participant consent. Then using the built-in tools for video management, a transcript was auto-generated by speech-to-text software before being manually checked for errors. The transcript was used along with the videos to review the comments made. Time codes were used to point to areas where the participants were speaking, discussing their sticky note responses on the board and giving feedback.
Towards the conclusion of this phase of the project, the opportunity arose for a final in-person workshop (‘Group 4’), taking advantage of the lifting of almost all Covid-related restrictions in 2023. This workshop was conducted in person with five health and social care professionals, using the same three query examples as the online workshops. However, instead of the query examples just being shown to the participants as in the online workshops, they now had the opportunity to try them for themselves on their own laptops. Moreover, time and space were given for freeform use of the tool by participants, along with goal-driven activity – using the tool to explore areas relevant to their domain. This conversation was not recorded, but the Miro board was used as a common ‘whiteboard’ screen at the workshop, with virtual post-it notes of comments. A System Usability Scale (SUS) survey 44 was also carried out at the end of the event. The SUS is a widely used methodology consisting of 10 questions, with participants ticking one option on a scale of 1 to 5 for each question. As an additional goal, this workshop served as a validation of the compatibility of the web-based interactive visualization, which ran smoothly on all the participants’ employer-issued machines without any intervention, modifications or admin rights required.
Results and discussion
Thematic analysis of online workshops
Using the overarching research question: ‘Is the developed information visualization tool of value to and usable by workers?’, the thematic analysis followed a standard published approach 43 in which the transcripts and videos of the workshops are reviewed, and codes are identified and refined to compile and form common themes and subthemes. The results are presented in Table 1 with a treemap summary in Figure 6 and detailed codes in Appendix A of the Supplemental Material.
Core themes and subthemes after thematic analysis of the workshops.

Treemap summary of thematic analysis themes and subthemes. The area of the rectangles is proportional to the frequency of codes under each subtheme.
The three major themes that are apparent are Patient Experience, Visualization and Usability. These make sense given the questions asked at the workshops, where the focus during the body of each workshop was on Observations, Insight and Actions (that is, what participants would want next).
Patient experience
Current practices
These codes refer to the existing ways that the participants use systems to help patients. There were clear thoughts that the current systems could be improved after seeing the new data visualization options. ‘Identification of need in improving current practice’, ‘Outdated information’ and ‘Time consuming, frustrating’ all reinforce this. The more patient-focused participants referred to a specific service that needed improvement, captured as ‘Struggle with arranging transport for patients’; this issue is not one that this project could help alleviate. The most mentioned code in this subtheme was ‘Patient concern’, which was raised in all three workshops. Given that the main task of the tool is to support the identification of patient concerns and associated actions, it was encouraging that the tool prompted discussion around patient concerns and linked it to person-guided support.
Complementary to person-guided support
The codes under this subtheme all relate to the idea of the tool being a useful addition and adding value to the organization without being seen as a replacement for contact with patients. The tool was never conceived with the idea of replacing any contact between patients and carers, but more as an additional way to explore the data set and surface connections.
Risks
This subtheme is focused on the problems and potential risks that leaning too far into abstract software could introduce when looking at cancer data. All three groups of participants reinforced the danger of ‘Risk of reinforcing bias’. Part of this has more to do with the data set than the actual tool, since the tool can only display the data it is given. The workshop facilitators did try to give some context about the different metrics for displaying the data, such as confidence and lift, but there was only so much time for that in each 1-hour workshop. Other risk-themed codes were ‘Assumptions disconnected from patient reality’ and ‘Dehumanisation due to digitisation’. These both appeared in the health/social care-focused groups, who have a much closer relationship with cancer patients than the tech team. It was very important for them to make sure that this tool did not try to replace the human element of their jobs, and that is not our intention.
Visualization
The theme of visualization is important as that is exactly what the graph shows to the user and is the main thing that is being evaluated by the participants.
Challenges with processing the graph information
Like the code ‘Specialist needed to interpret and draw insights from data’, the subtheme of ‘Challenges with processing the graph information’ shows the difficulty that first-time users face with this system. All the codes from this subtheme were mentioned by the care worker groups, who have less technical prowess with information display programmes. ‘Difficult to process’ is the most common code with 10 mentions, with ‘Overwhelming’ and ‘Outside their technical skill set’ behind that.
Fresh perspective
A positive subtheme is the affirmation that this visualization offered a fresh perspective on the data. The codes ‘Novel way of displaying data’, ‘Novel insight’ and ‘Intriguing concept’ all point towards a positive impression of the visualization tool. In particular, ‘Novel way of displaying data’ was mentioned by all groups and nine times in total across all workshops. The rest of the fresh perspective codes, however, only came from workshops 1 and 2. The codes ‘Clearer understanding’ and ‘New consideration for current practices’ point to the graph visualization being useful for the staff in giving them fresh ideas about the associations between pieces of data that they have been working with for a long time. This particular subtheme was the motivation for the project from the outset.
Usability
Usability is a theme with both positive and negative subthemes throughout. Accessibility of data and potential value had some very positive subthemes, while the ‘concerns’ subtheme collected a lot of the issues raised by workshop participants.
Accessibility of data
The subtheme ‘Accessibility of data’ contained very positive codes. The most mentioned code was ‘shared resources’, with nine mentions by members of workshops 1 and 2. This referred to internal systems that the participants of those workshops had problems with, and hopes that technologies like the visualization would be able to help. ‘Efficient access to information’ was mentioned five times, again by the participants from workshops 1 and 2, suggesting that the tool was useful and worked well at displaying information. The code ‘Filter Information’ was mentioned by both demographic groups and was the only code under this subtheme mentioned by the participants of workshop group 3. As a group with a more technical focus, it was good to see them pick up on that aspect and ask questions about the different ways the tool offered to filter information. The remaining codes are all positive ideas mentioned by the participants: ‘Adjustable specificity of query’ and ‘Filling gaps in information’ show further ideas the participants had about where the tool could be upgraded, and development paths that they recognized as useful for them.
Potential value
The codes within ‘Potential value’ refer to areas where the tool could add value currently, or in the future with additional changes. The first code, ‘See potential for further applications’, is a very positive response; it was mentioned six times across workshops 1 and 2. It is encouraging that the less tech-focused teams were able to see the potential value of the tool. ‘Area tailored support’ was mentioned, and this would potentially be a key change to make the tool more useful at a local level, rather than with a nationwide data set that loses some of the ground-level connections. ‘Uncertain of value in current limited form’ was a code mentioned four times by Groups 1 and 2 and once by Group 3. This code showed that the participants recognized the tool’s potential but did not think it was providing full value to their current roles. A code that gave insight was ‘Time needed to consider further use’, with participants wanting more time to get to grips with the ideas that the tool was presenting, to further understand what this technology could do. The final two codes, ‘Automation: self-service booking systems for patients’ and ‘Support financial applications’, refer to features that the participants of workshops 1 and 2 wanted in their workplaces, but not ones that would have implementation space in this version of the tool.
Concerns
The subtheme of ‘Concerns’ gave some interesting insights into the problems that participants immediately noticed when looking at the tool. ‘Concern regarding data relevance’ was mentioned four times, all by workshop 3; as a team with experience of looking at the organization’s data sets, it was interesting to hear that they questioned the relevance of the data and saw flaws in it. These thoughts are possibly connected to the other codes mentioned by that group, ‘Uncertainty of practical application’ and ‘Purpose unclear’: they did not see the tool as a practical way to look at the data and thought that its purpose was unclear. From the other workshop groups, the codes ‘Uncertain of personal usability’ and ‘More context needed’ reveal that they did not think the tool would currently be a good fit for their roles, and that they needed more information on what they were looking at and how it fits together.
System Usability Scale (SUS) survey from in-person workshop
The System Usability Scale (SUS) survey 44 of the in-person workshop (Group 4) yielded the results shown in Table 2. In this table, the SUS scores are expressed in terms of their quartile ranking. 45 It can be seen that the mean SUS score is in the top (best) quartile, Q4. This is corroborated by the many positive participant comments that while the visualization tool was initially daunting, after trying it out, it was actually easy to use. Participants also saw the potential of the tool to allow for service planning, to complement their existing eHNA dashboards which are very outcomes focused.
System Usability Scale (SUS) survey results from the in-person workshop (Group 4).
Association rule mining as a recommendation engine
The input datasets had significant class imbalances that made it challenging to train traditional artificial intelligence algorithms on them. Although specialized data processing and AI algorithms exist for such imbalanced data, we found that association rule mining provided a fairly simple way to understand and deal with these imbalances. For example, if the ‘confidence’ metric was used to find actions with the highest confidence of association with a given concern, ‘Discussed concern, advice given’ would be the only answer – a trivial result. On the other hand, by finding the associated actions with the highest ‘lift’, meaningful actions are obtained for each concern, as shown in Table 3. Lift corrects for the relative abundance of a consequent in the data, and we found this to be a powerful tool for prioritizing the most meaningful association rules. From a practical point of view, these actions with the highest lift can form the basis of a personalized recommendation to staff or persons affected by cancer in response to their specific concerns. A specific example with more in-depth calculations is provided in Appendix D of the Supplemental Material.
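The imbalance effect can be reproduced with made-up counts (illustrative figures, not the eHNA data): a near-universal action dominates on confidence, while lift surfaces the action specific to the concern.

```python
# Hypothetical counts over n assessments (not the real eHNA figures)
n = 1000
count = {"advice": 900, "referral": 60}    # overall action frequencies
concern = 100                              # assessments with a given concern
both = {"advice": 90, "referral": 50}      # concern co-occurring with each action

def confidence(action):
    return both[action] / concern                    # Pr(action | concern)

def lift(action):
    return confidence(action) / (count[action] / n)  # corrects for base rate

# 'advice' wins on confidence (0.9 vs 0.5) simply because it is near-universal,
# but 'referral' wins on lift (8.33 vs 1.0), revealing the concern-specific action.
best_conf = max(count, key=confidence)   # -> "advice"
best_lift = max(count, key=lift)         # -> "referral"
```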
The most frequent concerns with actions in the eHNA database, together with their associated action with the highest confidence and lift.
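To make the confidence/lift distinction concrete, consider a minimal sketch with toy transactions (invented for illustration; not the eHNA data):

```python
# Toy transactions: a dominant, near-universal action ("advice") and a
# rarer, more specific one ("referral").
transactions = [
    {"pain", "advice"},
    {"pain", "advice", "referral"},
    {"pain", "referral"},
    {"fatigue", "advice"},
    {"fatigue", "advice"},
    {"fatigue", "advice"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y):
    """P(Y | X): support of the joint itemset over support of the antecedent."""
    return support(x | y) / support(x)

def lift(x, y):
    """Confidence corrected for the base rate of the consequent."""
    return confidence(x, y) / support(y)

# 'advice' dominates (support 5/6), so confidence cannot separate the rules:
# confidence(pain -> advice) == confidence(pain -> referral) == 2/3.
# Lift corrects for that abundance and ranks the specific action higher:
# lift(pain -> advice) = 0.8, but lift(pain -> referral) = 2.0.
```

Here the ubiquitous ‘advice’ consequent ties on confidence but falls below 1.0 on lift, while the specific ‘referral’ action stands out – mirroring how lift surfaced meaningful actions in Table 3.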
To check the statistical significance of our association rules, we generated contingency tables based on the rules and used Fisher’s Exact Test to find the p-value for each rule X → Y, where:
a: number of HNA assessments with both X and Y.
b: number of HNA assessments with X and without Y.
c: number of HNA assessments without X and with Y.
d: number of HNA assessments without X and without Y.
The results on statistical testing of the association rules are reported in Appendix B of the Supplemental Material. A Bonferroni correction 46 is applied to the raw 0.05 significance level to account for the size of the search space. In this case, the rules are of the form concern → action. There are 135 possible concerns and 193 possible actions, giving a search space of 135 × 193 = 26,055. Therefore, the corrected significance level is 0.05/26,055 ≈ 1.919E-6, and rules with p-values below this corrected threshold are deemed statistically significant.
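A minimal sketch of this test using only the Python standard library (the function name and the example counts are ours, for illustration; a library routine such as SciPy’s would normally be used):

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for a 2x2 table:
    P(co-occurrence count >= a) under the hypergeometric null of
    independence between antecedent X and consequent Y."""
    n = a + b + c + d
    row1, col1 = a + b, a + c           # totals with X, totals with Y
    denom = comb(n, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1)) / denom

# Bonferroni-corrected threshold over the concern -> action search space.
alpha = 0.05 / (135 * 193)              # ~1.919E-6

# Illustrative counts (not from the eHNA data): X and Y co-occur far
# more often than independence would predict.
p = fisher_one_sided(40, 10, 60, 890)
print(p < alpha)
```

With these counts the expected co-occurrence under independence is only 5 out of 1000 assessments, so observing 40 yields a vanishingly small p-value, well below the corrected threshold.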
Conclusion
The overall ambition of this work was to determine the utility of a games engine to support decision making through ‘playable mechanics’ and visual analytics, providing relevant data to health and care professionals. Towards this, an information visualization tool that interactively and dynamically presents association rule mined data was co-designed, developed using Unity and subsequently evaluated.
The objectives of the research were:
To develop a visualization prototype able to depict electronic holistic needs assessment (eHNA) data in the form of an interactive 3D graph.
To work with stakeholders to design and implement key interactions to filter and navigate the data.
To determine whether the interactive visualization, developed using a games engine, is of value to and usable by health and social care professionals.
Objective 1
A 3D Force Directed Node Graph (FNG) was created using the Unity game engine. It used the mined eHNA dataset to create a graph showing the connections between pairs of nodes and allowing users to interact with it using a mouse and/or keyboard.
The Unity game engine allowed for multiple deployment methods with a shift from a Windows executable format to a web-hosted system using Microsoft Azure. This allowed for an online version of the tool to be hosted on servers and accessible using a browser link.
The performance of the tool was enhanced algorithmically, with the Barnes-Hut approximation used to approximate the physics of the force-directed network graph and achieve much better performance in terms of both frame rate and CPU usage.
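The idea behind the Barnes-Hut approximation can be sketched in a few lines (a simplified 2D Python version for illustration; the tool itself is implemented in C# within Unity):

```python
import math

# Simplified 2D Barnes-Hut repulsion for a force-directed layout.
# Distant quadtree cells are approximated by their centre of mass when
# (cell width / distance) < theta, reducing O(n^2) pairwise forces to
# roughly O(n log n).

class Cell:
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # centre, half-width
        self.n = 0                                  # bodies in this cell
        self.sx = self.sy = 0.0                     # position sums
        self.body = None                            # sole body, if leaf
        self.children = None

    def _quadrant(self, x, y):
        return self.children[(x >= self.cx) * 2 + (y >= self.cy)]

    def insert(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        if self.n == 1:                  # first body: store directly
            self.body = (x, y)
            return
        if self.children is None:        # split and push the old body down
            q = self.half / 2
            self.children = [Cell(self.cx + dx * q, self.cy + dy * q, q)
                             for dx in (-1, 1) for dy in (-1, 1)]
            ox, oy = self.body
            self.body = None
            self._quadrant(ox, oy).insert(ox, oy)
        self._quadrant(x, y).insert(x, y)

    def repulsion(self, x, y, theta=0.5, k=1.0):
        if self.n == 0 or (self.n == 1 and self.body == (x, y)):
            return 0.0, 0.0              # empty cell, or the query body itself
        comx, comy = self.sx / self.n, self.sy / self.n
        dx, dy = x - comx, y - comy
        d = math.hypot(dx, dy) or 1e-9
        if self.children is None or (2 * self.half) / d < theta:
            f = k * self.n / (d * d)     # treat the cell as one point mass
            return f * dx / d, f * dy / d
        fx = fy = 0.0
        for c in self.children:
            cfx, cfy = c.repulsion(x, y, theta, k)
            fx, fy = fx + cfx, fy + cfy
        return fx, fy

# demo: build a tree over a few nodes and query the force on one of them
pts = [(1, 2), (-3, 4), (5, -6), (-7, -8), (2, 2), (0.5, -1.5)]
root = Cell(0.0, 0.0, 16.0)              # square region [-16, 16]^2
for p in pts:
    root.insert(*p)
approx_fx, approx_fy = root.repulsion(1, 2, theta=0.5)
```

Setting theta to 0 disables the approximation and recovers the exact pairwise sum, which is a convenient correctness check; larger theta trades accuracy for speed, which is what recovers frame rate on large graphs.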
Objective 2
Multiple sessions with stakeholders allowed for a range of features to be designed and added for filtering and exploring the graph in multiple ways. These stakeholders included frontline health and social care professionals whose services directly feed into the source datasets (because they and their patients fill in the forms that are captured in the HNA database), along with their managers and IT department staff.
A user interface (UI) layer with sidebars allowed screen space to be maximized while hiding multiple features when not in use. Features like category selection through dropdown menus were implemented to make the node selection easier. Multiple display filters were created for focusing on data by connection degree number, or by the colour mapping system. A grammar filter was added to perform AND/OR/NOT selection operations on subsets of the graph. Finally, an alternate ‘PABC’ rendering mode was developed that shifts focus away from the full graph to make evaluating local graph regions clear.
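Conceptually, the grammar filter reduces to set algebra over node identifiers. A toy sketch (the node names are invented for illustration):

```python
# Hypothetical node-selection sets over node identifiers.
concerns  = {"pain", "fatigue", "mobility", "finance"}
high_lift = {"pain", "mobility", "money_worries"}
excluded  = {"mobility"}

# AND narrows the selection, OR widens it, NOT prunes it.
and_sel = concerns & high_lift             # intersection (AND)
or_sel  = concerns | high_lift             # union (OR)
not_sel = (concerns & high_lift) - excluded  # AND then NOT

print(sorted(not_sel))                     # → ['pain']
```

Chaining these three operations is enough to express the selection, pruning and expansion behaviours described above.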
Objective 3
To evaluate this tool, online user workshops were repeated with three groups of participants from different departments who were able to see guided demos of the tool’s function and report thoughts and feedback verbally and in written form on an interactive Miro web board. A thematic analysis of the conversations during the workshop was undertaken to reaffirm the ideas and thoughts of the participants. This was followed up by an in-person workshop with the same overall structure as the online workshops but giving participants the opportunity to try the tool for themselves on their own laptops. A System Usability Scale (SUS) survey was conducted at its conclusion, with an excellent mean score of 84, placing it in the top quartile of studies conducted with the same scale. 45
One possible use case is for finding or recommending actions to take in response to particular concerns expressed by persons affected by cancer. In this regard, sorting by the ‘lift’ metric was found to be more useful than ‘confidence’ because it corrects for class imbalances in the dataset. The association rules can also be tested for statistical significance by building contingency tables and applying Fisher’s Exact Test. The rules with the highest lift often have direct practical value, forming the basis of personalized recommendations in response to specific concerns.
Future work
We will seek to expand the web and cloud service deployment of our tools, moving from a prototype application towards a system that is fully embedded in the operations and services of our collaborators.
Very early in the project we experimented with a Virtual Reality (VR) interface for the tool, using the Oculus Rift headset and handheld controllers to explore the graph in a VR environment. This highlighted another advantage of the Unity game engine: its support for a huge range of control systems. However, the VR version required a whole new interface for interacting with the different filtering systems; instead of a mouse cursor, the handheld controllers let the user point at nodes in the graph with their hands to reveal information. VR, especially with the advent of passthrough, enables new interaction modes with the 3D dataset, and this is a potential direction for future work that can build on the powerful capabilities of the underlying game engine. 47
In tandem with this push towards 3D and VR technologies, the computational cost of the system also tends to increase – part of the overall trade-off between rich graphics and performance. To mitigate the computational burden, various algorithmic approaches can be implemented, including clustered force-directed graphs. 48–50 These could have the added benefit of an improved 3D layout: by sorting the nodes into clusters, a distinct large-scale structure could be imposed on the graph, aiding its overall interpretability. Furthermore, in future work, we would like to leverage Unity’s new data-oriented technology stack (DOTS) to scale processing efficiently. For example, LLVM and Burst compiler technology can be used to achieve near-native code performance from C#.
This work contributes to the fields of data visualization and visual analytics and shows that games engines can be used effectively, and with much customization and flexibility, for 3D visualization of network/graph data. The innovations presented in the paper translated typical graph visualization and interaction modes – selection, exploration, pruning, connecting and filtering 51 – into the 3D game engine space. Further, the real-time interaction aids the interpretability of complex graphs. The raw statistical output of association rule mining was transformed into interactive representations that can be filtered based on confidence, lift, etc., and this offered great flexibility with respect to data visualization and interaction modes. This highlights the potential of game engines as a key tool for emerging visual analytic approaches for exploring association relationships in graph data, 52 as well as being able to offload computation to the GPU via hardware acceleration and compute shaders.
Finally, the visualization techniques pioneered here can be translated to almost any graph or network data, such as node-based programming. We intend to explore more opportunities for expanding this technology to new fields with similar visualization challenges.
Supplemental Material
Supplemental material, sj-docx-1-ivi-10.1177_14738716241309243 for Creating informative experiences through a visual and interactive representation of health and social care data by Kean Lee Kang, Adam Hastings, Alex Danielle Hughes, Karolina Myszkowska, Margaret Greer, Janice Preston, Don McIntyre, Janette Hughes, Kara Mackenzie, James Bown and Ruth Falconer in Information Visualization
