Abstract
Keywords
Introduction
Autonomous Shipping and Human Supervisory Control
Systems with autonomous capabilities, typically based on Artificial Intelligence (AI) and Machine Learning algorithms, are proliferating across society and industries. In the maritime domain, ships are envisioned to deploy advanced automation, or ‘agents’, capable of sensing their environment and executing goal-directed behaviour using actuators, allowing for advanced functions to be performed with increasing levels of autonomy (IMO, 2018; Russell & Norvig, 2022). For example, in Japan, a commercial container ship conducted a 790-km trial to test its autonomous navigation capabilities without human intervention (Nippon Yusen Kaisha; NYK, 2022). In Norway, the Yara Birkeland container ship and the ASKO barges have commenced operation with the aim to navigate autonomously within a few years (AS Kolonialgrossistene; ASKO, 2022; Yara International, 2022). Here, operators are envisioned to work in positions from which single or multiple autonomous ships can be continuously monitored and supervised (e.g. see Massterly, 2023). In this context, supervisory performance is dependent on the operator’s ability to ‘[perceive] elements in the environment within a volume of time and space, [comprehend] their meaning, and [project] their status in the near future’, that is, to obtain and maintain situation awareness (SA; Endsley, 1995, p. 36). In other words, operators should be able to perceive critical parameters made available through the control and safety systems, analyse the ship’s current and planned behaviour, and evaluate the plan’s adequacy considering its context (van de Merwe et al., 2024a). To support operators in achieving and maintaining SA of an autonomous ship’s performance, it is critical to understand how effective human supervisory performance can be achieved whilst avoiding potential human performance pitfalls.
Challenges related to the human supervision of highly automated systems are well documented in the scientific literature (Endsley, 2017). For example, the out-of-the-loop (OOTL) performance problem is attributed to a loss of skills and SA, and occurs when operators are no longer an active part of a system’s information loop (Endsley & Kiris, 1995; Metzger & Parasuraman, 1999). In addition, transitioning back into the information loop often results in high workload because of the need to build SA and regain manual control (Endsley, 2017; Onnasch, Wickens, Li, & Manzey, 2014; Weaver & DeLucia, 2020). Taken together, these challenges are described as the ‘automation conundrum’, which states that ‘the more automation is added to a system, and the more reliable and robust that automation is, the less likely the human operators overseeing the automation will be aware of critical information and able to take over manual control when needed’ (Endsley, 2017, p. 8). The safe implementation of systems with autonomous capabilities thus depends on the degree to which humans can oversee the agent’s decisions and actions, and the agent’s ability to afford humans insight into its reasoning processes (J. Y. C. Chen, Procci, et al., 2014a).
Human Performance and Agent Transparency
‘Agent transparency’ (J. Y. C. Chen, Procci, et al., 2014a), ‘system transparency’ (Ososky et al., 2014), ‘display transparency’ (National Academies of Sciences, Engineering and Medicine, 2022), ‘automation transparency’ (Skraaning & Jamieson, 2021), or simply ‘transparency’ are terms used to describe the ‘understandability and predictability of [a] system’ (Endsley, 2023; Endsley, Bolté, & Jones, 2003, p. 146). Endsley (2017) defined transparency as a means to enhance the understandability and predictability of systems by making observable what a system is doing, why it is doing it, and what it will do next. J. Y. C. Chen et al. described agent transparency as ‘the descriptive quality of an interface pertaining to its abilities to afford an operator’s comprehension about an intelligent agent’s intent, performance, future plans, and reasoning process’ (2014b, p. 2). Finally, Lyons (2013) depicted transparency as the ability of an operator to perceive an agent’s abilities, intents, and situational constraints. The aim of agent transparency is to provide ‘a real-time understanding of the actions of the AI system’ (National Academies of Sciences, Engineering and Medicine, 2022, p. 31) and enable ‘the operator to maintain proper SA of the system in its tasking environment without becoming overloaded’ (Mercado et al., 2016, p. 402). In addition, transparency intends to facilitate human-agent collaboration when humans are tasked with supervising automated systems. That is, when an agent communicates what it does, why it does it, and what it will do next, human supervision should be supported (Endsley, 2023). Conversely, opaque agents can be challenging to supervise as they may be difficult to interpret because of a lack of information provision (Doshi-Velez & Kim, 2017; Lipton, 2017). In other words, when the agent’s inner workings are made apparent to the user, the user’s comprehension of the agent may be enhanced (Ososky et al., 2014).
In recent years, there has been increasing interest in understanding the effect of transparency on selected human performance variables, including SA (Selkowitz, Lakhmani, & Chen, 2017; Skraaning & Jamieson, 2021; Wright, Chen, & Lakhmani, 2020), decision making (Bhaskara et al., 2021; Loft et al., 2023), mental workload (Mercado et al., 2016; Stowers et al., 2020), and automation trust (J. Y. C. Chen et al., 2018; Ezenyilimba et al., 2023; Schmidt, Biessmann, & Teubner, 2020). Furthermore, a recent review of the transparency literature studied the relation between agent transparency and human performance variables, finding positive effects on SA and task performance, without negative effects on mental workload, for increasing levels of transparency (van de Merwe et al., 2024b). These findings indicate the potential benefit transparency can have in cases where operators need to understand the behaviour of a system and intervene manually when required. Thus, transparency can be especially relevant in safety-critical domains where understandability and predictability are essential for safe and effective control of processes (Endsley, 2023; Jamieson, Skraaning, & Joe, 2022).
Agent Transparency and Autonomous Shipping
Several recent studies have addressed agent transparency within the autonomous shipping domain. For example, Ramos et al. (2019) performed a task analysis to derive potential human failures when monitoring autonomous ships. Here, the study identified the importance of the supervisors’ ability to collect and evaluate information from the autonomous ship through ‘an adequate HMI’ (human-machine interface), such that a strategy for intervention could be determined should the automation fail (Ramos et al., 2019, p. 43). Van de Merwe et al. (2024a) identified specific information requirements for supervising autonomous collision and grounding avoidance (CAGA) systems based on a Goal-Directed Task Analysis (GDTA; Endsley et al., 2003). The study highlighted the need for continuous, sufficient, and adequate information about the CAGA system’s decisions, planned actions, and underlying information processing, that is, transparency information, to alleviate some of the human performance issues in supervision and support safe and effective oversight of CAGA systems. Furthermore, Porathe (2021) discussed the use of ‘expert systems’ to aid operators in supervising one or more autonomous ships. Here, HMI concepts were proposed to help operators obtain an at-a-glance understanding of how the system perceives and understands the nearby traffic and its intentions for solving collision situations. This includes showing how the CAGA system plans to solve a situation by graphically displaying the various options it has considered and which solution it intends to execute. Also, Van de Merwe et al. (2023a) operationalised transparency for autonomous ships by developing concepts for how an autonomous CAGA system may display its perception and analysis of its environment, determination of collision risk, and plans to resolve the situation. Moreover, Alsos et al. (2022) examined how the transparency concept could be operationalised for autonomous ships.
Here, the aim was to assess how autonomous ships can share intent information with external stakeholders, such as passengers, traffic services, and other nearby ships. Finally, operationalising this idea, Simic and Alsos (2023) developed a concept for autonomous urban ferries in which the ship’s perceptions, current state, and future intentions are communicated to external stakeholders through light strips and displays mounted on the outside of the ferry.
Although these studies address the potential benefits of agent transparency in relation to human supervisory control in an autonomous shipping context, they fall short of measuring its purported effects. That is, to the best of our knowledge, no studies have empirically tested the effect of transparency on human supervisory performance in an autonomous shipping context while considering the complexities that can arise in realistic traffic-dense environments. As such, given the concrete developments towards autonomy in the maritime domain, there is a need to understand how transparency can be applied within this context and how it affects human performance variables. Therefore, this study aims to extend the literature by empirically evaluating the application of transparency in a maritime autonomous shipping context. Specifically, this study asks what the effects of agent transparency and traffic complexity are on the supervisor’s (1) SA, (2) mental workload, and (3) task performance.
Situation Awareness, Mental Workload, and Task Performance
Summary of Predictions Regarding the Effect of Transparency and Complexity on Situation Awareness, Mental Workload, and Task Performance.
As agent transparency is about disclosing system-internal information, the degree of transparency can typically be varied by increasing or decreasing the amount of information presented about the system’s internal processes, decisions, and planned actions (see Bhaskara et al. (2021) and Pokam et al. (2019) for examples). Although increased levels of agent transparency imply increased insight into the agent’s reasoning, full disclosure of the system’s internal state may pose challenges in terms of the user’s cognitive processing capabilities (Bhaskara et al., 2020; Wickens, 2018). That is, although increased transparency may benefit SA, it may also add a cognitive processing burden due to the resources required for selecting and dividing attention and keeping information in working memory (Wickens & Carswell, 2021). This may be exacerbated in situations where the baseline level of information is already high, that is, in complex traffic situations (Moacdieh & Sarter, 2017). Here, increased levels of transparency information add to the information burden, and the risk of overloading the operator is high, especially when the additional information leads to display clutter (Moacdieh & Sarter, 2015a). However, despite these risks, recent studies have not found a clear relationship between agent transparency and workload (Ezenyilimba et al., 2023; Loïck, Guérin, Rauffet, Chauvin, & Éric, 2023; Tatasciore, Bowden, & Loft, 2023), possibly because of the use of graphical symbols and the integration of transparency information in task displays (Gegoff, Tatasciore, Bowden, McCarley, & Loft, 2023; van de Merwe et al., 2024a; van Doorn, Horváth, & Rusák, 2021).
Building on these findings, this study anticipates that when, first, information requirements are identified based on an iterative human-centred design approach (Endsley et al., 2003; ISO, 2019), second, symbology is developed based on context-specific industry standards (IEC, 2022), and third, transparency information is integrated in the primary task display (Skraaning & Jamieson, 2021; van Doorn et al., 2021), mental workload will not be affected by agent transparency (see Table 1).
Future supervisors of autonomous systems are likely to divide their attention between multiple units and/or have other concurrent tasks to perform (Cummings & Guerlain, 2007; Mercado et al., 2016; Wohleber, Stowers, Barnes, & Chen, 2023). Such roles may require shifting attention between one unit and another, or between one task and another, emphasising the need for rapid assessment of agent performance and the ability ‘to quickly get in-the-loop’ (Porathe, Fjortoft, & Bratbergsengen, 2020, p. 3). Assuming that human-centred design principles are adequately applied in this study (Endsley et al., 2003; ISO, 2019), this study anticipates that the availability of information that supports transparency, in the form of SA knowledge directly perceivable on the CAGA system’s interface, expedites the supervisor’s attainment of SA (van Doorn et al., 2021). Therefore, it is hypothesised that users need less time to comprehend the CAGA system’s reasoning when this information is available (Kunze, Summerskill, Marshall, & Filtness, 2019; Roth, Schulte, Schmitt, & Brand, 2020) (see Table 1). Furthermore, it is hypothesised that transparency is beneficial in situations where essential system-internal information may get lost among other information elements, that is, in complex traffic situations. In these cases, provided that transparency information is made salient, presented in a well-organised manner, and integrated in the user’s primary task display, it should facilitate comprehension of the system despite the increased complexity (Moacdieh & Sarter, 2015b, 2017).
Method
Participants
Participant Demographics and Selected Experience With Technologies.
Technical Setup
To maximise recruitment, the first author travelled to the locations most suitable for the participants to perform the study, including onboard a passenger ferry where participants worked and at various national nautical training institutes. Nevertheless, the technical setup, conditions, and conduct of the experiment were standardised and consistent regardless of the location where the data was gathered (see Figure 1). The experiment was conducted on a standard portable office computer using a 24″ screen with 1920 × 1200 resolution running Windows 10. E-Prime 3.0 served as the experimental platform in which the experimental stimuli were provided and primary data was recorded (Psychology Software Tools, Inc, 2023). Finally, post-experiment interviews were recorded using pen and paper. The technical setup used for the experiment: on location onboard one of the passenger ferries, and at the university’s experimental lab.
Execution of the Experiment
Procedure
Figure 2 depicts the execution of the experiment. After a brief introduction, participants signed an informed consent form stating that participation was voluntary and that they had the liberty to withdraw at any stage during the experiment, without reason or penalty. This research complied with the American Psychological Association Code of Ethics and was approved by the Norwegian Centre for Research Data, reference number 986652. Informed consent was obtained from each participant. Participants were briefed on the experimental procedure, what was expected of them, and the HMI used in the experiment. A practice session was performed to familiarise the participants with the execution of the experiment, including the stimuli and questionnaires. After this, the experiment commenced, and the experimental trials and measurements were performed. Two trials were performed that were identical in setup but used new traffic situations to avoid familiarisation. After the trials, the pairwise comparisons, as part of the workload measurements, were performed, and a semi-structured interview was conducted. Depending on the participant’s progress, the entire experiment lasted between one and two hours and the experimental trials between 10 and 30 minutes each. An illustration of the procedure for the experiment.
Experimental Tasks
Participants took the role of a supervisor of a ship equipped with an autonomous CAGA system. They were tasked with observing and understanding a traffic situation depicting own ship in conflict with a target ship, and own ship’s proposed solution to resolve it. Once the participants felt they had sufficiently understood the situation, including the system’s solution, they were to press a button on the keyboard, after which the screen was blanked and questions were presented concerning SA and mental workload. To provide participants with a sense of urgency, they were told they had a 90-second time limit to evaluate the traffic situation, after which the radar image would disappear automatically. In practice, however, no time limit was imposed by the researchers, to avoid a ceiling effect in the measurements. No time-keeping device was available to the participants. Once the questions were answered, the participant pressed a key to continue, and a new traffic situation was shown. This process was repeated until all traffic situations for all experimental conditions were completed.
The traffic situations were developed by a licensed navigator and reviewed by two independent, licensed navigators (see Van de Merwe et al. (2023a) for further details). The traffic situations were created on a desktop simulator at a maritime education and training institution. Each traffic situation was configured to represent a potential collision situation involving own ship and one other vessel in either a head-on, crossing, or overtaking situation. To avoid familiarisation with the traffic situations, multiple variations were developed, including conflict situations in coastal and confined waters, restrictions in the target ship’s ability to manoeuvre, and own ship as a stand-on vessel (IMO, 1977). However, to ensure equivalence in difficulty between the situations, they only consisted of one-to-one ship encounters. This meant that, although traffic situations could depict multiple ships, own ship was only in conflict with one other target ship. As such, traffic situations were created with variations in terms of the type of conflict situation (head-on, crossing, overtaking/overtaken), who has right of way (own ship as the give-way or the stand-on vessel), the type of avoidance action proposed by the CAGA system (route and/or speed change), and any restrictions in target ship manoeuvrability (restricted in ability to manoeuvre). In total, 20 unique traffic situations were used for the experiment: four for the familiarisation phase, eight for trial one, and eight for trial two, that is, 16 situations for the experimental trials in total. For readers interested in the traffic situations and their configurations, a table is made available as Supplemental Material on the journal’s web site.
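The variation dimensions described above can be sketched as a small data structure; note that the field names and values below are illustrative only and are not taken from the study materials:

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical encoding of the variation dimensions; field names are
# illustrative, not the study's actual configuration format.
@dataclass(frozen=True)
class TrafficSituation:
    conflict: str            # "head-on", "crossing", or "overtaking"
    own_ship_role: str       # "give-way" or "stand-on"
    avoidance: str           # "route", "speed", or "route+speed"
    target_restricted: bool  # target restricted in ability to manoeuvre

# Enumerate the full design space spanned by these dimensions; the
# experiment sampled 20 unique situations from variations like these
# (four for familiarisation, 16 for the experimental trials).
space = [
    TrafficSituation(c, r, a, m)
    for c, r, a, m in product(
        ("head-on", "crossing", "overtaking"),
        ("give-way", "stand-on"),
        ("route", "speed", "route+speed"),
        (False, True),
    )
]
print(len(space))  # 3 * 2 * 3 * 2 = 36 combinations
```

Enumerating the space this way makes it easy to verify that the sampled situations cover every level of each dimension at least once.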
Experimental Design
This study used a repeated measures approach in which all participants performed all eight experimental conditions: four transparency levels × two complexity levels. Participants were shown one traffic situation for each condition in each trial. Since the experiment comprised two trials, participants performed 16 experimental runs in total. The data for each experimental condition were averaged between trials one and two. To avoid familiarisation and order effects, the conditions were administered in random order within each trial.
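The 4 × 2 within-subjects schedule and the trial averaging can be sketched as follows; the condition labels are illustrative stand-ins for the levels described in this paper:

```python
import random

# Illustrative condition labels for the 4 (transparency) x 2 (complexity)
# within-subjects design; one traffic situation per condition per trial.
TRANSPARENCY = ("low", "medium-A", "medium-B", "high")
COMPLEXITY = ("low", "high")
CONDITIONS = [(t, c) for t in TRANSPARENCY for c in COMPLEXITY]  # 8 conditions

def trial_order(rng: random.Random) -> list:
    """All eight conditions in a fresh random order for one trial,
    to avoid familiarisation and order effects."""
    order = CONDITIONS[:]
    rng.shuffle(order)
    return order

def average_trials(trial1: dict, trial2: dict) -> dict:
    """Average a dependent variable per condition across the two trials."""
    return {cond: (trial1[cond] + trial2[cond]) / 2 for cond in CONDITIONS}

rng = random.Random(42)
print(trial_order(rng))  # a random permutation of the 8 conditions
```

Each participant thus contributes 2 × 8 = 16 runs, collapsed to one averaged score per condition before analysis.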
Independent Variables
Transparency
For this study, four levels of transparency were defined based on the amount and type of information to disclose to the supervisor. Which information to disclose was identified in an earlier study based on a GDTA of collision avoidance manoeuvring (van de Merwe et al., 2024b). These information requirements were subsequently structured based on an information processing model (Parasuraman, Sheridan, & Wickens, 2000; van de Merwe et al., 2023b) (see Figure 3). The framework for establishing transparency requirements for a CAGA system based on a model of human information processing (adapted from Parasuraman et al., 2000; and from van de Merwe et al., 2023b).
Information Elements Corresponding to Each Information Processing Step (van de Merwe et al., 2023b). Key: OT = overtaking/overtaken, HO = head-on, CR = crossing, GW = give-way, SO = stand-on.
Levels of Transparency.
Traffic Complexity
Two levels of complexity were defined for this study: traffic situations with low and with high complexity. Traffic complexity was defined by the degree to which own ship had the space to perform an avoidance manoeuvre. In cases where there was limited manoeuvring space, for example, because of another ship, the vessel was considered ‘boxed in’, and own ship may have needed to postpone an avoidance manoeuvre until the obstruction had been passed, change speed, or choose an alternative solution. Given the additional analysis and decision making required in such cases, these were considered more complex than those where a single and unobstructed solution could be implemented. As such, complexity was operationalised by adding objects to the traffic situation and ensuring own ship was boxed in.
Human-Machine Interface
During the experimental trials, participants were shown traffic situations in the form of a static radar image depicted on a radar display from a popular maritime equipment manufacturer (see Figure 4 for an example). On this image, vessels, objects, and other radar echoes were shown, representing a realistic traffic situation. Information such as settings, range, targets, and (time to) closest point of approach limits was also available and could be freely used by the participant to make sense of the traffic situation. A typical traffic situation representing a collision situation (overtaking) with high complexity (without transparency).
Information about the CAGA system’s information processing was added to the radar display (see Figure 5 for an example) and integrated in the primary task display as much as possible (Endsley, 2023; Endsley et al., 2003). The symbology representing information that supports transparency was developed by a licensed navigator using an iterative development process (ISO, 2019), based on the IEC 62288 standard for maritime navigation and radiocommunication equipment (IEC, 2022), and reviewed by two independent, licensed navigators (see Van de Merwe et al., 2023a for more details on the development process). In this case, central information regarding own ship actions, risk analysis, and detections was overlaid onto the primary information source for collision avoidance, that is, the radar display. This information varied with the experimental condition, that is, with the relevant level of transparency (see Table 4) and thereby which elements of the system’s information processing were depicted (see Table 3). An example of a traffic situation with four different levels of transparency is made available as Supplemental Material on the journal’s web site. A typical traffic situation representing a collision situation (overtaking) with high complexity (with transparency).
Dependent Variables
Situation Awareness
Example Situation Awareness Queries for the Traffic Situation Depicted in the Above Figures. Correct Answers Are in Bold Font.
Workload
Workload was measured using the NASA-TLX (Hart & Staveland, 1988). This scale measures the self-reported subjective experience of workload across six dimensions (mental demand, physical demand, temporal demand, performance, effort, and frustration level). As part of this scale, participants perform pairwise comparisons of the dimensions to create weights for each dimension. The sum of the weighted workload scores across all dimensions defines the total workload score. However, as setting the weights after each run is somewhat time-consuming, and as the type of task was constant across the experiment, a version of the NASA-TLX was used in which participants performed the pairwise comparisons only once, after all experimental trials had been completed. As such, the weights derived from the pairwise comparisons were applied to the workload scores of all individual runs.
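The weighted-score computation can be sketched as follows, assuming the standard NASA-TLX convention of 0-100 dimension ratings and weights derived from the 15 pairwise comparisons (each dimension can be chosen 0-5 times, so the weights sum to 15); the numbers below are invented for illustration:

```python
# Standard six NASA-TLX dimensions (short labels for illustration).
DIMENSIONS = ("mental", "physical", "temporal", "performance",
              "effort", "frustration")

def tlx_score(ratings: dict, weights: dict) -> float:
    """Overall workload: weighted mean of the six dimension ratings."""
    assert sum(weights.values()) == 15, "weights must come from 15 pairs"
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / 15

# One set of weights (from the single post-experiment pairwise comparison)
# is applied to the ratings of every individual run, as in this study.
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 3, "effort": 2, "frustration": 2}
run_ratings = {"mental": 70, "physical": 10, "temporal": 55,
               "performance": 40, "effort": 60, "frustration": 30}
print(round(tlx_score(run_ratings, weights), 1))  # → 54.3
```

Reusing one set of weights across runs trades some per-run fidelity for a much shorter session, which is defensible here because the task type never changed.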
Task Performance
Task performance was defined as the time required for participants to feel they had obtained an understanding of the traffic situation through the information provided by the CAGA system, that is, time-to-comprehension (TTC). Similar to other time-related performance measures, such as eye-tracking, reading speed, search time, and time to task completion, this variable was chosen as an indicator of how quickly humans are able to process information (Gawron, 2019). TTC was self-guided and consisted of the participant deciding that the traffic situation and the visualised solution was sufficiently understood. The time measurement started at the moment the traffic situation was displayed and ended upon a key press by the participant after which the screen was blanked. Time was measured in seconds with no time limit imposed. Still, the participants were urged to be as quick and accurate as possible.
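A minimal sketch of the TTC measurement logic (in the study itself, E-Prime recorded this timing; the function and callback names below are illustrative):

```python
import time

# Sketch of time-to-comprehension (TTC): the clock starts when the
# traffic situation is displayed and stops at the participant's key
# press signalling that the situation is sufficiently understood.
def measure_ttc(display_situation, wait_for_keypress) -> float:
    display_situation()                   # show the static radar image
    start = time.monotonic()
    wait_for_keypress()                   # self-guided; no time limit
    return time.monotonic() - start       # TTC in seconds
```

A monotonic clock is the appropriate choice here because wall-clock adjustments must not distort reaction-time style measurements.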
Ranking
After the experimental trials, one representative high complexity traffic situation from the experiment was shown but with different levels of transparency presented. Participants were asked to rank the four variants for each of the dimensions of transparency: observability and predictability (MITRE, 2018). Definitions for these dimensions were read verbatim to the participants and were available on paper, including an example of its application in the collision avoidance context. A think-aloud protocol was used to record the participant’s verbal reasoning of the ranking (Eccles & Arsal, 2017). The traffic situation with four levels of transparency that was used for the ranking is made available as Supplemental Material on the journal’s web site.
Results
Data Analysis and Statistics
In the experiment, two trials were performed (trial 1 and trial 2) that were identical in experimental setup and execution, but for which different traffic situations were used. The data from these trials were averaged, screened for missing values and outliers, and tested for normality. Due to technical issues with the experimental setup, recording of TTC was incomplete for the initial set of participants, leading to missing values for six participants. This issue was corrected, and no missing values occurred for the remaining participants. As a result, of the 272 measurements for TTC, 20 measurements (7%) were missing. Finally, three participants had outliers for the TTC variable and were removed from the final data analysis. An outlier was defined as a data point lying outside 1.5 times the inter-quartile range of that variable. Thus, the data of 25 participants were used in the analysis of this variable.
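The outlier rule used above can be sketched directly; the sample values are invented for illustration, and note that quartile estimates can differ slightly between methods (this sketch uses Python's exclusive method):

```python
import statistics

# A data point is an outlier if it lies more than 1.5 times the
# inter-quartile range (IQR) below the first or above the third quartile.
def iqr_outliers(values: list) -> list:
    q1, _, q3 = statistics.quantiles(values, n=4)  # exclusive method
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

# Illustrative TTC values in seconds; the extreme value is flagged.
ttc = [12.1, 14.3, 13.8, 15.0, 11.9, 13.2, 14.7, 55.0]
print(iqr_outliers(ttc))  # → [55.0]
```

In the study, participants with any flagged TTC data point were excluded from the analysis of that variable rather than having single points removed.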
The dependent variables were tested for normality using the Shapiro-Wilk test (Shapiro & Wilk, 1965). Significant deviations from normality were found for the SA scores. However, the number of observations per cell for these variables was sufficient and equal for each cell.
Overall Means, Standard Deviations, and Pearson Correlations Between the Dependent Variables.
Means and Standard Deviations for the Dependent Variables as a Function of Transparency Level only. Note That TTC is Measured in Seconds.
Means and Standard Deviations for the Dependent Variables as a Function of Complexity only. Note That TTC is Measured in Seconds.
Means and Standard Deviations for the Dependent Variables as a Function of Level of Transparency and Complexity. Note That TTC is Measured in Seconds.
Situation Awareness
A main effect for transparency was found for level 1 SA. Mean scores for level 1 SA as a function of transparency and complexity. Note the error bars represent the 95% confidence interval.
A main effect of transparency on level 2 SA was found. Mean scores for level 2 SA as a function of transparency and complexity. Note the error bars represent the 95% confidence interval.
A main effect of transparency on level 3 SA was found. Mean scores for level 3 SA as a function of transparency and complexity. Note the error bars represent the 95% confidence interval.
Mental Workload
No main effect of transparency on mental workload was found (see Table 6). However, the individual dimensions measured through the NASA-TLX were analysed and showed an effect on the ‘Performance’ sub-dimension. Mean scores for mental workload as a function of transparency and complexity. Note the error bars represent the 95% confidence interval.
Task Performance
A main effect for transparency was found for mean TTC. Mean scores for TTC as a function of transparency and complexity. Note the error bars represent the 95% confidence interval.
Preference
A main effect of transparency was found on the subjective ranking of the transparency levels, F(3, 31) = 616.64. The participants’ preferences for the transparency levels. Note that a lower score indicates a higher preference (most preferred = 1 and least preferred = 4).
Results Summary
To summarise, the results from the experiment showed that SA improved with transparency: level 1 SA was highest in the high transparency condition, level 2 SA was highest in the medium (A) transparency condition, and level 3 SA was highest in the high transparency condition. For all SA measurements, high complexity traffic situations resulted in reduced levels of SA. Moreover, no significant effect of transparency on mental workload was observed, although a significant effect of complexity was found, showing that higher traffic complexity resulted in higher perceived mental workload. Furthermore, TTC was highest for the medium (A) and high transparency levels. TTC was also highest for the high complexity traffic situations. Finally, the medium (A) and high transparency levels were rated as the most preferred by the participants.
Discussion
Summary of Predictions and Results Regarding the Effect of Transparency and Complexity on Situation Awareness, Mental Workload, and Task Performance.
Situation Awareness
For level 1 SA, the highest SAGAT scores were achieved with the highest level of transparency. In Endsley’s (1995) definition of SA, level 1 SA concerns the perception of elements in the environment and provides the foundation for the higher levels of SA. In this study, it was anticipated that when the system provided information regarding its perception of its environment, that is, ‘condition detection’ (see Table 4), this would support level 1 SA. At this level of transparency, the CAGA system depicts which targets it has detected in the short and long range, the type of conflict with all detected targets, uncertainties in the sensor data, and the status of its sensors (see Table 3). This study anticipated that level 1 SA would be best for the transparency levels in which the ‘condition detection’ information was presented, that is, the medium (B) and high conditions. However, the results indicate that the highest level 1 SA scores were achieved only in the high transparency condition and not in the medium (B) condition. Furthermore, no significant difference was found between the high and medium (A) transparency conditions in terms of level 1 SA, indicating similar SAGAT scores. This may indicate that the information depicted in the ‘condition analysis’ step (e.g. risk objects, intended trajectories, and priorities; absent in the medium (B) transparency condition yet present in the medium (A) condition) may have played a role in achieving improved level 1 SA. Possibly, the additional information regarding collision risk may have made the participants more observant of the ship’s surrounding traffic and thus better able to achieve level 1 SA.
For level 2 SA, the highest level of SA was achieved with the medium (A) level of transparency regardless of complexity level. Again, this is as hypothesized, as it is at this level that the system’s analysis is depicted on the HMI and made available to the supervisor, for example, risk objects, risk priorities, intended trajectories, conflict type, and safe speed parameters (see Table 3 and Table 4). However, an equally high level of level 2 SA was achieved in the medium (B) level of transparency. At the medium (B) level, the CAGA system depicts which objects were detected in the short and long range, target type, relative motion, and the status and uncertainties of sensor data, that is, no analytical information. Yet participants achieved level 2 SA scores equal to those in the medium (A) level, where the system’s analytical information was readily available. For example, at the medium (A) level of transparency, the system depicts which objects it sees as posing a collision danger by extrapolating the objects’ current vectors and highlighting the level of risk using specific symbology and colours. This way, participants could directly perceive the outcomes of the system’s risk analysis process and use this information to understand the system’s interpretation of the traffic situation. In addition to the medium (B) results, it is somewhat unexpected that the same level of SA was not achieved in the high transparency condition. As the high transparency level includes all information from the medium (A) transparency level, that is, also the analytical information (see Table 3 and Table 4), one could reasonably expect that participants would score equally well on level 2 SA in both the medium (A)- and high transparency conditions.
As this was not the case, one explanation may be that the additional information about the system’s detection and sensor information, as shown in the high transparency condition (see Figure 5), distracted the participants from establishing an understanding of the system’s analysis.
Finally, for level 3 SA, the highest level was achieved with the highest transparency level. No differences were observed between the low and medium (A) transparency levels. To support level 3 SA, the system provided the future state prediction of own ship and target objects. The future state of own ship, that is, its future track and speed (see Table 4), was depicted for each level of transparency. The future state of target ships was depicted for the medium (A)- and high transparency levels but not for the other levels. As such, it would follow that either all transparency levels scored equally on level 3 SA, or that the medium (A)- and high transparency levels scored equally. The fact that only the high transparency level resulted in the highest level 3 SA scores therefore makes this finding somewhat challenging to interpret. One explanation is that the high transparency level provided the complete picture of the system’s interpretation of the traffic situation: its decision and future actions, its analysis, and its object detections, including sensor states. Possibly, providing participants with a complete information overview allowed them to understand own ship’s future state more adequately, as they had a more comprehensive information basis to build on. In addition, based on the full picture, participants may have been better able to reason towards the correct answer when answering the SAGAT.
For traffic complexity, SAGAT scores were lower for the high complexity traffic situations, indicating it was more challenging to achieve a similar level of SA in the high complexity cases compared to the low complexity ones. This finding is consistent with earlier observations that an increased number of objects presented to a supervisor, including their interactions, increases the number of goals and decisions to be made, which, given the limitations of human information processing capabilities, affects how well SA can be achieved (Endsley, 1995). In terms of interactions between transparency and complexity, an effect was found for level 2 SA, pointing towards a positive contribution of the depiction of the system’s reasoning (e.g. risk objects, intended trajectories, and priorities, as present in the medium (A) transparency level) in high complexity cases.
Comparing our results to similar studies in which the relationship between transparency and SA was investigated, we find comparable results. For example, Roth et al. (2020) found improvements in SAGAT scores when participants were evaluating agent-generated proposals in an unmanned-manned helicopter teaming operation. In their study, level 3 SA improved most in the high transparency condition compared to the low condition. Chen et al. (2014b, 2015) found improvements in SA when participants were supervising unmanned aerial vehicles in a search operation, and Selkowitz et al. (2017) reported improved SAGAT scores for level 2 and 3 SA, but not for level 1, when participants monitored an autonomous robot. However, some studies failed to identify a relationship between transparency and SA for supervision (Skraaning & Jamieson, 2021; Experiment 3) and monitoring tasks (Pokam et al., 2019; Selkowitz, Lakhmani, Chen, & Boyce, 2015; Wright et al., 2020). Taken together, these studies point towards a neutral to positive relationship between transparency and SA, and the present study strengthens these findings.
Mental Workload
No effect of transparency on mental workload was found. For complexity, increased workload scores were found for all high complexity traffic situations, but there was no interaction effect with transparency.
Still, for one sub-dimension of the NASA-TLX scale, ‘Performance’, a significant relationship between transparency and mental workload was found. Here, participants rated their own performance on the experimental task as better for the medium (A) transparency level compared to the other transparency levels. In other words, as the experimental task was to understand the traffic situation and the system’s handling of it, participants felt they achieved this best in the medium (A) transparency condition. Possibly, participants felt they had sufficient information in the medium (A) condition and therefore felt able to meet the goals of the experiment.
When comparing these results to similar studies where participants were tasked with monitoring an autonomous agent only, limited effects of transparency on mental workload were also reported (e.g. Du et al., 2019; Selkowitz et al., 2015, 2017; Wright et al., 2020). A study by Panganiban et al. (2020) found a reduction in mental workload, as measured through the NASA-TLX, when an autonomous agent communicated its intentions to support the participant in its task execution. Conversely, a study by Selkowitz et al. (2017) reported an increase in eye-fixation duration, a measure of visual search and mental processing (Di Nocera, Camilli, & Terenzi, 2007; Harris, Glover, & Spady, 1986), when participants monitored an autonomous robot’s display for its actions.
In studies where participants took the role of supervisor of an autonomous agent, reductions in workload were mostly found (e.g. T. Chen et al., 2014b, 2015; Skraaning & Jamieson, 2021; Experiments 1 and 2), although an increase (Guznov et al., 2020) and no effect (Skraaning & Jamieson, 2021; Experiment 3) were also reported. Finally, in studies where participants were asked to respond to system-generated proposals, no effect on mental workload was reported (e.g. Bhaskara et al., 2021; Loft et al., 2023; Mercado et al., 2016; Roth et al., 2020; Stowers et al., 2020).
This may imply that the relationship between transparency and mental workload depends on the type of task and role given to the participant (van de Merwe et al., 2024a). In this experiment, participants did not interact with the autonomous CAGA system, as they were only asked to perceive and comprehend its information. Although several of the studies mentioned above found a relationship between transparency and mental workload, 17 out of 23 indicators reported in the study by van de Merwe et al. (2024a) did not. This experiment’s result does not change the overall conclusion that adding information that supports transparency to an HMI has a limited effect on mental workload.
Task Performance
The results indicate that participants took more time to build up a mental picture in the medium (A)- and high transparency conditions and less time in the low- and medium (B) transparency conditions. Participants consistently took more time to comprehend the traffic situation in the medium (A)- and high levels compared to the low transparency level. This was the case for both the low- and high complexity conditions, indicating an equal effect of traffic complexity regardless of transparency level. The results were inconsistent with the hypothesis that the cognitive processes associated with developing a mental picture of the traffic situation would be supported when much of the information needed was readily available on the HMI in the higher transparency cases. It was also hypothesized that this effect would be stronger for the high complexity condition than for the low complexity condition, but this was not the case.
Earlier studies have shown inconsistent effects for time-related performance measures associated with transparency. A recent study investigating the impact of transparency on decision risk in human-agent teams measured the time it took for participants to choose between two options suggested by a recommender system (Loft et al., 2023). No differences between the various levels of transparency and decision time were found, except for an interaction between decision time and decision risk, indicating that transparency alleviated the negative effect of increased risk on response time. A study performed by Skraaning and Jamieson (2021) found reduced response times to events in a nuclear control room simulation study. Here, control room operators were tasked with controlling a simulated nuclear power plant and handling small to large system upsets, including taking corrective action. A reduction in response time to system upsets was found in the transparency condition, indicating better task performance when information that supports transparency was integrated in the primary task HMI. Conversely, a study by Stowers et al. (2020) found an increase in response time with increased levels of transparency. In this study, participants were tasked with monitoring and controlling multiple unmanned vehicles and evaluating plans for these provided by an intelligent agent. Here, the addition of information that supports transparency, in the form of basic projection and uncertainty information, significantly increased response time, albeit with a small effect size. Finally, Wright et al. (2020) found no difference in the time participants took to identify and assess events when monitoring an autonomous robot.
In our study, response time was driven by the instruction for the participants to ‘continue to the next step when you feel you have built up a sufficient understanding of the traffic situation’, that is, the time needed for comprehension. In contrast with the aforementioned studies, in which participants were asked to evaluate plans, respond to events, or monitor autonomous agents, this study asked participants to build a mental representation of the traffic situation only. The fact that there were no significant differences in TTC between the medium (A)- and high transparency conditions, and that both showed significantly higher TTCs than the low- and medium (B) conditions, indicates that the analytical information contributed to the time participants needed to comprehend the traffic situations. Conversely, this also implies that the addition of the system’s detection information did not contribute to the participants’ TTC.
Considering Table 3 and Table 4, the information presented in the condition analysis step, represented in the medium (A)- and high transparency conditions, depicts elements primarily concerned with collision risk, for example, objects that pose a risk, risk object priority, conflict type, and their predicted course and speed. This information is essential in understanding the CAGA system’s risk determination and is the primary basis for interpreting the reasoning behind its avoidance actions. The information in the condition detection step, represented in the low- and medium (B) transparency conditions, primarily consists of elements depicting what the ship has detected, for example, objects in the short and long range, object type and size, and basic classification of relative motion. That is, whereas the analytical information is specific to objects posing a risk, the detection information covers all objects irrespective of risk.
In this experiment, the participants, all experienced navigators, took the role of a supervisor of a ship equipped with a CAGA system, with the task to observe and understand the system’s depicted solutions to traffic conflict situations. Since the system’s analysis and avoidance actions are the most safety critical information to understand, participants may have taken additional time to evaluate the analytical information provided by the CAGA system, as presented in the medium (A)- and high transparency conditions, because they wanted to understand the situation as accurately as possible. The correlational results between TTC and SA support this assumption, as participants with higher TTC values also had higher level 1 SA and level 3 SA scores. In other words, those who spent more time observing, interpreting, and understanding the traffic situations also scored better on the SAGAT. Similar results have been reported in eye-tracking studies where increased focus on critical information elements was correlated with improved SA (van de Merwe et al., 2012). Alternatively, participants in the medium (A)- and high transparency conditions may have taken more time to analyse the traffic situations because they were comparing CAGA’s analysis with their own. That is, rather than taking the system’s interpretation of the traffic situation at face value, the participants may have performed their own analysis first to ensure they were equipped with sufficient knowledge to scrutinise the system’s. Also, given that the CAGA system’s analytical information was not depicted in the low- and medium (B) transparency conditions, the TTC was lower than in the medium (A)- and high transparency conditions because there was less critical information to evaluate and compare. Similar observations have been reported when operators are required to evaluate recommendations and need to compare these to system information and other information sources (Endsley, 2017).
As such, considering the potential role of humans in the ship autonomy context, where a thorough understanding of the CAGA system’s performance is essential for supervisory performance (van de Merwe et al., 2024b), this finding demonstrates the importance of addressing not only the amount of information when developing transparent agents, but also its type.
Practical Considerations
The results of this study imply that transparency has value as a design principle for CAGA systems, given the positive results for SA. In addition, the qualitative feedback from the navigators about which levels of transparency they preferred clearly indicates a positive attitude towards HMIs depicting, at minimum, the system’s analytical information. Conversely, these results also indicate which of the transparency levels were not preferred. For example, the low transparency level, that is, where the system only showed its decisions and planned actions, was the least preferred. In addition, the medium (B) transparency level, that is, where the system’s analytical information was not depicted, ranked only slightly better than the low level. Clearly, our participants preferred to have the system’s analytical information in addition to its decisions and planned actions, as indicated by the shared highest ranking of the medium (A) and high transparency levels. Nevertheless, there is no clear result pointing towards the optimal level of transparency across our dependent variables. This means that, when designing for transparency, it may be challenging to decide on which level to implement. Possibly, a more demand-driven transparency, that is, where users adjust the level of transparency depending on the task and context, can be used to provide the supervisor with control over the amount of system information presented. A study by Vered et al. (2020) demonstrated that such an approach could avoid the downsides of presenting transparency information whilst maintaining its benefits. For example, when applied to autonomous shipping, supervisors may select a low level of transparency in situations with little to no traffic whilst ‘dialling up’ the level of transparency for situations that require closer supervision.
In this way, such an approach may improve comprehension times compared to the sequential transparency approach used in our study. However, a risk associated with this approach is the potential for choosing an inappropriate transparency level and thereby overlooking important information. Furthermore, this approach allows for potentially large variation in how information is presented on the HMI and the possibility of confusion regarding which level is active. Although an iterative and human-centred design process should address these concerns when developing HMIs, future studies should investigate these risks further.
Limitations and Future Work
This experiment adjusted the transparency of a CAGA system whose information was overlaid onto static radar images. Our approach assumed that future operators of autonomous ships may need to divide their attention between multiple ships and/or tasks and may not continuously monitor a single ship. Therefore, when a ship requires attention, the supervisor may be ‘dropped into’ the specifics of the operational traffic situation. Our study hypothesized that transparency facilitates the sense-making process needed to quickly build SA. However, despite the significant effort put into making the traffic situations as realistic as possible, real-world situations are, of course, dynamic, and in such situations supervisors would be able to build a mental representation of the developing traffic situation over time. Although this study provided insights into the effects of transparency on human performance variables in a maritime collision avoidance setting, future research should focus on the implementation of transparency in dynamic settings, for example, by using real-time simulation facilities.
In this experiment, the CAGA system provided information about its perceptions, analysis, and future intentions regarding a traffic situation to the participants. Participants were only required to answer SA queries about the traffic situation and the system’s proposed handling of it. Through the development of the traffic situations and the transparency levels, significant effort was put into ensuring that the system provided sound conflict resolutions, such that disagreements between the participants’ solution to a situation and the system’s solution were kept at a minimum and would not confound the results (van de Merwe et al., 2023a). As such, this experiment did not study the effects of incorrect resolutions or of solutions with which the supervisor disagreed. However, given the body of knowledge available about the potential pitfalls for humans in supervising automation (Endsley, 2017; Onnasch et al., 2014; Strauch, 2018), future work should elaborate on the effect of transparency on the supervisor’s ability to detect and resolve performance deviations, especially when performing under concurrent task demands, such as supervising multiple autonomous ships (Burmeister et al., 2014; Gegoff et al., 2023; Porathe, 2014; Tatasciore et al., 2023).
Conclusions
This study highlighted the relationships between agent transparency and the human performance variables SA, mental workload, and task performance. Our overall findings point towards improvements in all levels of SA as a consequence of transparency, albeit with different levels of transparency affecting different levels of SA. In addition, this study found that more time was needed to create a mental representation of the situation when the system’s reasoning was depicted. Interestingly, no significant correlations were found between mental workload and SA, or between mental workload and TTC. Given the relationship between task performance, SA, and mental workload (Wickens, Hollands, Banbury, & Parasuraman, 2013), these findings indicate an effort-performance trade-off where participants with increased SA scores also used more time to comprehend the traffic situations, albeit without increased mental workload ratings. Moreover, this study showed clear and consistent effects of complexity on SA scores, workload ratings, and TTC, consistent with predictions from earlier models (e.g. Endsley, 1995, 2017). No interaction effects between transparency and complexity were found, except for level 2 SA, where transparency negated the effect of traffic complexity. Finally, the medium (A)- and high transparency levels were also the most preferred by the participants.
To summarise, as agent transparency is frequently operationalised through an HMI, our results imply that agent transparency has merit as a design philosophy when developing highly automated systems that require human supervision (e.g. see MITRE, 2018 for guidance). However, implementing transparency ‘is as much an art as it is a science’ given the risk of visual clutter and potential distraction caused by additional information (Wickens, 2018, p. 39). Also, the exact operationalisation of transparency depends on the domain it is applied to and the function allocation between humans and systems (Holder, Huang, Chiou, Jeon, & Lyons, 2021). Although there is limited evidence-based guidance available for designers developing transparent agents (Jamieson et al., 2022), this study demonstrated that, by basing the transparency design on a structured human-centred design approach, the purported effects of clutter and information overload were kept to a minimum whilst achieving improvements in SA. Hence, given that supervisors have sufficient time available to process the additional transparency information, improved levels of SA may be achieved without burdening supervisors with additional mental workload. As such, if effort is made to integrate information supporting transparency into the primary task interface, human performance benefits can be expected.
Supplemental Material
Supplemental Material - The Influence of Agent Transparency and Complexity on Situation Awareness, Mental Workload, and Task Performance
Supplemental Material for The Influence of Agent Transparency and Complexity on Situation Awareness, Mental Workload, and Task Performance by Koen van de Merwe, Steven Mallam, Salman Nazir, and Øystein Engelhardtsen in Journal of Cognitive Engineering and Decision Making
