Abstract
Keywords
INTRODUCTION
The adoption of algorithmic and Artificial Intelligence (AI) decision support systems has been accelerating, particularly when their predictions and forecasts yield superior accuracy compared to human-derived estimates. This is notably evident in tasks possessing well-defined objectives, such as stock market optimization (Zuckerman, 2019) and chess (Silver et al., 2017), where a substantial body of evidence supports the enhanced performance of intelligent systems. As a result, these systems are taking over various domains, including flight management systems (Parasuraman et al., 2000), GPS route planners (Hoff & Bashir, 2015), forecasts of employee performance (Highhouse, 2008), financial advice (Javelosa, 2017), sales processes (Davenport et al., 2020), and many more. These systems can be used to acquire and analyze information, make decisions, carry out actions, or even monitor other systems (Parasuraman et al., 2000), and reduce errors (Aron et al., 2011).
However, systems inevitably fail on occasion. Although there is substantial evidence that intelligent systems outperform human judgment, users are often reluctant to depend on algorithmic solutions (Fildes & Goodwin, 2007) and hesitate to restore trust in such systems after experiencing a service failure (Dietvorst et al., 2015). That is, seeing an algorithmic system err makes people less confident in it and less likely to choose it again than to rely on a human, even though the human’s performance is likely to be inferior.
While users’ algorithm aversion after a system error is well documented, we identify several gaps in the existing literature. First, trust dynamics in recurring interactions have been largely overlooked, leaving open questions regarding the evolution of trust and users' behavioral responses to system errors. Second, the impact of system terminology on consumer responses to system failures remains underexplored. Third, the literature has not thoroughly investigated the underlying mechanism that drives trust reformation and satisfaction. Lastly, there is limited research on mitigating negative responses towards non-AI systems without resorting to “AI washing” (Bini, 2018; Moore, 2017). Although communicators often use terms such as algorithm or AI to convey similar concepts, the terminology chosen influences how people perceive and assess these systems. For instance, companies identified as operating in the “AI” domain secure 15%–50% more funding in their rounds compared to other technology startups (Vincent, 2019).
The present research aims to address these gaps. In particular, we examine the underlying factors that influence users' responses to system failures, trust reformation, satisfaction, and reactions towards non-AI systems. We focus on the role of system terminology in trust reformation at varying levels of task difficulty and examine users' causal attributions of system failures as an underlying mechanism. By examining these factors, we aim to contribute to a deeper understanding of user responses to system failures and to inform practical applications for mitigating negative reactions.
Drawing from attribution theory (Weiner, 1985), we show how the perceived stability and complexity of a system affect trust reformation after a system failure. We show that the use of varying terminology and descriptions for algorithmic versus AI systems creates different expectations of system stability, which in turn affects the rebuilding of trust in a system after a failure. We argue that the effect of system type on trust is mediated by stability attributions, whereby AI systems are ascribed lower stability and rigidness than algorithmic systems, ascriptions which increase trust in the AI system. Additionally, we propose that this mediation is further influenced by the specific task environment, particularly task difficulty.
In a series of studies, we show that when seeing an algorithmic system err, users tend to ascribe higher stability to the system’s error, leading them to be more reluctant to trust the system again and resulting in an overall lower satisfaction with the system’s performance. This research thus makes an important contribution to the current understanding of algorithm aversion and its underlying mechanisms, ultimately contributing to the development of effective strategies for promoting user acceptance and trust in such systems.
THEORETICAL DEVELOPMENT
System Types
In this paper, we use the term “system” to denote any computational method that produces output for the user without human intervention. Systems can vary in complexity depending on the tasks they are designed to perform. The concept of “artificial intelligence” (AI) is prevalent in this field, but its definition remains contested. Prior research has identified definitions of AI that vary in inclusiveness (Bini, 2018; De Bruyn et al., 2020).
The broadest definition of AI is the intelligence exhibited by machines (Shieber, 2004). Brooks (1991) stated that AI aims to enable computers to perform tasks that, when executed by humans, are deemed intelligent. However, this broad definition can encompass anything from simple regression analysis to more complex AI applications. This ambiguity has led to companies claiming to offer AI-based products and services, a strategy known as “AI washing” (Moore, 2017), which is often met with skepticism by AI researchers. For instance, some elements of Salesforce’s “Einstein AI” rely on basic regression analyses.
The narrowest definition of AI proposes using the term “AI” exclusively for “artificial general intelligence” (AGI), defined as a machine’s ability to understand or learn any intellectual task a human can perform (Goertzel, 2014; Thórisson et al., 2015). Current AI applications focus on specific, limited tasks, such as chess, facial recognition, or ad click prediction. These applications, referred to as “weak AI” or “narrow AI,” do not learn beyond their programmed domains. In contrast, “strong AI” or “artificial general intelligence” involves machines that can “learn to learn” (Kurzweil, 2014).
De Bruyn et al. (2020) propose a balanced definition of AI as “machines that emulate human intelligence in tasks such as learning, planning, and problem-solving through higher-level, autonomous knowledge creation.” This definition offers several advantages, including the restriction of AI to algorithms that autonomously generate new constructs and knowledge structures. Anderson and Krathwohl’s (2021) revision of Bloom’s taxonomy of educational learning objectives places creation at the highest level of learning, which is key to distinguishing AI from traditional statistical techniques.
In this research, we adopt this balanced definition of AI and differentiate between
Trust in Systems
Research on trust in systems has identified three key aspects impacting users’ trust: system characteristics, user characteristics, and the external environment (Hancock et al., 2011). Focusing on system characteristics, trust can be influenced by
The capabilities of a system that most strongly affect trust are the system’s consistency and reliability over time (McLeod et al., 2005). A system’s predictability and dependability engender trust and enable users to continue trusting the system over time (Cahour & Forzy, 2009; Muir, 1987). Users expect near-perfect performance from automation (Dzindolet et al., 2001), with system errors causing a rapid decline in trust (Dzindolet et al., 2001; Lee & Moray, 1994; Wiegmann et al., 2001). This decline in response to an error is stronger for algorithms than humans and is referred to as
Second, user characteristics, such as self-confidence (Lee & See, 2004) and expertise (Fan et al., 2008), have been shown to significantly affect users’ level of trust in automation. Nonetheless, these enduring traits are not the primary focus of this study.
Third, external environmental factors impact users’ trust in automation. For example, task difficulty plays a role, with trust diminishing more when a system fails at simple tasks rather than complex ones, despite overall system performance surpassing human capabilities (Dzindolet et al., 2003; Madhavan et al., 2006). Additionally, task-related risk affects users’ trust in the system, although results have been ambiguous. While some researchers have found that users tend to reduce their reliance on automation as compared to humans when greater risk is involved (Ezer et al., 2008; Rajaonah et al., 2008), others have found the opposite effect (Lyons & Stokes, 2012).
Common to most of the above research on the influence of human, system, and environmental factors on users’ trust in systems is a view of trust as a steady-state variable rather than a dynamic attitude that evolves along with the interaction (Dzindolet et al., 2003). Indeed, only a few researchers have considered the dynamic nature of trust (Desai et al., 2013; J. D. Lee & Moray, 1994; J. Lee & Moray, 1992; Manzey et al., 2012; J. X. Yang et al., 2021; X. J. Yang et al., 2016, 2017). One of our objectives in the present study is to examine how trust evolves as a user undergoes repeated interactions with a system after the system has erred. In this way, we seek to gain a better understanding of how trust in systems may be rebuilt over long-term interactions.
Another consideration largely missing from prior research on trust in systems is what effect may arise from the choice of terminology used to describe the system. Studies in related fields have demonstrated that the choice of terminology can influence user behavior (Abraham et al., 2017). In the context of this study, terms like “smart systems,” “algorithms,” “artificial intelligence,” and “automated systems” are often used interchangeably to describe a computational formula that interacts autonomously with humans by processing inputs and generating outputs based on statistical models or decision rules. Most commonly, researchers have employed the terminology of traditional rule-based systems (e.g., “algorithms,” “automated systems,” and “decision support systems”). These researchers have found that users perceive problems arising from such systems to be more permanent than problems due to human error, as the former type of problem is perceived to result from standardization (Diaz & Ruiz, 2002). That is, traditional automated agents are thought to be invariant, whereas human agents are viewed as adaptable (Dijkstra et al., 1998; Höddinghaus et al., 2021). Automated systems are thus perceived to be incapable of learning (Dawes, 1979), incapable of considering individual targets (Grove & Meehl, 1996; Hamilton et al., 2021), and incapable of incorporating qualitative data (Grove & Meehl, 1996), but highly efficient (Galiere, 2020).
However, Langer et al. (2021) have shown that terminological differences used to refer to systems may directly affect users’ expectations, evaluations, and perceptions of a system. Specifically, referring to a system as “AI” seems to lead users to perceive the system as more complex in comparison to “decision support systems,” “automated systems,” “algorithms,” “computers,” or “technical systems” (Langer et al., 2021). A comparison of two studies illustrates this point: Lee (2018) and Marcinkowski et al. (2020) both had the same research objective, namely, to analyze the use of a system that automatically evaluates applicant information and recommends rejection or approval of applicants. Lee (2018) referred to the system as an “algorithm” and found that people preferred a human manager over the algorithm. Marcinkowski et al. (2020), on the other hand, framed the system as “AI” and found that users prefer the AI system over the human in hiring. Given Langer et al.’s findings, it is quite possible that the differing terminology played a substantial role in determining the disparate outcomes of these two studies.
In the present study, building on the research of Langer et al. (2021), we aim to offer additional empirical support for the hypothesis that what a system is called (i.e., the terminology used to refer to it) directly affects users’ level of trust in that system. We will also suggest ways in which previously established findings may need to be reevaluated considering the increasing popularity of terminology such as “artificial intelligence” (or “AI”).
Attribution Theory and System Stability
To understand the dynamic nature of trust formation following a system failure, we draw on Weiner’s (1985) well-established causal attribution theory, which is concerned with the cognitive processes by which people draw conclusions (“attributions”) as to the cause of a behavior (“causal attributions”). The theory has provided a useful framework for examining consumers’ behavioral responses to service failures, including the impact of a failure on their level of trust and the rebuilding of that trust (Tomlinson & Mayer, 2016) as well as the impact on customer satisfaction (Folkes et al., 1987; Hess et al., 2003).
In Weiner’s (1985) model, following a negative outcome, a trustor determines the outcome’s cause, evaluating that cause along the two attributional dimensions
Weiner (2000) concluded that attributions along the two dimensions of controllability and stability drive consumer reactions towards service errors. Although Weiner’s model was designed to study human–human interactions, the model puts little constraint on who exactly is observed, and the theory is applicable in describing outcomes of any type of agent (van der Woerdt & Haselager, 2019). Thus, researchers have used Weiner’s attribution model to explain human-technology interactions. Findings show that users make stronger attributions of stability to technology-related causes than to human-related causes, resulting in stronger trust reduction in the former case (Belanche et al., 2020; Dijkstra et al., 1998). Hence, following a poor outcome produced by technology, users expect the system to be more likely to fail again, as the system cannot learn from its mistakes (Dawes, 1979), resulting in a negative effect on customers’ reactions. When the failure is due to a human, users expect the performance to be less stable, minimizing the negative effect on trust (Belanche et al., 2020).
Other scholars have argued along similar lines that users are reluctant to rely on systems because they believe that such systems will not fully consider the users’ unique individual circumstances (Longoni et al., 2019) and are unable to incorporate qualitative data (Grove & Meehl, 1996). Trust in systems is also lower because responsibility and blame cannot be shifted to a system, and because users implicitly assume that a system’s predictions should be perfect since the future is highly predictable, which is often not the case (Dawes, 1979). Further, users perceive systems to suffer from reductionism; that is, users think that certain qualitative information or context is not taken into account by the system, which results in less accurate decisions (Newman et al., 2020). In contrast, humans are believed to be capable of improving through experience (Highhouse, 2008), detecting exceptions to the rule (Dietvorst et al., 2015), and providing explanations for their decisions (Armstrong, 1980).
Besides influencing trust, differences in controllability and stability attributions also affect another user response, namely, satisfaction. Satisfaction following product failure is largely influenced by attributions of stability (Folkes, 1984). If a failure is attributed to stable factors, it leads to negative future expectations and lower satisfaction. However, if the failure is seen as having arisen from unstable causes, such as an isolated incident, it leads to positive expectations and higher satisfaction.
Modern systems are evolving beyond rigid rule-based structures, increasingly embracing self-learning capabilities. Such systems are predominantly labeled as Artificial Intelligence, or AI. Among ten terms most commonly used to refer to a system (e.g., “algorithms” and “decision support system”), “artificial intelligence” was found to be associated with the highest level of machine complexity (Langer et al., 2021), where the complexity of a system refers to the length of a concise description of a set of the entity’s regularities. Low complexity is ascribed to systems that are completely regular, whereas more complex systems are adaptive and are built to evolve strategies (Adami, 2002).
Drawing from attribution theory, then, the perceived high complexity of AI systems as compared to traditional rule-based algorithmic systems could reduce stability ascriptions, which in turn could weaken the decline in trust after a failure relative to algorithmic systems. Here, a failure is defined as a system’s performance falling below the user’s expectations (Hess et al., 2003). The outcome variables of the present study are trust in the system, trust rebuilding, and overall satisfaction following the series of interactions, all of which have previously been used as measures of behavioral response (Hess et al., 2003; Tomlinson & Mayer, 2016). Hence, we suggest:
Users’ behavioral response after system failure will be more negative for systems described as “algorithmic systems” than for those described as “AI systems”: (a) trust will be lower, (b) trust will be rebuilt more slowly, and (c) satisfaction will be lower.
Attribution of stability mediates the effect of system type on behavioral response: stability attributions will be lower for systems described as “AI systems” than for “algorithmic systems.”
Task Difficulty
Another relevant factor affecting trust after a system failure is the difficulty of the task in regard to which the failure occurred. Madhavan et al. (2006) found that automation failures on tasks that are perceived as easy have a greater negative impact on trust than failures on tasks that are perceived as difficult. This finding can be explained by attribution theory as follows. Weiner (1985) and Kelley (1973) suggest that consumers make controllability attributions either to internal or external causes. Effort and ability are seen as internal causes, whereas luck and task difficulty are perceived as external causes (Weiner, 1985). The importance of the internal properties of an agent might be discounted if there is high external justification for the task failure. That is, the presence of an external factor that impedes success (e.g., a difficult task) means that failure can be attributed, at least in part, to the task or environment rather than the actor, providing a basis for weaker attributions towards the actor in case of failure. Thus, failure at a more difficult task leads to a heightened external attribution (lower controllability) and suppresses internal attributions. If the failure is judged to be caused by the environment in this way and not by the properties of the actor, the cause is perceived to be less stable (Anderson, 1983; Kelley, 1973). Building on this, one way to increase trust in algorithmic systems and reduce the difference between trust in AI systems and algorithmic systems is to increase task difficulty. Hence, we suggest:
Task difficulty mitigates the effect of system type on behavioral response.
When task difficulty is high, controllability attributions will decrease, which subsequently leads to reduced stability attribution compared to scenarios with lower task difficulty.
Task difficulty acts as a mitigating factor, causing the difference in behavioral responses between algorithmic and AI systems to be less pronounced in more difficult tasks. This is attributed to the dampening effect of task difficulty on the influence of system type on behavioral response.
OVERVIEW OF STUDIES
Across three studies, we examine the effect of different system types on consumer behavior following a system failure. In our studies, we use interactions prior to the failure to form user expectations. Study 1 shows how the two key dimensions of attribution theory, namely, controllability and stability, can explain the differing reactions users have towards what they perceive to be AI versus algorithmic systems. We show that system type affects consumers’ reactions solely through the stability attribution. Stability attributions are further significantly predicted by controllability attributions, which are lower for more difficult tasks. Study 2 tests the theorized model in a real-world setting and shows that the difference can be mitigated by changing causal attributions on the controllability dimension. In the domain of stocks, behavioral responses differ between system types; in the more difficult domain of cryptocurrencies, the difference is eliminated. Hence, we show that errors at more difficult tasks (less controllability attribution) mitigate the difference in behavioral response between the system types. Study 3 demonstrates how changing causal attributions on the stability dimension by providing a social account also eliminates the difference in user behavior. This research complied with the American Psychological Association Code of Ethics. Informed consent was obtained from each participant.
Study 1
Overview
The aim of Study 1 was to analyze the effect of system type on the dynamic nature of trust, trust restoration, and satisfaction. Further, we measured users’ perceptions of the key dimensions of attribution theory, namely, stability and controllability, to provide an account of our findings. Prior research has shown that trust is lost faster and more persistently (Dietvorst et al., 2015) with technological agents as compared to human agents. However, the systems in focus in most of those studies were rule-based systems. Drawing from attribution theory, these systems are ascribed high stability, leading to a strong and persistent decline in trust after a failure. The increasing predominance of self-learning AI systems in place of rule-based algorithmic systems may be expected to lead to consumer reactions of trust more similar to those observed after the failure of human agents.
Study 1 was an online study in which participants were asked to use real data to forecast students’ GPA scores. In every round, participants received information on a student and the forecast of an expert system on the respective student’s GPA score (Online Appendix A). Half of the participants were told that they received the forecast of an algorithmic system, the other half that they received the forecast of an AI system. In point of fact, the forecasts received by the two groups were identical (see below for details of the participant selection process and overall methodology). In consideration of the dynamic nature of trust establishment, loss, and rebuilding, we measured three aspects of participants’ behavioral response: trust in the system measured in every round (both before and after the error occurred), the pace at which trust was subsequently regained, and the level of overall satisfaction with the system.
In analyzing the results, we first conducted an independent-samples t-test with trust after the error as the dependent variable and system type as the independent variable. Then, we examined trust rebuilding by conducting a mixed-model analysis with trust measured after every round as the dependent variable. To explain our findings using attribution theory, we estimated a mediation model with satisfaction as the outcome variable, system type as the predictor (AI vs. algorithm), and the two attributional dimensions (stability and controllability) as mediators. We included perceived task difficulty as a second predictor.
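The first analysis step can be illustrated with a minimal sketch that computes an independent-samples t statistic for post-error trust in the two conditions. The sketch uses Welch's variant, which does not assume equal variances; the text does not state which variant was used, so this is an assumption. The function name `welch_t` and the ratings below are invented for illustration and are not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's independent-samples t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group (sample variance / n).
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical 7-point trust ratings after the system error (not real data).
trust_ai = [5, 4, 6, 5, 5, 4, 6, 5]
trust_algorithm = [4, 3, 4, 5, 3, 4, 4, 3]

t_stat, dof = welch_t(trust_ai, trust_algorithm)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A p-value would additionally require the t distribution, which the Python standard library does not provide; a statistics package would normally supply it.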
Participants
We recruited 300 participants through Amazon’s Mechanical Turk (hereafter “MTurk”) and randomly assigned them to one of the two system type conditions. Participants received $1.50 for their participation, and the four participants with the best overall forecasting accuracy additionally each received a $5 bonus payment. For all studies, participants were excluded as outliers if (1) they failed to pass a simple attention check at the beginning of the study, (2) they were identified as speeders using Mahalanobis distance (with the threshold at the .999 quantile of the χ² distribution with p degrees of freedom) (Hair, 2009), or (3) their response quality was identified as degraded using intraindividual response variability (IRV). Using IRV, we calculated the “standard deviation of responses across a set of consecutive item responses for an individual” across all 12 trials (Dunn et al., 2018, p. 108). The IRV can detect degraded response quality, and research suggests removing respondents with unreasonably low IRV scores, as these may reflect straightlining responses (Dunn et al., 2018). Respondents with unreasonably high IRV scores were also removed, as these may reflect highly random responses (Marjanovic et al., 2015). On these grounds, 14 participants were excluded from the data analyses in Study 1, yielding a final sample size of 286 participants (mean age = 38 years, 60% male).
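To make the IRV criterion concrete, the following sketch computes the standard deviation of each participant's 12 consecutive ratings and flags unusually low or high values. The cutoffs are placeholders chosen for this example only, since the text does not report the thresholds that were applied; the participant data and the function names `irv` and `flag_by_irv` are likewise hypothetical.

```python
from statistics import stdev


def irv(responses):
    """Intraindividual response variability: SD of one person's consecutive responses."""
    return stdev(responses)


def flag_by_irv(all_responses, low_cutoff, high_cutoff):
    """Map participant id -> 'low', 'high', or 'ok' based on the IRV score."""
    flags = {}
    for pid, responses in all_responses.items():
        score = irv(responses)
        if score < low_cutoff:
            flags[pid] = "low"    # possible straightlining
        elif score > high_cutoff:
            flags[pid] = "high"   # possibly random responding
        else:
            flags[pid] = "ok"
    return flags


# Hypothetical trust ratings (1-7) over 12 trials; not the study's data.
ratings = {
    "p01": [5, 5, 6, 6, 6, 3, 4, 4, 5, 5, 6, 6],
    "p02": [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
    "p03": [1, 7, 1, 7, 2, 7, 1, 6, 7, 1, 7, 1],
}

# Cutoffs are assumptions for illustration, not values from the paper.
flags = flag_by_irv(ratings, low_cutoff=0.25, high_cutoff=2.5)
print(flags)
```

In practice, cutoffs are often derived from the sample distribution of IRV scores rather than fixed in advance.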
Procedure
Manipulation of the System Description Used in Study 1
To ensure that our manipulation was successful, we measured the participants’ understanding of the mechanical or analytical nature of the system that had generated their prediction (Belanche et al., 2020). Participants were asked to rate each system’s behavioral base on a 7-point semantic differential scale consisting of “low learning skills/high learning skills” and “based on repetition/based on analyzing and adapting” (
Then, the actual forecasting task began. In order to increase engagement in the task, participants were told that the four participants with the highest forecasting accuracy would each receive a $5 bonus. After each round of forecasting, participants received feedback consisting of the respective student’s actual GPA, the participant’s prediction error, and the average prediction error across all previous predictions. To establish a baseline and familiarize each participant with the average performance of the respective system from which they received forecasts, the system’s prediction over the first five rounds (prior to the system’s failure) was kept close to perfect. Then in round six the system’s forecast provided to each participant was off by over 20%. After this error, from round seven onward, the system’s performance was again kept close to perfect. Participants’ trust in the system in every round was measured using a scale for trust in human-machine systems adapted from and validated by Jian et al. (2000). That is, after each round participants were asked
Next, participants were asked questions about attributional dimensions, satisfaction, and demographics. The items used to measure attributions were adapted from previous studies (Higgins & LaPointe, 2012; Peterson et al., 1982). Controllability was measured by asking respondents to evaluate the system’s controllability over the major cause that led to the error on a scale from 1 (= a cause over which the system has no control) to 7 (= a cause over which the system has complete control). Stability was measured with two items involving the likelihood of a similar error occurring again on a scale from 1 (= not at all likely) to 7 (= very likely),
Results and Discussion
First, we confirmed that our manipulation of system type was successful, and that the algorithmic system was perceived to be more rule-based and mechanical (
Next, we analyzed the effect of system type on trust rebuilding. To analyze how trust is rebuilt after seeing the system err, we ran a mixed-effects model of the six rounds that followed the mistake. Our outcome variable was trust in the system’s forecast as measured after each round. We added random intercepts for participants. The results reveal a nonsignificant main effect of type of system on trust (
Figure caption: Trust regaining process after the occurrence of a system failure.
Finally, in order to explain the underlying process we conducted a mediation analysis, which included as mediators (Hayes, 2017) the causal attribution dimensions of Weiner’s attribution theory responsible for explaining postconsumption reaction following a service failure (Weiner, 2000). Perceived task difficulty was also included as a predictor. The effect of system type on satisfaction with the system was tested through the two attributional dimensions controllability and stability as mediators. As Preacher et al. recommend (Preacher et al., 2007), we used bootstrapping to test multiple mediator models, and we report bootstrap estimates derived from 10,000 bootstrap samples along with bias-corrected 95% confidence intervals.
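The logic of a bootstrapped indirect effect can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis reported here: it uses a single mediator instead of two parallel mediators, a percentile interval instead of a bias-corrected one, 1,000 resamples instead of 10,000, and simulated data in which an "algorithm" label (coded 1) raises stability attributions, which in turn lower satisfaction. All function names and data values are hypothetical.

```python
import random
from statistics import mean


def slope(x, y):
    """OLS slope of y on x (single predictor with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def partial_slope(m, x, y):
    """OLS coefficient of m when y is regressed on m and x (with intercept)."""
    mm, mx, my = mean(m), mean(x), mean(y)
    smm = sum((a - mm) ** 2 for a in m)
    sxx = sum((b - mx) ** 2 for b in x)
    smx = sum((a - mm) * (b - mx) for a, b in zip(m, x))
    smy = sum((a - mm) * (c - my) for a, c in zip(m, y))
    sxy = sum((b - mx) * (c - my) for b, c in zip(x, y))
    return (smy * sxx - smx * sxy) / (smm * sxx - smx ** 2)


def indirect_effect(x, m, y):
    """a * b: path x -> m times path m -> y (controlling for x)."""
    return slope(x, m) * partial_slope(m, x, y)


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample participants (rows) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect(
            [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Simulated data (assumption): 0 = AI, 1 = algorithm.
rng = random.Random(42)
x = [i % 2 for i in range(200)]
m = [3.0 + 1.0 * xi + rng.gauss(0, 1) for xi in x]   # stability attribution
y = [6.0 - 0.8 * mi + rng.gauss(0, 1) for mi in m]   # satisfaction

point = indirect_effect(x, m, y)
ci_low, ci_high = bootstrap_ci(x, m, y)
print(f"indirect effect = {point:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

An interval that excludes zero is the usual criterion for concluding that the indirect effect is present.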
Figure 2 shows the path diagram of the parallel mediation analysis. System type was coded as 0 = AI and 1 = algorithm. Task difficulty predicts controllability attributions (
Figure caption: Attribution of stability mediates the effect of system type on satisfaction (*
Thus, as expected, there is a significant indirect effect of system type on satisfaction rating, an effect mediated by stability attribution. Purported algorithmic systems are attributed higher stability, which in turn decreases satisfaction ratings. In addition, tasks that are perceived to be more difficult result in lower attribution of control, which leads to lower stability attributions, and in turn to higher satisfaction after seeing a system err.
Taking the results together, we demonstrate that an error by a system that users believe to be algorithmic has a more negative impact on users’ attitudes and behavioral intentions towards the system than the same error by a system that users believe to be AI. We show that this phenomenon can be explained through attribution theory, whereby reduced stability attribution towards novel, self-learning systems mitigates the negative impact of the system’s error on behavioral intentions towards the systems. Our results show that system type does not affect controllability attributions; rather, its effect is mediated by stability attributions. Based on these results, we suggest that more difficult tasks will lead to decreased stability attributions, which mitigate the negative effect of a system’s error on behavioral intentions. These findings are consistent with prior findings on behavioral reactions towards human agents versus algorithms (Dietvorst et al., 2015) and suggest the same underlying mechanism: Decreased stability ascriptions towards humans as compared to algorithms have the same effect as decreased stability ascriptions towards AI systems.
Study 2
Overview
Study 2 tests our proposed framework in another domain (finance). With this study, we wanted to test whether increasing task difficulty changes the causal attribution on the first attributional dimension, controllability, and mitigates the differing effect of system type on behavioral responses caused by the second attributional dimension, stability. We used a 2 × 2 between-subjects design, which included the same manipulation of system type (algorithmic system vs. AI system) as Study 1. In addition, task difficulty was manipulated by using price forecasts in the domain of stocks versus the domain of cryptocurrencies (the latter forecasts being more difficult). Again, as dependent variables we included trust in the system (measured in every round before and after the error occurred), the pace at which trust is subsequently regained, and overall satisfaction with the system as central measures of behavioral response. When analyzing the results, we first conducted two independent-samples t-tests, one for each domain, with trust after the error as the dependent variable and system type as the independent variable. Then, we analyzed trust rebuilding by conducting two mixed-model analyses, again one for each domain, with trust measured after every round as the dependent variable. Finally, we conducted two independent-samples t-tests, one for each domain, with satisfaction as the dependent variable and system type as the independent variable.
Participants
Seven hundred participants were recruited to fill out an online survey via Amazon’s MTurk. Participants received $1.50 for their participation, and the participant who achieved the best overall forecasting accuracy additionally received a $20 bonus payment. Using the same rules as in Study 1, 43 participants were removed from the sample, resulting in a final sample size of 657 participants (Mage = 43 years, 59% male).
Procedure
The task procedures and measurements used for the study were adapted from prior research by Prahl and van Swol (2017) but used a financial task similar to the one used by Önkal et al. (2009). Participants were first provided with a short task description. Half of the participants were randomly assigned the task of forecasting weekly closing prices of stocks, the other half of cryptocurrencies. The game consisted of 12 rounds in total. In each round, participants received information on the performance history over the past 10 weeks of one stock or cryptocurrency. The 10-week closing prices and the final closing price in week 11 were based on real data obtained from the publicly available source MarketWatch. In this way, we kept the scenario as close to reality as possible, although this led to differences between the stock and cryptocurrency data. The particular stock’s/currency’s name was not disclosed, but all of them are publicly traded. Based on the information provided, participants had to forecast the respective stock’s/cryptocurrency’s price at the regular market closing time. Half of the participants in each of the above subject pools were then provided with an estimate from what they were told was an algorithmic system, the other half from what was described as an AI system. Based on the system estimates provided, participants could reestimate the closing price and rate their trust in the system. To reduce potential biases, the system’s prediction was calculated so that the forecasting error in percentage was equal between the two system groups. Trust in the system was measured using the same item and scale from Jian et al. (2000) used in Study 1. Trust was measured in this way after each round and was the central dependent variable for the study. After each round, participants also received feedback on the actual price at closing time, the accuracy of their postadvice forecast, and their average forecast error across all previous forecasts.
To ensure that the manipulation of task difficulty would be successful, all participants received the following information following the task description: The key differences between investing in stocks and investing in cryptocurrencies are: The market for cryptocurrencies is generally a highly unstable market and prices move very aggressively. In comparison, the stock market is a much more stable market with less price fluctuations.
The exact description for the stock condition can be found in Online Appendix B. (If participants were assigned to the cryptocurrency condition, the word “stock” was replaced with “cryptocurrency.”)
Similarly, each participant was introduced to the characteristics of their respective system, either algorithmic or AI, the same as in Study 1. Also as in Study 1, the same scale from Belanche et al. (2020) was again used to ensure participants’ understanding of the mechanical or analytical nature of the system (
After this, the real trials began. Forecasts given over the first five rounds were kept close to perfect for both types of systems in order to establish a baseline and familiarize participants with the system and its average performance. The mean absolute percentage error in every trial was between 0.7% and 1.09%, which is comparable to the forecasting errors of a real AI model based on the same financial data of other stocks (Vijh et al., 2020). As in Study 1, the forecasts presented were identical across system groups; that is, participants told that their system was algorithmic and participants told that their advisor was AI saw the same forecasts. However, the absolute forecast numbers differed between the cryptocurrency and stock groups because the forecasts were based on real data. Then, in round six, the system’s forecast was off by almost 20%. After this failure, the systems again performed close to perfectly for the remaining rounds.
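The error schedule described above can be illustrated with a short sketch that derives a forecast from an actual closing price and a target percentage error, and then computes the mean absolute percentage error (MAPE). All prices and per-round errors below are hypothetical values chosen to fall in the ranges reported in the text; they are not the study’s stimuli, and the helper names are ours.

```python
def forecast_with_error(actual_price, pct_error):
    """Forecast that misses the actual price by a signed percentage."""
    return actual_price * (1 + pct_error / 100)


def abs_pct_error(actual_price, forecast):
    """Absolute percentage error of a single forecast."""
    return 100 * abs(forecast - actual_price) / abs(actual_price)


def mape(actuals, forecasts):
    """Mean absolute percentage error across several forecasts."""
    errors = [abs_pct_error(a, f) for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


# Hypothetical closing prices for the 12 rounds (not the study's data).
actuals = [50.0, 52.0, 51.0, 53.0, 55.0, 54.0, 56.0, 57.0, 55.0, 58.0, 60.0, 59.0]
# Hypothetical signed errors in percent: small in every round except round six.
pct_errors = [0.8, -1.0, 0.9, -0.7, 1.0, -19.5, 0.8, -0.9, 1.0, -0.7, 0.9, -1.0]

forecasts = [forecast_with_error(a, e) for a, e in zip(actuals, pct_errors)]
baseline_mape = mape(actuals[:5], forecasts[:5])        # rounds one to five
round_six_error = abs_pct_error(actuals[5], forecasts[5])  # the system failure
```

Because the error is defined as a percentage of the actual price, the same schedule can be applied to different price series: the percentage error stays the same while the absolute forecast numbers differ.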
After completing the 12 rounds, we measured perceived task difficulty using three adapted measures from Diacon (2004) (
Results and Discussion
Results showed that our manipulation of system characteristics was successful, and that the system identified as algorithmic was perceived to be more mechanical and based on repetition (
We then ran the same ANOVA analysis but additionally controlled for trust ratings in the round prior to the mistake. The results confirmed the significant main effect of system type (
To analyze how trust is rebuilt after consumers have seen the system err, we ran two mixed-effects models of the six rounds that followed the mistake: one for high task difficulty and one for low task difficulty. We ran two models because the underlying data presented to our participants differed between the cryptocurrency and stock groups. Therefore, directly comparing the two groups could lead to erroneously disregarding other underlying factors, such as the effect of different absolute values presented and the effect of greater variations between the closing price and the forecasts. Our outcome variable was trust in the system’s forecast as measured in each round. We added random intercepts for participants, which allowed intercepts to vary across participants.
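A random-intercept model of this kind can be written as follows. This is a sketch under stated assumptions rather than the exact specification: the text establishes trust as the outcome, system type as a predictor, and a random intercept per participant, whereas the inclusion of round and of a system-type-by-round interaction as fixed effects is our assumption. Here $i$ indexes participants and $t$ indexes the rounds after the mistake.

```latex
\begin{aligned}
\text{Trust}_{it} &= \beta_0 + \beta_1\,\text{System}_i + \beta_2\,\text{Round}_t
  + \beta_3\,(\text{System}_i \times \text{Round}_t) + u_i + \varepsilon_{it}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

Under this reading, $\beta_1$ captures the difference in trust between system types, $\beta_3$ captures whether trust is regained at a different pace for the two system types, and $u_i$ lets each participant start from their own baseline level of trust.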
For low task difficulty, the results showed a nonsignificant main effect of type of system on trust (
The same analysis for high task difficulty showed slightly different results: a nonsignificant main effect of type of system on trust (
We also ran a 2 × 2 ANOVA with task difficulty and system type as independent variables and satisfaction as the dependent variable. Results showed a significant main effect for system type (
In Study 1, we showed how the effect of system type can be explained in terms of attribution theory and suggested that more difficult tasks should mitigate the effect of system type on behavioral response. In Study 2, we have shown that increasing task difficulty indeed mitigates the negative effect on behavioral response of errors by what users think to be algorithmic systems. Study 2 aimed to examine the influence of modifying the first attribution dimension, assumed to be controllability, on behavioral response to different system types, without measuring controllability or stability. The combination of Studies 1 and 2 might imply that the results observed could be attributed to controllability.
Study 3
Overview
Study 3 aims to show how altering causal attributions to the system along the second attributional dimension, stability, affects the user’s behavioral response.
Literature has identified four immediate actions that may improve the restoration of trust after it is lost (Lewicki et al., 1996). These are acknowledging the violation, determining the cause of the violation and admitting fault, admitting the act was destructive, and accepting responsibility for the consequences. Likewise, research by Dzindolet et al. (2003) has shown that offering a rationale for a machine’s failure may improve trust restoration.
According to attribution theory, initial judgments of a source’s (in this case, the system’s) trustworthiness after a negative outcome are not always final but can potentially be modified by taking into account additional information (Weiner, 1985). One type of reparative effort by the trustee that has been shown to be effective is the provision of
As Study 1 shows and Study 2 suggests, trust repair is higher for what users perceive to be AI systems than for algorithmic systems due to the lower stability attribution made towards AI systems. Reducing stability attribution towards algorithmic systems by the provision of social accounts should, therefore, more strongly affect trust repair for these systems. For these reasons, we hypothesize the following:
H4: The effect of system type on behavioral response is mitigated by providing social accounts for the mistake.
For Study 3, we used the same stock price forecasting task as in Study 2 (now without the cryptocurrency condition). We also used the same manipulation of system type, but for Study 3 we additionally manipulated social account by providing explanations to half of the participants of why the mistake occurred and no explanation to the other half of the participants. To analyze participants’ trust after the error, we ran an ANOVA with trust after the error as the dependent variable and system type and social account as independent variables. As with the other studies, to document the dynamic nature of trust we ran a mixed-model analysis with trust level after each round after the error as the dependent variable and system type and social account as the independent variables. Finally, we analyzed satisfaction ratings via two independent t-tests, one for the condition with social account and one for the condition without social account, with satisfaction as the dependent variable and system type as the independent variable.
Participants
Six hundred participants were recruited to fill out an online survey via Amazon’s MTurk. All participants received $1.50 for their participation, and the participant with the best overall forecasting accuracy additionally received a $20 bonus payment. Based on the same rules as in the two previous studies, 39 participants were removed from the sample. These eliminations resulted in a final sample size of 561 participants (Mage = 44 years, 56% male).
Procedure
Participants were randomly assigned to one of four conditions in a 2 × 2 between-subjects design (AI system vs. algorithmic system; social account vs. no social account). The manipulation of system type was identical to the one used in Study 2, and the overall procedure was the same as the one for Study 2 (in the stock condition), except that after the error occurred, half of the participants received a brief social account explaining the unanticipated behavior and bridging the gap between expectations and outcome. The exact information given to participants in the social account condition can be found in Online Appendix D. Satisfaction was measured using the same three items from Vaccaro et al. (2018) as before (
Results and Discussion
We ran a 2 × 2 between-subjects ANOVA with social account (yes/no) and system type (AI/algorithm) as independent variables, and trust rating after the mistake as a dependent variable. The results reveal a significant main effect of system type (
To rule out an effect of prior trust in the system, we ran the same model again, but now additionally controlling for the participants’ trust rating in the round before the mistake occurred. In this way, we could isolate the effect of the mistake from the overall effect of the system type. The results confirm the significant main effect of system type on trust rating (
To analyze whether social account has the expected effect on the regaining of trust, we ran a mixed-effects model. Our outcome variable was trust in the system’s forecast as measured in each round after the mistake. We added random intercepts for participants, thus allowing intercepts to vary across participants. Results show that the main effects for system type (
We also computed two independent t-tests for satisfaction ratings, one for the group with social account and one for the group without social account. These tests showed that when no social account was provided, satisfaction was higher for the AI system (
Based on these results, Figure 3 illustrates that for purported algorithmic systems, trust can be regained faster by providing a social account. However, for purported AI systems this is not the case, thus confirming H4.
Figure 3. Trust regaining process after the occurrence of a system failure. Plotting level of trust over 7 consecutive rounds shows that social account (social account = yes) positively affects the regaining of trust in algorithmic systems (with high stability ascription) but not AI systems (with low stability ascription).
DISCUSSION
Theoretical Contributions
Our research offers four notable contributions to the literature on trust in systems. First, we go beyond the traditional view of trust as a static variable by examining the dynamic nature of trust in recurring interactions. We focus on analyzing user behavioral responses to a system’s error, considering trust in system advisors as an evolving variable over time. Consequently, we investigate the levels of trust before and after a system’s predictive error, the sequential process of rebuilding trust, and consumer satisfaction throughout the entire interaction series.
Second, we demonstrate that the terminology and descriptions used for systems significantly influence consumer responses to system failures. In various contexts, such as media, policy making, or research, systems are referred to by different terms like “artificial intelligence” (AI), “algorithms,” or “decision support systems.” However, previous research indicates that the characterization of a system (e.g., “AI” vs. “algorithms”) impacts user expectations (Langer et al., 2021). Building on Dietvorst et al. (2015), we reveal that portraying a system as “AI” (a self-learning and more complex system) rather than “algorithmic” (a simpler rule-based system) results in more positive user responses after system errors.
Third, our research uncovers new insights into how system terminology influences causal attributions, which subsequently affects trust rebuilding and satisfaction. We also explore the impact of perceived system complexity variations on algorithm aversion. By contrasting AI systems (perceived as more complex) with algorithmic systems, we determine that user reactions to errors depend on the perceived system category. Mediation analysis further explains this effect using attribution theory (Kelley, 1973; Weiner et al., 1976). Stability ascription is higher for algorithmic systems than AI systems, and providing reasons for errors can change causal attributions. However, this adjustment is less effective for AI systems since stability ascription does not limit trust reestablishment. This shift in attribution also influences satisfaction ratings, with the difference between systems only being significant when no social account is offered.
Lastly, we identify how the difference due to system classification can be mitigated. We show that altering causal attributions towards lower controllability (by increasing task difficulty) as well as decreasing the system’s stability attribution (by offering social accounts) can mitigate the effect of system type on user behavior. As some systems are not self-learning and hence cannot be referred to as “AI,” we show how users’ tendency to respond relatively more negatively to algorithmic systems can be mitigated, either by changing controllability attributions (e.g., by noting the high complexity of the task involved) or stability attributions (e.g., by providing an explanation of the reasons for the system failure). The growing trend of firms falsely labeling their technology as “AI” when using rule-based systems, known as “AI washing” (Bini, 2018), erodes consumer and investor trust and hinders AI adoption (Vincent, 2019). This phenomenon can be compared to “autonowashing” (Dixon, 2020) in the driving automation domain, where misleading system naming impacts user expectations and trust. Studies have shown that the name of a driver assistance system influences user behavior (Abraham et al., 2017). Consequently, our findings offer guidance on how to lessen the negative reaction towards nonAI systems without resorting to “AI washing.”
These results have important implications for our understanding of algorithm aversion. Previous research results regarding trust in systems have been ambiguous (Castelo et al., 2019; Dietvorst et al., 2015, 2018; Logg et al., 2019; Marcinkowski et al., 2020). We suggest that part of the differences in findings may be due to the terminology used to describe the systems in question. A wide variety of terms have been used in previous studies to describe a system, including “artificial intelligence” (Marcinkowski et al., 2020), “algorithm” (Dietvorst et al., 2015, 2018; Lee, 2018), and “automated system” (Keel et al., 2018). Although these terms are similar, we have shown that the use of different terms may shape consumers’ perceptions of systems and consumers’ responses to these systems. Using the term “algorithmic system” may increase algorithm aversion, while using the term “artificial intelligence” may mitigate this same phenomenon. We thus highlight the importance of terminology, as this can unintentionally impact the robustness and replicability of research findings.
Practical Contributions
Given the ubiquitous nature of automation and AI in various fields, our research findings bear significant practical implications across a range of applications. In media, policy making, research, customer care, and other domains, systems are variously referred to as “algorithms,” “decision systems,” “artificial intelligence,” and “automation,” amongst other terms. In the present research, by comparing consumer behaviors in interactions with what are presented as “AI” versus “algorithmic” systems, we have shown that describing a system as “AI” triggers more beneficial responses. Following a service failure, customers interacting with what they believe to be an AI system are more willing to continue trusting the system, are more prone to rebuild any lost trust, and are more satisfied after the interaction than are customers who have been told they are interacting with an algorithmic system.
Emphasizing the practical relevance, it is crucial to recognize that such dynamics of trust and responses are not limited to mere consumer products but extend to domains where the stakes are high, such as medical diagnostics, financial advisory, and autonomous transportation. In healthcare, for instance, designating decision support systems as AI could foster higher levels of trust among patients and medical staff, especially following errors and inaccuracies, thus facilitating the adoption and efficient utilization of such systems in critical care settings. Likewise, in the financial sector, designating algorithmic trading systems as AI can potentially lead to increased investor confidence and resilience in the face of market volatility. Furthermore, in the realm of autonomous vehicles, clear and strategically chosen terminologies could help in establishing public trust which is vital for the broad acceptance of autonomous technologies.
However, it is important to acknowledge that not all systems are self-learning and can be labelled “AI” systems. This fact, together with the general perception of “AI” as desirable, has led to the practice of so-called “AI washing”: the mislabeling of technology to suggest that it delivers AI when in fact it does not. AI washing is harmful to the adoption of AI, as it undermines trust, acceptance, and development of the technology (Bini, 2018; Moore, 2017; Vincent, 2019). When a system’s characteristics do not allow for it to be called “AI,” our research suggests that it would be beneficial to provide users with an explanation of why the system’s error occurred. Similarly, our findings indicate that accounting for errors in terms of greater task complexity will likewise reduce consumers’ potentially negative responses to errors made by a less complex, algorithmic system.
In summary, our research suggests that companies should be strategically sensitive to the choice of terminology used to refer to advisor systems, as this choice may have a direct impact on consumer behavior toward the system and on the level of consumer satisfaction with the same. When transparency dictates that more desirable descriptors are unavailable for use, our research suggests that there are concrete ways to substantially mitigate the disadvantages this might otherwise entail.
Limitations and Future Research
The present research raises several additional issues for researchers to consider. First, in the above studies we not only presented the terms “AI” and “algorithmic” as labels for the systems with which the participants interacted, but we also provided brief descriptions of the characteristics/capabilities of these systems. Thus, it may be that the participants were responding not simply to the use of the terms “AI” or “algorithmic,” per se, but instead to these participants’ understanding of the nature and capacities of the systems based on
Second, our study designs did not include measures of trust in human agents. Adding a control group of participants who received forecasts by a human agent would have opened the possibility to directly test the degree of similarity between the response towards AI systems and human agents. We chose not to do so because prior research has already compared the behavioral responses to errors of systems versus humans (Dietvorst et al., 2015).
Third, the present research did not consider individual and contextual differences. However, individual differences in, for example, self-confidence (Lee & See, 2004) and expertise (Fan et al., 2008) have been shown to affect trust in systems. We encourage scholars to look further into the impact of these and similar factors.
Fourth, in our research, system failure was treated as a binary variable. In contrast to this research design, future research could manipulate system performance in more continuous ways (e.g., degree of forecasting accuracy and deviation from usual accuracy) and thereby possibly uncover other interesting differences between system types.
Lastly, it is crucial to exercise caution when relying on AI systems, as their showcased capabilities may not accurately reflect their real-world performance (Woods, 2016). As automation becomes more autonomous and takes on roles similar to team members, a relational approach to trust is needed, focusing on trusting rather than trust calibration (Chiou & Lee, 2023).
This approach considers not only the social influence of automation but also the social implications of people adjusting to and interacting with it. Measures of responsivity and the ability to resolve conflicting goals may be more relevant than reliability and reliance actions for future automation. By adopting a relational framework centered on situation, semiotics, sequence, and strategy, we can better understand the mechanisms and broader effects of trusting increasingly capable automation, fostering more resilient human-automation partnerships (Chiou & Lee, 2023).
KEY POINTS
Different terminology (e.g., “AI” vs. “algorithmic”) significantly influences consumer perceptions and responses, emphasizing the importance of strategic terminology use for trust and acceptance in industries using such systems.
Describing a system as “AI” instead of “algorithmic” elicits more favorable behavioral responses during system errors, underscoring the impact of terminology on user behavior.
When a system cannot be labeled as “AI,” providing users with an explanation of the error and highlighting task complexity can help mitigate negative responses and enhance system acceptance.
Supplemental Material
Supplemental Material - How Terminology Affects Users’ Responses to System Failures
Supplemental Material for How Terminology Affects Users’ Responses to System Failures by Cindy Candrian and Anne Scherer in Human Factors
ORCID iDs
Cindy Candrian https://orcid.org/0000-0002-2177-8052
Anne Scherer https://orcid.org/0000-0003-4074-4859
References
