Abstract
Keywords
Although complex health interventions (CHIs)—those with several interacting components—are common in contemporary health care worldwide (Campbell et al., 2007; Craig et al., 2013), clinicians often face considerable challenges implementing these interventions due to their design complexity (Greenhalgh et al., 2004; Perez Jolles et al., 2019). For this reason, intervention usability—the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction (International Organization for Standardization [ISO], 1998)—has been identified as a key “upstream” determinant of implementation (Lyon & Bruns, 2019). This is particularly true in mental and behavioral health, where most effective practices are complex evidence-based psychosocial interventions (EBPIs; Institute of Medicine [IoM], 2015).
In contrast to standard perceptual implementation outcomes such as acceptability, appropriateness, and feasibility, usability is largely a characteristic of the intervention itself.
Frequently, EBPIs are delivered in integrated or non-specialty care settings (e.g., primary care and schools) that differ markedly from the contexts where they were originally developed. For instance, mismatches between EBPI design and the constraints of primary care make intervention fidelity and patient outcomes difficult to sustain (Alexopoulos et al., 2016; Areán et al., 2008). Despite being critical to implementation, aspects of intervention design quality, such as usability, have been insufficiently assessed in primary care and other non-specialty contexts (Lyon et al., 2019).
Human-centered design and usability
Human-centered design (HCD) reflects an approach and set of methods for creating and refining systems so that they are usable and useful for their stakeholders. HCD is also closely related to the fields of human-computer interaction, user experience, and human factors (ISO, 2019; Norman & Draper, 1986; Rubin & Chisnell, 2008). Relevant to our research aims, each of these fields includes a focus on usability and provides methods to evaluate and improve the extent to which people can reliably use a system to achieve their goals, without error, safely, and with an enjoyable experience (Dumas et al., 1999; Nielsen, 1994). Usability is a key aspect in the design and refinement of CHIs (Burchett et al., 2018; Harte et al., 2017; Horsky et al., 2012), and EBPIs in particular (Lyon et al., 2019; Lyon, Koerner, & Chung, 2020).
System Usability Scale
Originally designed for assessing digital systems, the System Usability Scale (SUS) is among the most widely applied usability instruments (Sauro & Lewis, 2009) and has been found to be significantly related to task success in both laboratory and field studies (Kortum & Peres, 2014). Some applications of the SUS have even extended beyond digital products to technologies such as automatic teller machines and microwave ovens (Kortum & Bangor, 2013). Brooke (1986) initially developed the 10-item scale as a brief and reliable instrument to allow comparisons among products and across versions of a product. Bangor et al. (2008) evaluated 2,324 SUS questionnaires and found high internal reliability.
Current aims
In light of the criticality of CHI usability, and the paucity of psychometrically sound instruments for its evaluation, we revised the SUS to evaluate EBPI usability. This study assessed the revised instrument’s structural validity using confirmatory factor analytic methods. This study was carried out with primary-care providers, given the importance of primary care for the delivery of contemporary mental health services worldwide (Centers for Medicare & Medicaid Services, 2018; World Health Organization Regional Office for Europe, 2016) and the likelihood that many traditional EBPIs may demonstrate problematic usability in that novel context. The Intervention Usability Scale (IUS) was evaluated using primary-care providers’ ratings of Motivational Interviewing (MI), a client-centered and directive approach designed to help service recipients resolve ambivalence and build motivation to change (Miller & Rollnick, 2012). MI has a strong evidence base, including in primary care (Hettema et al., 2005; VanBuskirk & Wetherell, 2014), but its usability has never been assessed.
Method
Participants
Table 1 provides sample demographics. The study sample consisted of 136 medical providers who selected MI as the intervention they delivered most often (see Procedures). A slight majority (56.6%) of participants self-reported their gender as female; most participants were white (86.0%); and the median age was between 30 and 39.
Study sample demographics.
Racial/ethnic categories and degrees held are not mutually exclusive.
Procedures
The data in this study were collected as a component of a larger survey completed by the WWAMI (Washington, Wyoming, Alaska, Montana, Idaho) region Practice and Research Network (WPRN)—a group of primary-care clinics committed to conducting practical research—in January 2019. At the time of the survey, WPRN clinics were spread across 25 diverse parent organizations in both urban and rural communities. The survey was originally conducted for quality improvement purposes. As a result, participation was not incentivized and providers were not required to provide informed consent. This study’s analyses were approved by the authors’ Institutional Review Board. The full survey consisted of questions in four areas: (1) clinician role/demographics, (2) perspectives on behavioral health in primary care, (3) anxiety treatment, and (4) provider burnout/resilience. Survey completion took approximately 10 minutes.
All WPRN member organizations (
Measures
Survey of commonly used behavioral health interventions
As a component of the larger survey, respondents were asked to select which intervention they provided most frequently from a list of evidence-based behavioral health interventions known to be commonly used in the WPRN (including cognitive behavioral therapy, behavioral activation, and MI, among others). That intervention served as the referent for the IUS.
IUS
The 10-item SUS (Brooke, 1986) was adapted to create the IUS (Lyon, 2016; Lyon, Koerner, & Chung, 2020). The term “system” was replaced with “intervention” in each item, but no other modifications were made to maintain consistency with the SUS operationalization of usability. Items are rated on a Likert-type scale from 0 (strongly disagree) to 4 (strongly agree), with half of the items reverse-scored. The total score was calculated by multiplying the sum of these scores by 2.5 (possible range: 0–100).
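The scoring just described can be sketched in a few lines of code. This is an illustration of the arithmetic only; the function name and input format are ours, not part of the published instrument.

```python
def ius_total(ratings):
    """Overall IUS score from 10 raw item ratings (each 0-4).

    Odd-numbered items contribute their raw rating; items 2, 4, 6, 8,
    and 10 are reverse scored and so contribute 4 minus the rating.
    The summed contributions (range 0-40) are multiplied by 2.5 to
    place the total on a 0-100 scale.
    """
    if len(ratings) != 10 or any(not 0 <= r <= 4 for r in ratings):
        raise ValueError("expected 10 ratings, each between 0 and 4")
    contributions = [r if i % 2 == 0 else 4 - r  # i is 0-indexed, so
                     for i, r in enumerate(ratings)]  # even i = odd item
    return 2.5 * sum(contributions)
```

For example, a respondent who strongly agrees (4) with every positively worded item and strongly disagrees (0) with every reverse-scored item receives the maximum score of 100.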
Analyses
To allow direct comparison with prior studies, we replicated their analytic procedures. To replicate Sauro and Lewis (2009), we used principal axis factoring (also known as common factor analysis), and to replicate Lewis and Sauro (2017), we used principal components analysis and unweighted least squares factor analysis to examine the factor structure of the IUS and evaluate possible subscales. All analyses used varimax rotation. We then assessed scale correlations and internal consistency for each subscale and for the total score, and compared IUS descriptive statistics with prior research on the SUS.
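As a rough illustration of the unrotated first step these procedures share (a sketch, not the exact software pipeline used in the study), the eigenvalues of the inter-item correlation matrix, plotted in descending order, form the scree plot used to choose the number of factors:

```python
import numpy as np

def scree_eigenvalues(responses):
    """Eigenvalues of the inter-item correlation matrix, in descending
    order. Because each standardized item contributes one unit of
    variance, the eigenvalues sum to the number of items, and each
    eigenvalue divided by that sum is a proportion of variance explained.

    `responses` is an (n_respondents, n_items) array of item scores.
    """
    corr = np.corrcoef(responses, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)  # returned in ascending order
    return eigvals[::-1]
```

A factor is typically retained when the plotted eigenvalues flatten out (the point of inflection) after it.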
Results
Item descriptives and correlations
Scores on IUS items spanned the full range (0–4) for all items except item 4 (range: 0–3), item 5 (range: 1–4), and item 9 (range: 1–4). The mean scores for items ranged from 0.88 to 3.04; median scores were 3 for all regularly scored items and 1 for all reverse-scored items. See Table 2 for more detail on individual item descriptive statistics. Inter-item correlation absolute values (half of the IUS items are reverse scored) ranged from 0.05 to 0.64. Five pairs of items were not significantly correlated: items 1 and 6, items 1 and 7, items 1 and 10, items 6 and 9, and items 6 and 10. See Table 3 for full item correlation results.
IUS Individual Item Descriptives for Motivational Interviewing (MI).
For all items, 0 = strongly disagree, 1 = disagree; 2 = neither agree nor disagree; 3 = agree; 4 = strongly agree.
Items 2, 4, 6, 8, and 10 are reverse scored.
Item Correlations.
Significant at
IUS factor structure
Examining the point of inflection on a scree plot indicated that a two-factor solution best fit the data. Factor 1 had an eigenvalue of 4.31, accounting for 43.1% of the variance, and Factor 2 had an eigenvalue of 1.1, accounting for 11.0%, so the two-factor solution accounted for 54.1% of the variance in total. Across all three analytic approaches, items 1, 2, 3, 5, 6, 7, 8, and 9 loaded onto the first factor, whereas items 4 and 10 loaded onto the second. See Table 4 for the rotated component matrix. This item-factor alignment was nearly identical to the two-factor structure reported by Sauro and Lewis (2009), but inconsistent with studies that followed (J. R. Lewis & Sauro, 2017). We therefore adopted the factor names from the 2009 study: "Usable" and "Learnable." To place the Usable and Learnable scores on the same 0 to 100 scale as the Overall IUS score, we multiplied their summed score contributions by 3.125 and 12.50, respectively.
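The arithmetic behind these figures is simple enough to make explicit. The sketch below is ours; the multipliers follow directly from the subscale sizes.

```python
def pct_variance(eigenvalue, n_items=10):
    """Percent of variance accounted for by a factor: with n
    standardized items the total variance is n, so an eigenvalue of
    4.31 on a 10-item scale accounts for 43.1% of the variance."""
    return 100.0 * eigenvalue / n_items

def subscale_multiplier(n_subscale_items):
    """Multiplier that rescales a subscale's summed 0-4 item
    contributions to a 0-100 range: the maximum sum is 4 * n, so the
    multiplier is 100 / (4 * n) -- 3.125 for the 8-item Usable
    subscale and 12.5 for the 2-item Learnable subscale."""
    return 100.0 / (4 * n_subscale_items)
```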
Rotated Component Matrices for three exploratory structure analyses.
Scale correlations
The correlations between the subscales and the Overall IUS score were
Reliability/internal consistency
The overall IUS had good internal consistency (α = .83). Coefficient alphas for the Usable and Learnable subscales were .84 and .67, respectively. The Learnable alpha fell slightly below the conventional minimum standard of .70 (Landauer, 1997; Nunnally, 1978), likely because only two items load onto that subscale.
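Coefficient alpha itself is straightforward to compute. A standard-library-only sketch, using population variances (one common convention):

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's coefficient alpha for a list of respondents' item
    scores: alpha = k/(k-1) * (1 - sum(item variances) / variance of
    the total scores), where k is the number of items."""
    k = len(responses[0])
    item_vars = [pvariance([resp[i] for resp in responses])
                 for i in range(k)]
    total_var = pvariance([sum(resp) for resp in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Perfectly parallel items yield an alpha of 1; items that cancel one another out drive it toward (or below) 0, which is why a two-item subscale with modest inter-item correlation struggles to reach .70.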
Distribution of IUS subscale scores
Table 5 shows descriptive statistics on distributions of scores on the IUS, alongside a comparison with prior SUS data. Of note, the current sample had significantly higher overall scores (
Comparison with Prior System Usability Scale Data.
Discussion
No psychometrically sound instruments exist to evaluate the usability of CHIs. Our study of the IUS revealed a factor structure that was nearly identical to that of Sauro and Lewis (2009). However, our results differed from subsequent studies (Kortum & Sorber, 2015; J. R. Lewis, Brown, & Mayes, 2015; J. R. Lewis et al., 2013; J. R. Lewis & Sauro, 2017; Lewis, Utesch, & Maher, 2015; Sauro & Lewis, 2011). The moderate correlation between the subscales indicates that the measure can be used as a total scale score as well as decomposed into Usable and Learnable subscales. The overall IUS score for MI was 68.70, slightly below the SUS cutoff of 70 for "acceptable" (Bangor et al., 2008). However, until additional research is conducted with the IUS, the extent to which SUS-derived cutoffs translate to CHIs remains unknown.
Future research should apply the IUS to better understand factors that contribute to high and low scores and the relationships between those scores and related constructs. First, applications with a broader range of EBPI types are indicated to confirm the current results, and these investigations are likely to reveal EBPI qualities that result in higher and lower scores (e.g., EBPI complexity). Second, the IUS should be applied with additional service provider and service recipient populations with a range of experience in the interventions evaluated. Prior research with both the SUS and the IUS suggests that greater experience with products or expertise in domains can result in higher scores (Lyon, Koerner, & Chung, 2020; Mclellan et al., 2012; Sauro, 2011). Third, initial evidence for the structural validity of the IUS creates opportunities to evaluate the relationships between usability and perceptual implementation outcomes (e.g., acceptability and feasibility) using established measures for those constructs (e.g., Weiner et al., 2017). Fourth, there may be opportunities to improve the IUS itself, such as targeted item development to enhance the robustness of the two-item Learnable subscale, or revised wording, including exploring the utility of providing specific examples of EBPIs that demonstrate high or low usability. Finally, future research should also assess the extent to which differences in implementation supports (e.g., training and consultation) affect experiences of EBPI usability.
Limitations
This study had several limitations. First, it occurred only in primary care with one user group and may be difficult to generalize to other contexts. Although the WPRN represents a wide variety of types of health care systems, clinic sizes, and geography (and is broadly representative of primary care clinics in the region), sampling within the WPRN was non-systematic and may have yielded a sample that was more engaged with, or had positive attitudes toward, behavioral health services in general or EBPIs in particular. Second, due to a survey programming error, we did not obtain consistent data about the amount of training respondents had received for MI.
Conclusions and next steps
Overall, the adapted IUS demonstrated good psychometric quality and a structure consistent with some prior research. Intervention usability has been conceptualized as a key determinant of both perceptual (e.g., appropriateness and feasibility) and behavioral (e.g., fidelity) implementation outcomes, as well as patient outcomes (Lyon & Bruns, 2019). Application of the IUS to a broader range of EBPIs, settings, and professional roles would allow this proposition to be explicitly tested.
