Abstract
Prior research confirms abundant variation in using health interventions to promote behavior change and demonstrates the challenges in translating lab findings into the real-world setting.1,2 This is illustrated by many practice-level interventions showing only limited success in behavior change, as based on recent systematic reviews.3,4 According to the complexity science approach (complexity theory), interventions effectively addressing barriers and encouraging desired behaviors in one setting may fail to generate similar results elsewhere since “opportunities for change vary at each practice.” 2 Thus, interventions that are individually tailored and are developed to be adaptive to changing contexts may be more effective in promoting healthy behaviors than non-adaptive interventions.3,5,6 In this vein, just-in-time adaptive interventions (JITAIs), which allow individuals to receive the right amount of tailored support at the right time and place, hold enormous potential for promoting the desired behavior change.
A JITAI is an emerging intervention design that adjusts the timing and intensity of message delivery based on an individual's evolving internal and contextual state with the goal of providing support at the moment when individuals need it most. 7 For example, a JITAI for preventing alcohol use disorders sends individuals tailored messages based on the risk level of alcohol use at a particular moment. If an individual is close to a high-risk location, such as a bar that was frequently visited in the past, the risk level is considered high and the person may receive warning messages. If the risk level is deemed low, individuals may not receive messages. 8 So far, JITAIs have mainly been used to change behaviors related to physical activity, 9 substance use, 8 and obesity. 10 However, research on JITAIs’ implementation and evaluation is still in its early stages, and more empirical evidence regarding the use of JITAIs is needed. 11 In this study, we will take a complexity science approach to evaluate the effectiveness of JITAIs that promote healthy behaviors and assess whether key design principles and components can increase JITAIs’ impacts.
Literature review
Complexity science focuses on dynamic issues and problems that are often unpredictable and consist of interconnected relationships. It is a collection of theories and conceptual tools from different disciplines. 12 Two aspects of the complexity science approach that are closely related to health interventions include mathematical complexity and aggregate complexity. Both explain the promise of JITAIs for health behavior change and provide insights on health intervention design. 2
Mathematical complexity and aggregate complexity
Mathematical complexity refers to various complex systems that exist in behavior change. 13 As opposed to complex systems, linear or deterministic systems assume that the same results will occur over and over again, as long as the intervention starts with the same condition and applies the same equation. 2 For instance, if a lab study demonstrates the effectiveness of an anti-smoking campaign in reducing tobacco use among a certain population, then, when recruiting a similar population, using the same campaign, and following the same procedure, another study should generate a similar effect. In contrast, complex or non-linear systems refer to systems that often follow random and unpredictable trajectories. 2 As a result, an intervention that works in a lab study may not work in another setting. For example, in the natural setting, smoking decisions may be based not only on the anti-smoking campaign but also—and perhaps primarily—on people's changing internal and contextual state.
Although it is impractical, if not impossible, to fully characterize all internal and external factors that drive behavior change, one can capture opportunities and moments that are most likely to influence behavioral decisions. For instance, it is challenging to predict all aspects that affect decisions in smoking cessation, yet it is feasible to detect high-risk locations that may lead individuals to smoke again 14 or identify hand gestures that suggest these individuals are about to smoke. 15 Both high-risk locations and hand gestures are tailoring variables and can be used to determine when and which intervention messages are delivered. 5
Aggregate complexity refers to the interplay among all the various factors that affect behaviors. 13 Dynamic interactions among internal and external factors may lead to unexpected outcomes. For example, at the beginning of a real-world intervention, differences in initial conditions, such as “the larger health care system to which each belongs and the social and economic conditions within the local community,” 2 may result in a disproportionately large effect on forthcoming events. During the intervention, unplanned occurrences (e.g. distractions) may take place and disrupt the decision-making process. Moreover, individuals may modify their behaviors based on the feedback they receive from or other interactions with the intervention. 13 Since the interplay among various factors cannot be fully foreseen, adaptation is needed in intervention design. Adaptation refers to the ongoing use of information to alter the content, timing, and frequency of messages and is emphasized in JITAI design. 16 It requires monitoring (a) whether individuals are “in a state that requires support,” (b) what type (or amount) of support is needed given the individual's state, and (c) whether providing this support has the potential to disrupt the desired process. 5
Passive and active assessments of tailoring variables
Based on the complexity science approach, JITAIs aim to adjust intervention timing and content to individuals’ changing internal and contextual states by targeting certain tailoring variables to capture the opportunities for behavior change. Tailoring variables allow the timing of intervention delivery to be modified. Timing “reflects the particular condition(s) that someone or something is in at a particular point or period of time.” 5 It is based on events or conditions, such as when individuals approach a high-risk location, which may be unexpected. JITAIs will deliver intervention messages when “states of opportunity” or the “right time” occurs. 5 For instance, if the moment when individuals approach a high-risk location is deemed to be the tailoring variable and indicative of the right timing to send out a risk warning message, participants will receive instant messages when they approach these locations. If individuals stay in low-risk locations, they will not receive any messages or will receive different types of messages.
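The location-based decision rule described above can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed interventions: the names `Location`, `haversine_m`, and `decide_message`, the 100-meter radius, and the message label are all hypothetical and chosen only to show how a passively assessed tailoring variable (proximity to a high-risk location) could trigger message delivery.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Optional


@dataclass(frozen=True)
class Location:
    """A point given as latitude and longitude in decimal degrees."""
    lat: float
    lon: float


def haversine_m(a: Location, b: Location) -> float:
    """Great-circle distance between two points, in meters."""
    earth_radius_m = 6_371_000.0  # mean Earth radius
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(h))


def decide_message(current: Location,
                   high_risk: list[Location],
                   radius_m: float = 100.0) -> Optional[str]:
    """Return a warning label when the user is within radius_m of any
    high-risk location; return None (no message) otherwise.

    The radius and the label are illustrative assumptions.
    """
    for place in high_risk:
        if haversine_m(current, place) <= radius_m:
            return "warning"
    return None
```

In a real JITAI the rule would typically combine several tailoring variables and would also check whether sending a message at that moment could disrupt the desired process; this sketch covers only the proximity check.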
To ensure timely and frequent adaptation, JITAIs often use mobile devices, which allow people to access interventions anytime and anywhere. 17 Another advantage of using mobile devices to deliver JITAIs is that the sound or vibration associated with mobile messages can capture an individual's attention immediately. 18 Besides mobile devices, some JITAIs use sensors (e.g. accelerometer, camera), either with mobile devices or separately, to capture raw data (e.g. heartbeat, walking distance) and apply machine learning algorithms to these raw data to classify health outcomes, such as stress episodes, 19 smoking puffs, 20 and physical activities. 21
Sensors and machine learning algorithms make passive assessments of tailoring variables feasible. Passive assessments measure tailoring variables without the active engagement of participants. For example, participants’ locations can be passively monitored and used to determine when to send an alert. 8 Besides passive assessments, some JITAIs also use active assessments to measure tailoring variables. Active assessments require participants to frequently self-report their thoughts and behaviors. For example, participants self-report three times a day about their difficulties performing tasks to regulate their moods, improve their sleep, etc. 22 However, active assessments are more prone to subjective interpretation and may cause fatigue when participants need to answer similar surveys multiple times a day. 5 A recent study showed that a combination of passive and active assessments may be more persuasive than passive assessments alone. 11 This meta-analysis will examine the moderating effects of passive and active assessments of tailoring variables on JITAIs’ effectiveness.
JITAIs with or without a theoretical basis
Besides being tailored in terms of delivery timing and frequency, the content of JITAIs is also often adapted to the dynamic contexts an individual might find themselves in. For example, JITAIs’ messages have been tailored to participants’ stages of behavioral change 14 or possibilities for performing a risky behavior. 23 Thus, several recent JITAIs have been developed according to motivational and/or socio-cognitive theories, such as utilizing the concept of self-efficacy that is emphasized in socio-cognitive theories.24,25 These theories explain how to achieve positive health behavior change. However, to date, many JITAIs have been developed with little use of theories. 5
Theoretically grounded JITAIs have the potential to generate better outcomes than those lacking a theoretical basis. 26 However, most existing theoretical frameworks are not dynamic and only treat underlying mechanisms that drive behavioral change as linear or deterministic systems instead of as complex systems. 27 Several acknowledge the existence of complex systems. For example, behavioral motivation theories highlight the importance of addressing “states of opportunity” at the right time26,28 or recommend breaking long-term goals into short-term specific activities and providing timely guidance.29,30 However, these theories do not explain
Retention
Furthermore, many theoretical frameworks do not consider fatigue, countereffects, or resistance to persuasion, 31 which may cancel interventions’ effects or even backfire. Due to frequent delivery of interventions and assessments as well as strong demand for participants’ time and effort, JITAIs may cause fatigue or non-adherence, which could result in lower retention rates. The retention rate refers to the percentage of users who continue to use a JITAI over a given period of time. It is a measure of how well a JITAI is able to keep its users engaged and satisfied. 5 As indicated in prior sections, JITAIs that require less time or effort from the participants, such as those using passive assessments, may lead to higher retention rates than those that require more time or active assessments. 8 In addition, theoretically grounded JITAIs may generate better outcomes, such as increased retention rates and engagement, than JITAIs without a theoretical basis. 26 To examine whether key design principles and components impact retention rates, besides analyzing the overall effectiveness of JITAIs and key moderators, this meta-analysis will also investigate whether theories, assessments of tailoring variables, and intervention time affect retention rates.
Other potential moderators
As the concept of aggregate complexity indicates, differences in initial conditions can largely affect forthcoming events. As initial conditions, such as social demographic factors, first exist when complex systems are established, they may ultimately influence behavioral outcomes at each practice. Besides initial conditions, the nature of the interactions between different agents “contribute to the uniqueness of each complex adaptive system.” 2 For example, the interactions between interventions and people who are ill may be different from those between interventions and healthy people as various agents’ “attitudes, skills, individual self-efficacy, and the nature of the interactions” are distinctive. 32 Since JITAIs often have various initial conditions and interactions, it is critical to consider these factors when evaluating the effectiveness of JITAIs. In the current meta-analysis, we will therefore consider
Present review
Several systematic reviews have reviewed fundamental components of JITAIs and provided useful guidance on designing and testing JITAIs as well as addressing challenges.5,33 To our knowledge, only one prior meta-analysis assessed the effectiveness of JITAIs, and it only included studies published by February 2018. 11 As an increasing number of JITAIs have been designed and tested in recent years, an updated meta-analysis is needed. Moreover, Wang and Miller's study combined JITAIs with ecological momentary interventions (EMIs) which utilize real-time assessments but do not necessarily provide tailored interventions based on these real-time assessments. 34 Since JITAIs and EMIs are not always the same, 35 the overall effectiveness of a combination of JITAIs and EMIs may not accurately represent the effects of JITAIs. A meta-analysis only focusing on JITAIs is therefore necessary. In the current meta-analysis, the primary goal is to examine the overall effectiveness of JITAIs on objective health outcomes. The secondary goal is to investigate if JITAIs’ fundamental components, such as assessments, theories, retention rates, and intervention duration as well as initial conditions, such as demographic variables and population, moderate JITAIs’ effects on health outcomes. We also aim to explore if theories, assessments, and intervention time affect retention rates.
Method
Literature search
Previous studies have used various terms to describe interventions adaptive to individuals’ changing needs although not all of the interventions are adaptive in real time. Such terms include “dynamic tailoring” 36 and “intelligent real-time therapy.” 37 In July 2021, keywords including just-in-time adaptive intervention, JITAI, dynamic tailoring, and intelligent real-time therapy were searched in the Academic Search Premier, CMMC, PsycINFO, PUBMED, and Medline via EBSCO databases. A completed checklist of PRISMA guidelines can be found in Appendix 1.
Inclusion criteria and selection
Since the primary goal of this study is to examine the effectiveness of JITAIs on objective health outcomes and JITAIs need to be delivered in the natural environment to evaluate their effects, 5 we only included English-language studies that (a) measured health outcomes objectively (e.g. step counts, sedentary time, etc.), (b) had a control condition or pre-post-test design, and (c) were conducted in the real-world setting. Studies were excluded if the intervention did not target individuals’ changing internal and contextual states in real time. For instance, a study was excluded as it delivered dietary-related messages that were scheduled based on students’ usual time to contemplate food choices (e.g. between lectures) instead of delivered just-in-time based on their changing states. 38 We contacted the authors if their studies met all the above criteria, but the authors did not provide data that can be used to calculate effect sizes in their publications.
The titles and abstracts of the articles were screened by an author and a research assistant separately, and any articles that were deemed potentially relevant were thoroughly assessed in full text. The two researchers then independently reviewed the full texts. In cases where there were disagreements, the two researchers discussed the reasons behind their inclusions and exclusions until a consensus was reached. The identification, screening, and inclusion process followed the PRISMA 2020 guideline. 39 Figure 1 shows the flow diagram.

Figure 1. PRISMA 2020 flow diagram.
Statistical analysis
All analyses were performed using R (version 4.1.2). First, Hedges’ g, which refers to the unbiased estimate of standardized mean differences, 40 was used as the effect size indicator. Both randomized trials and non-randomized studies were included. All randomized trials also reported pre-post-test results in the experimental conditions. Since it is not recommended to compute a summary value across randomized and non-randomized studies, 41 we utilized pre- and post-test means and standard deviations (SDs) or standard errors (SEs) to calculate Hedges’
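To make the effect size computation concrete, the following Python sketch shows one common way to obtain a bias-corrected standardized mean change from pre- and post-test summary statistics. It is an illustration under stated assumptions rather than the exact procedure used in this review (the analyses were run in R): the standardizer is assumed to be the average of the pre- and post-test variances, the small-sample correction is assumed to use n − 1 degrees of freedom, and the function names `se_to_sd` and `hedges_g_pre_post` are hypothetical.

```python
from math import sqrt


def se_to_sd(se: float, n: int) -> float:
    """Convert a standard error of the mean into a standard deviation."""
    return se * sqrt(n)


def hedges_g_pre_post(mean_pre: float, sd_pre: float,
                      mean_post: float, sd_post: float,
                      n: int) -> float:
    """Bias-corrected standardized mean change for a single group.

    Assumptions (not confirmed by the source):
    - the standardizer is sqrt((sd_pre^2 + sd_post^2) / 2);
    - the correction factor is J = 1 - 3 / (4 * df - 1) with df = n - 1.
    """
    if n < 3:
        raise ValueError("n must be at least 3 for the correction factor")
    sd_pooled = sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    d = (mean_post - mean_pre) / sd_pooled  # uncorrected standardized change
    df = n - 1
    j = 1 - 3 / (4 * df - 1)                # small-sample bias correction
    return j * d
```

For example, with hypothetical values of a pre-test mean of 10, a post-test mean of 12, SDs of 2 at both time points, and n = 20, the uncorrected change is 1.0 and the correction factor is 1 − 3/75 = 0.96, so the function returns approximately 0.96.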
Second, since several studies included different types of behaviors, effect sizes nested within a paper were not independent. To account for dependency among effect sizes, a three-level random meta-analytic model was used to calculate the overall effect size of JITAIs on health outcomes. 42 To test for heterogeneity, log-likelihood-ratio tests were performed to determine whether significant within-study variance and between-study variance existed. To assess the relationship between study sample sizes and effect sizes as well as publication bias, a funnel plot was created.43,44 The rank correlation test was used to statistically assess the potential asymmetry of the funnel plot. The trim and fill method was applied to obtain an unbiased estimate of the overall effect size, assuming publication bias resulted in the asymmetry of the funnel plot. 45 One of the causes of asymmetry is publication bias, but asymmetry can also be caused by other factors, such as variance in studies’ true effect size. 44 Thus, to assess the robustness of the overall effect size and further examine publication bias, fail-safe N was calculated using Rosenthal's 46 method. It calculates “the number of studies averaging null results that would have to be added to the given set of observed outcomes to reduce the combined significance level” (

Forest plot.
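The fail-safe N mentioned in the statistical analysis can be sketched in a few lines of Python. The sketch follows the usual statement of Rosenthal's approach, in which the study-level z-scores are summed and compared with a one-tailed critical value. The function name `fail_safe_n`, the default alpha of 0.05, and the decision to floor negative results at zero are assumptions made for illustration and do not reproduce the R code used in this review.

```python
from statistics import NormalDist
from typing import Sequence


def fail_safe_n(z_scores: Sequence[float], alpha: float = 0.05) -> float:
    """Number of additional null-result studies needed to raise the
    combined one-tailed p-value above alpha.

    N_fs = (sum of z)^2 / z_alpha^2 - k, where k is the number of
    observed effects and z_alpha is the one-tailed critical value.
    """
    k = len(z_scores)
    if k == 0:
        raise ValueError("at least one z-score is required")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    n_fs = (sum(z_scores) ** 2) / (z_alpha ** 2) - k
    return max(0.0, n_fs)  # negative values mean no extra studies are needed
```

As a usage example with made-up inputs, five effects that each have z = 2.0 give a fail-safe N of roughly 32, meaning about 32 unpublished null results would be needed to make the combined result non-significant.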
Third, besides general health outcomes, to obtain more precise estimates of the effectiveness of the interventions, subgroup analyses based on different types of outcomes were conducted. Specifically, a three-level random meta-analytic model was used to calculate the effect size of JITAIs that specifically focused on physical activities and the effect size of JITAIs that specifically targeted dietary behaviors.
Fourth, univariate moderator analyses were used to examine whether assessments, theories, retention rates, intervention duration, population, behaviors, gender, age, and race significantly moderated the overall effects of JITAIs on health outcomes.
Finally, Pearson correlation analysis was used to assess if retention rates significantly correlated with intervention duration. The Wilcoxon rank sum test was performed to assess whether theories and assessments affected retention rates.
Results
Literature search results
We searched five databases and identified 604 articles. After removing duplicate records and non-peer-reviewed articles, 170 full-text articles were checked. After removing studies on interventions that did not meet the inclusion criteria and/or did not report on effect sizes, 14 studies were included in the final analysis. As some of the studies examined different types of behaviors (e.g. physical activity and dietary behavior in Rabbi, Pfammatter, Zhang et al. 47 ), the effect size for each behavior was coded separately. Thus, 21 effect sizes were included in the final analysis.
Study characteristics and coding
There was a total of 592 participants across the studies in this review. Coded study characteristics included
Population
Eleven interventions were tested among the general population, while 10 interventions targeted at-risk populations such as people with spinal cord injury 25 or with depression. 48
Outcomes
Twelve interventions focused on physical activities such as reducing sedentary time,21,49 eight interventions focused on dietary behaviors such as limiting caloric intake, 47 and one intervention focused on providing support to people with depression. 48
Assessments
Thirteen interventions only used passive assessments of tailoring variables while eight interventions used both passive and active assessments of tailoring variables. No interventions used only active assessments.
Theories
Nine interventions did not indicate that the messages were designed based on behavioral or intervention theories while 12 interventions’ messages were theoretically grounded.
All of the studies and the elements of each intervention that could be coded are presented in Table 1. Other information (i.e. intervention design) regarding each included study is in Appendix 2.
Table 1. Study characteristics and coding.
Note: Outcomes: physical activities = 1; dietary behavior = 2; providing support to people with depression = 3. Assessments: passive assessments = 0; both passive and active assessments = 1. Population: general population = 0; at-risk populations = 1.
Meta-analysis results
A three-level random meta-analytic model revealed that the overall effects of JITAIs on health outcomes were significant (
Publication bias
The rank correlation test confirmed asymmetry of the funnel plot (

Funnel plot.
Subgroup analysis
Regarding JITAIs that specifically focused on physical activities, a three-level random meta-analytic model demonstrated that their overall effects on physical activity outcomes were significant (
A series of univariate moderator analyses revealed that assessments (
The Wilcoxon rank sum test suggested that assessments significantly affected retention rates (
Risk of bias
The seven randomized trials were assessed with the RoB 2 tool and fell into two groups: three intention-to-treat studies, which examined all enrolled participants, and four per-protocol studies, which only examined participants who completed all protocol requirements. Among the intention-to-treat studies, 67% were rated as raising some concerns and 33% as high risk of overall bias. A full overview of risk values by domain can be found in Figure 4(a). Among the per-protocol studies, 25% were rated as low risk of overall bias while 75% were rated as high risk of overall bias. A full overview of risk values by domain can be found in Figure 4(b). According to the ROBINS-I tool, among the seven non-randomized studies, 29% were rated as moderate risk of overall bias while 71% were rated as serious risk of overall bias. Figure 4(c) shows the full overview of risk values by domain.

Figure 4. (a) Intention-to-treat studies’ risk of bias. (b) Per-protocol studies’ risk of bias. (c) Non-randomized studies’ risk of bias.
Discussion
Summary effect size
This review demonstrated some support for the efficacy of JITAIs in improving objective health outcomes. According to Cohen, 59 a Hedges’
Moderating effects
Contrary to our prediction, theoretically grounded JITAIs did not generate better outcomes than JITAIs without theoretical perspectives. Moreover, when theories were used in the JITAIs included, this did not seem to increase retention rates. These findings imply that the theories used in these JITAIs, such as social cognitive theory used in Hiremath et al., 25 Rabbi et al., 55 Rabbi et al., 47 and Wahle et al., 48 fail to contribute to the persuasiveness of JITAIs. A possibility is that these theories do not address the dynamic nature of underlying behavior change mechanisms and thus provide little guidance on adaptive intervention design for complex behavior change. As mentioned in prior research, the lack of behavioral and intervention theories that are dynamic is a major gap that hinders JITAIs’ development. Most existing theories treat underlying behavior change mechanisms as relatively stable and do not allow them to vary over time.27,60 Therefore, it is necessary to advance theories of health behavior or develop dynamic behavior theories to explain “time- and context-embedded change mechanisms.” 61 The complexity science approach and chaos theory can help develop theories that explain and predict “non-linear and quantum influences on human behavior.” 62
Although researchers express concerns regarding intervention adherence and retention, 5 we found that retention rates did not moderate JITAIs’ effects. In other words, JITAIs that retain participants over a longer time period may not necessarily lead to better outcomes than JITAIs that only keep participants over a shorter time period. As mentioned in some research, many current JITAI studies primarily focus on usability testing (i.e. engagement and retention) without devoting much effort to assessing behavior outcomes. 58 Thus, more attention should be devoted to the evaluation of JITAIs’ effects on health behaviors. At the same time, it is worth noting that the majority of the included studies showed high retention rates. As most of the JITAIs included in this study offered incentives for participation, it is possible that incentives, instead of intervention engagement design, encouraged participants to continue using JITAIs. In the absence of evidence, it cannot be concluded that an intervention will yield a similar effect size when the retention rate is low. Therefore, it remains crucial to ensure a high level of retention and adherence during the development of a JITAI. When conducting usability testing, future research should rule out the influences of incentives or other factors on intervention adherence.
Our results also revealed the relationship between several of JITAIs’ fundamental components and retention rates. Specifically, JITAIs using passive assessments tended to keep participants for a longer time than JITAIs using a combination of passive and active assessments. In addition, intervention duration was not related to retention rates. These findings suggest that repeatedly asking participants to self-report their thoughts and behaviors may lead to low retention rates, whereas longer interventions may not necessarily do so. According to prior work, intervention engagement and retention also form a dynamic system that contains the interplay among affective, cognitive, and behavioral states. 63 Besides the demands of time and effort, individuals’ attitudes toward themselves and the intervention; their daily life tasks, moods, and symptoms; and many other factors can all impact their adherence to the JITAI. 64 Future research can benefit from including other individual-level data (e.g. psychological data) when examining engagement and retention rates.
Risk of bias
Overall, randomized studies had less risk of overall bias than non-randomized studies, and intention-to-treat studies had less risk of overall bias than per-protocol studies. Among the three intention-to-treat studies, one study demonstrated high risk while the risk of bias across six domains among other studies was reasonable. Most per-protocol studies demonstrated high risk due to missing outcome data and deviations from intended interventions, mainly because they did not evaluate whether non-adherence could have affected the outcome. For instance, if a JITAI is designed to prevent substance misuse, participants who do not complete the follow-up assessments are highly likely to continue using substances; however, as they do not respond to the assessments, their behavior outcomes are not recorded, and thus, those who are classified as continuous substance users at the post-test are not representative of missing participants. 65 Most non-randomized studies had serious bias due to missing data. Similarly, this was because more than 10% of participants dropped out of the study, and their reasons for non-adherence were not reported. Future JITAI studies are recommended to use high-quality randomized trials and collect data on non-adherence.
Limitations
It is important to acknowledge several limitations. First, our conclusions, especially those related to the moderator analysis, are limited by the small number of effect sizes in several subgroups. Several subgroup analyses did not generate statistically significant results, possibly because of insufficient power.
Second, all the JITAI studies included in this analysis used a pre- and post-test design, which may result in an overestimation of the effects. To obtain a more reliable estimate of the effectiveness of JITAIs, future meta-analyses could concentrate on randomized controlled trials that compare the outcomes between arms at follow-up rather than solely examining pre–post changes in the intervention arm. By comparing the effect size from these studies with the one obtained in this study, a more robust estimation of the effectiveness of JITAIs could be achieved.
Third, the inclusion criteria for this study included a requirement for objective measurement of health outcomes. This might create a certain bias in the results, as some health outcomes are more easily measured objectively than others. Additionally, there was no evidence to suggest that self-reported behaviors are less valid or reliable. For example, a study revealed that “biomarkers have been found not to add sufficiently to the accuracy of self-reported measures to warrant being used.” 8 It is worth noting that the lack of active-only assessment-based JITAIs may be attributed to the fact that active assessments are typically employed in cases where outcomes cannot be measured objectively.
Fourth, we applied a step-wise model building approach. Since no factor was significant in the univariate analyses, we did not perform the multivariate analysis. With more JITAI studies published in the future, future meta-analyses can conduct multivariate analysis and examine how design principles may be related to each other.
Finally, the funnel plot indicated that more JITAI studies with large sample sizes are needed. As many current JITAI studies primarily focus on usability testing, it is reasonable to test the designs and features among a relatively small sample. 51 Before research findings can be generalized, more large JITAI studies need to be conducted in real-world environments.
Conclusion
Despite the limitations, this meta-analysis evaluated and explained the effectiveness of JITAIs through the lens of the complexity science approach and provided practical guidance on key design components that could increase retention rates of JITAIs.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076231183543 - Supplemental material for Using a complexity science approach to evaluate the effectiveness of just-in-time adaptive interventions: A meta-analysis
Supplemental material, sj-docx-1-dhj-10.1177_20552076231183543 for Using a complexity science approach to evaluate the effectiveness of just-in-time adaptive interventions: A meta-analysis by Zhan Xu and Eline Smit in DIGITAL HEALTH