Abstract
Keywords
Introduction
The Mindfulness-Based Interventions: Teaching Assessment Criteria (MBI:TAC) 1 is used within mindfulness-based teacher research and training programs to enable assessment of teaching competence. It is also used informally to support skill development and reflective learning (Evans A, Griffiths GM, Sansom S, Crane RS. Using the Mindfulness-Based Interventions: Teaching Assessment Criteria (MBI-TAC) in supervision.
This rapid uptake of the tool is likely being driven by a combination of expansion of MBP research and practice, alongside a commitment to ensuring integrity in practice settings. There are inevitable challenges of ensuring delivery quality both in research and in real-world practice settings. A tool such as the MBI:TAC is a unifying force in the field internationally as it communicates consensus on the common elements of MBP teaching. 8 It is fair to say, however, that the implementation of the tool is ahead of systematic research on its reliability, validity, and its relationship to MBP’s intended outcomes. Whilst there are grounds for this, in that the tool has a practical contribution to make to the pedagogy of MBP teacher training, it is also important that research examines the psychometric properties of the tool and that its implementation in practice is research-led.
Initial validation of the MBI:TAC instrument was promising. Preliminary research on the psychometric properties of the tool was conducted within the context of routine assessment of MBP teaching practice within 3 UK mindfulness Master’s degree programs, which had MBP teacher training embedded into them. 9 This preliminary study demonstrated strong interrater reliability (intraclass correlation coefficient [ICC]:
If instruments like the MBI:TAC are to be used more broadly in the training of MBP teachers and as a tool for assessing intervention fidelity, a critical question is whether new assessors can be trained in a way that results in good interrater reliability. Ensuring the quality of the implementation of the tool has the potential to influence MBP teacher training practice by linking training and research centers internationally to commonly agreed norms and standards of practice. In turn, this has the potential to positively influence the quality of MBP teaching internationally, thus supporting accessibility of the benefits of MBPs globally. Within the international mindfulness-based training context, there is increasing demand for access to quality training for prospective users of the MBI:TAC for both assessment and reflective purposes.
In this paper, we report on phase 1 of the Predictors of Outcomes in MBSR Participants from Teacher Factors (PROMPT) research study. 6 This study was based on the premise that whilst initial validation steps for the MBI:TAC were promising, additional research was needed to inform application in MBP research and practice, including testing whether new assessors could be trained to achieve reliable results using the MBI:TAC. The study had 2 main phases: first, to train a group of assessors to use the MBI:TAC reliably (interrater reliability), and second, to examine a range of teacher factors that could influence MBP participant outcomes (reported in a forthcoming paper). This paper reports the outcomes of the first phase and discusses both evidence-based practice and practice-based evidence that informs how to train MBP teacher assessors, supervisors, and trainers to use the MBI:TAC. The research reported in this paper had 2 aims: first, to develop and test a method of training MBI:TAC assessors who can use the tool reliably, and second, to examine, in an independent study, interrater reliability using this training model.
Methods
We recruited 31 participants for the MBI:TAC assessor training who met the following criteria: (1) certified to teach through an established and recognized MBP teacher training program, (2) more than 3 years of MBSR teaching experience (the focus was on MBSR as this was the anchor for the PROMPT research funding), and (3) interest and time available to complete assessor training and subsequent teacher ratings. We developed and delivered training sessions using a videoconference platform (Zoom). The 31 international participants were strategically selected to represent senior mindfulness-based trainers and supervisors across the world, both to train assessors for the PROMPT research and to build capacity in the use of the MBI:TAC internationally. See Table 1 for rater characteristics.
Table 1. Description of MBI:TAC Rater Training Participants (N = 31).
Abbreviations: MBI, Mindfulness-Based Intervention; MBSR, Mindfulness-Based Stress Reduction.
The delivery was structured as 7 × 1.5 h sessions, delivered by 2 trainers (RC and WK), involving didactic presentations on the nature of integrity in MBP teaching; the background and context for the development of the MBI:TAC and its design and structure; tutorials walking trainees through observing/reviewing recordings of MBP teaching and making a rating on each MBI:TAC domain; building skills in assessing through discussions in small groups and the whole group; and assigning home practice of independently assessing video recordings of teaching. Each week iteratively built understanding of the 6 domains of the MBI:TAC (Table 2) and practised these through focusing on different aspects of MBP curriculum elements (Table 3).
Table 2. Domains of the MBI:TAC.
Table 3. Overview of Assessor Training.
Abbreviations: MBI:TAC, Mindfulness-Based Interventions: Teaching Assessment Criteria.
To establish interrater reliability throughout the training process, participants were asked to rate a short video clip of an MBP teaching session between each training session, and their individual MBI:TAC assessments were compared to benchmark assessments. To establish reliability after the training, we asked trainees to individually assess at least 3 videos that they had not previously seen, with teachers they did not know. Their ratings were compared to benchmark assessment points for the videos that had been established via consensus agreement between 4 expert users of the MBI:TAC (trained by the MBI:TAC developers to use the tool reliably as tested through established benchmarks).
To further evaluate the training delivery, we conducted a survey of the participants with 6 questions (see Table 4) that participants answered on a scale of 1 (not at all) to 10 (extremely). Participants were also invited to offer qualitative feedback in response to the questions: (1) Please let us know any suggestions for how we could strengthen a future MBI:TAC training, and (2) Any other comments on the training? (see Table 5).
Table 4. Quantitative Feedback on the MBI:TAC Training From Trainees (N = 23).
Abbreviations: MBI:TAC, Mindfulness-Based Interventions: Teaching Assessment Criteria.
Table 5. Content Analysis of Trainee Feedback.
Abbreviations: MBCT, Mindfulness-Based Cognitive Therapy; MBI:TAC, Mindfulness-Based Interventions: Teaching Assessment Criteria; MBSR, Mindfulness-Based Stress Reduction.
Analysis
To assess whether the training course improved the consistency of ratings done by trainees during the 7 weeks of training, we first calculated the difference between trainee homework ratings and benchmark assessments (ratings) from each class, and for each MBI:TAC domain. We took the standard deviations of the differences across trainees, within class week and domain. These standard deviations (one per MBI:TAC domain and class week) were regressed on class week, with a random domain effect to allow differences by domain. Assessor (rater) reliability was calculated based on a maximum likelihood estimator of the ICC from a one-way random effects model treating the benchmark assessments as a gold standard. 11
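The consistency analysis described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The data layout, function names, and example values are hypothetical, and the trend is estimated with a within-domain centered least-squares slope, which is a simpler stand-in for the random-domain-effect regression named in the text.

```python
from collections import defaultdict
from statistics import mean, stdev


def sd_of_differences(ratings, benchmarks):
    """Return {(week, domain): SD across trainees of (rating - benchmark)}.

    ratings: iterable of (week, domain, trainee_id, score) tuples (hypothetical layout).
    benchmarks: dict mapping (week, domain) to the benchmark score.
    """
    diffs = defaultdict(list)
    for week, domain, _trainee, score in ratings:
        diffs[(week, domain)].append(score - benchmarks[(week, domain)])
    # A sample SD needs at least 2 trainees in a cell; smaller cells are skipped.
    return {cell: stdev(vals) for cell, vals in diffs.items() if len(vals) >= 2}


def within_domain_slope(sd_by_cell):
    """Least-squares slope of SD on week, centering week and SD within each domain.

    Requires at least one domain observed in 2 or more distinct weeks.
    """
    by_domain = defaultdict(list)
    for (week, domain), sd in sd_by_cell.items():
        by_domain[domain].append((week, sd))
    num = den = 0.0
    for points in by_domain.values():
        week_mean = mean([w for w, _ in points])
        sd_mean = mean([s for _, s in points])
        for w, s in points:
            num += (w - week_mean) * (s - sd_mean)
            den += (w - week_mean) ** 2
    return num / den


# Illustrative, made-up values only (not study data).
example_sds = {(1, "d1"): 1.0, (2, "d1"): 0.9, (3, "d1"): 0.8,
               (1, "d2"): 1.5, (2, "d2"): 1.4, (3, "d2"): 1.3}
slope = within_domain_slope(example_sds)  # negative: spread shrinks week to week
```

A negative slope indicates that trainee ratings cluster more tightly around the benchmark as the weeks progress. Centering within domain lets each domain keep its own baseline level of spread, which is the role the random domain effect plays in the model described above.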
Twenty-three participants completed the anonymous feedback questionnaire. The quantitative responses were analyzed by deriving a mean, standard deviation, and variance score for each of the 6 questions. The qualitative survey data were analyzed using a content analysis approach,12,13 a systematic technique which allows text to be put into categories based upon rules of coding. Here, emergent coding was employed, with categories established during the content analysis as they emerged from the data. The categories thus represent text that was grouped based on having similar meanings. For parsimony, categories that contained 2 or fewer comments (for example, 1 participant reported technical difficulties) were not included as categories.
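As a rough illustration of the two survey-analysis steps described above, the following Python sketch computes per-question descriptive statistics and applies the category retention rule. The function names and data layout are assumptions made for illustration, and the threshold of 3 comments reflects the stated rule that categories with 2 or fewer comments were excluded.

```python
from collections import Counter
from statistics import mean, stdev, variance


def summarize(responses):
    """Map each question to (mean, sample SD, sample variance) of its scores.

    responses: dict of question label -> list of numeric scores (hypothetical layout).
    """
    return {q: (mean(s), stdev(s), variance(s)) for q, s in responses.items()}


def retained_categories(coded_comments, min_comments=3):
    """Keep only categories with at least min_comments coded comments.

    coded_comments: list of category labels, one per coded comment.
    """
    counts = Counter(coded_comments)
    return {cat: n for cat, n in counts.items() if n >= min_comments}


# Illustrative, made-up inputs only (not study data).
stats = summarize({"q1": [8, 9, 10]})
kept = retained_categories(["appreciation"] * 3 + ["technical difficulties"])
```

In this made-up example the single "technical difficulties" comment falls below the threshold and is dropped, mirroring the exclusion described in the text.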
Results
Evaluation of Consistency of Homework Ratings Over Course Time
As a measure of how training improved rating consistency over time, we took the standard deviations of the differences between the trainee homework assessments and benchmark assessments of the same videos, by class week and MBI:TAC domain. With a random intercept allowing for differences by domain, we found that the standard deviation decreased on average across all domains by 0.10 points per week during the 7-week training (95% confidence interval: 0.05 to 0.15,
Evaluation of Interrater Reliability
Twenty-four participants completed the training and provided a set of final assessments of MBP teaching videos after completing the course. Interrater reliability coefficients were calculated for each trainee assessor (rater), comparing ratings of at least 3 videos for each MBI:TAC domain against the “gold standard” of the benchmark assessments for the same videos. Rater reliability overall was high, ranging from 0.67 to 1.0 (1.0 represents perfect agreement between participant and benchmark assessments), with a mean of 0.94. Interrater reliability was consistently high across participants for ratings of MBI:TAC domains 1 (coverage, pacing, and organization of session curriculum), 2 (relational skills), and 5 (conveying course themes through interactive inquiry and didactic teaching). Reliability remained high for a majority of participants across other MBI:TAC domains but with some outliers, and the lowest reliability was seen for domain 4 (guiding mindfulness practice).
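To make the reliability coefficient more concrete, the sketch below computes a one-way random effects ICC for a single trainee against the benchmark. It uses the classical ANOVA (mean squares) estimator, ICC(1,1), rather than the maximum likelihood estimator used in this study, so values may differ slightly from those reported. The function name and example ratings are hypothetical.

```python
from statistics import mean


def icc_oneway(rows):
    """ANOVA estimate of the one-way random effects ICC, ICC(1,1).

    rows: list of equal-length tuples, one per rated item (for example a
    video-domain combination); each tuple holds the k ratings of that item,
    here (trainee_rating, benchmark_rating).
    Raises ZeroDivisionError if every rating is identical (no variance).
    """
    n = len(rows)      # number of rated items
    k = len(rows[0])   # ratings per item
    grand = mean([x for row in rows for x in row])
    row_means = [mean(row) for row in rows]
    # Between-item and within-item mean squares.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(rows, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Illustrative, made-up ratings only (not study data).
perfect = icc_oneway([(4, 4), (5, 5), (6, 6)])  # trainee matches benchmark on every item
partial = icc_oneway([(4, 5), (5, 4), (6, 6)])  # trainee disagrees on two of three items
```

With identical trainee and benchmark ratings the within-item mean square is zero and the coefficient equals 1.0, which corresponds to the "perfect agreement" case described above. Disagreement on some items raises the within-item mean square and lowers the coefficient.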
Evaluation of Trainee Experience of the Training
The quantitative responses from the participants are laid out in Table 4.
The results of the content analysis of the qualitative trainee feedback are presented in Table 5.
Course Completion
All 31 recruited participants completed the training. Four of these trainees were selected, based on the accuracy of their midtraining homework ratings, to join RC and WK in determining, by a process of consensus, the final benchmark ratings used as the standard for comparison for the final assignment. Twenty-four of the 27 who were asked to complete the final benchmark ratings (which were judged against the consensus standard for calculation of interrater reliability) did so. Of the 3 raters who did not complete the final benchmark ratings, 2 cited lack of time, while the other expressed an intention to complete them but did not do so. These 3 noncompleters did not differ from the other raters on any demographic, teaching, or meditation history characteristic. Twenty-three participants completed the optional feedback questionnaire. As is common for course evaluations, our end-of-training survey was anonymous in order to encourage honest feedback; thus, we are unable to determine differences between the 23 completers and 8 noncompleters.
Discussion
First, we discuss the analysis of MBI:TAC interrater reliability, and then the evaluation of the trainees’ experience of the training process, before drawing out implications for practice.
Interrater Reliability
The consistency of assessments (ratings) improved over the 7 weeks of training, and final ratings were close to benchmarks for the majority of participants. The level of reliability developed through this process was better than that reported in both previous studies on the MBI:TAC.9,10 There is evidence therefore that this more systematic approach to training hones participants’ skills in using the MBI:TAC reliably. Within this, there were some outliers: a small number of participants still had relatively low interrater reliability compared to benchmarks across 4 of the 6 domains at the end of the training. Our sample size does not enable any reliable inferences about the characteristics of those whose reliability was higher or lower. Thus, whilst there is evidence that suggests that the majority of participants internalized the training well, it is also important that in future trainings there is an assessment process to ensure participants reach an acceptable understanding of the MBI:TAC prior to using it in the field (and perhaps periodic checks to ensure reliability over time). A further important finding was that domain 4 (guiding mindfulness practices) showed less agreement than other domains and therefore may benefit from additional training components. This contrasts with the study by Huijbers et al., 10 which found stronger reliability in this domain. One possible reason for this is that the teaching videos used to assess reliability in this study included both MBCT and MBSR, which have similarities but also differences in how the mindfulness practices are guided. This may have confounded the process, given that our trainee cohort was composed predominantly of MBSR teachers. Interestingly, embodiment was rated more consistently in this study than in previous studies. This may be due to work that has taken place to make the descriptors in the embodiment domain more precise, and a greater focus within the training on these particular descriptors when rating this domain.
A study limitation was the available sample of recordings of MBP teaching. Only between 3 and 4 were available for each MBI:TAC domain. Moreover, benchmark assessments did not span the full range of possible ratings for any domain but instead ranged from 4 (competent) to 6 (advanced) (with a mean of 4.9 [
Evaluation of Participants’ Experience of the Training
This discussion of the training draws on both the qualitative and quantitative feedback, along with the experiences of the trainers (RC and WK). The MBI:TAC assessor training was appreciated by participants, with an aggregated mean of 8.14 for all 6 questions (scale of 0 = not at all to 10 = extremely), and appreciative comments being the largest category in the content analysis. Participants valued the opportunity to develop personal skills, as well as the wider engagement around shared concerns about teaching standards and discussions of a potential way to create joined-up standards internationally.
The lowest score was in relation to participants’ view of their competence in using the tool (mean = 6.74), which speaks to the challenges trainees experienced in engaging with the assessment process. Participants particularly struggled when their assessment points were not aligned with the benchmarks. There was a strong call for a longer training process to enable more time to reflect on the assessment process, greater examination of the rationales underpinning assessment points, reflections, and skills building. The training was the first of its kind, and many participants were only just beginning to consider assessment of teaching as a concept, so the immediate immersion in skills building with the MBI:TAC was a big step for many.
It is important to note that the questionnaire used with this particular sample at the end of the course was likely to elicit socially desirable responses. Future investigations of the training process should actively probe for critical reflections on the training and the tool, ideally conducted by an independent researcher using a topic list of possible problems and issues. This should include those who did not complete the training, who did not respond to the invitation to offer feedback, and whose assessments did not match the benchmarks.
The training included participants from America and European countries, and this triggered questions of cultural differences in teaching styles. As trainers (RC and WK), we noticed that the question of cultural differences was raised frequently by participants during the training sessions: how possible is it for an assessor to validly assess a colleague from a different culture? The MBI:TAC aims, within certain parameters, to be inclusive of different styles and approaches to the teaching. However, it is clear that we each have unconscious biases that will steer us to favor a certain style over another. Similarly, the question was raised: how possible is it for an assessor to validly assess an MBP curriculum with which they are not familiar? Within this training, there was a mixture of MBSR and MBCT teachers participating, and a mixture of teaching clips from these programs working with different populations and within different contexts. It became clear that across the domains, there were program-specific issues that needed knowledge and experience with that program to enable accurate assessment. Some participants commented on this in the survey (see Table 5, theme 6).
During the training, the importance of the accuracy of benchmark ratings in training was highlighted. This also emerged as a theme from the survey (Table 5, theme 5). We recalibrated some of the benchmarks and, for the benchmarks used in the final interrater reliability analysis, engaged 2 further experienced MBP trainers so that the benchmarks were the consensus of 4 rather than 2 assessors. These issues point to the emergent and early stage of this work. Some of the initial benchmarks were conducted within one training center and were not subject to peer review. The exposure within the training process to review from an international cohort of experienced teachers highlighted some flaws and biases in the benchmarking process. There was a particular vulnerability to bias when the assessors personally knew the teacher they were rating. In future, benchmarks that are being used to train assessors need to be subject to peer review from experienced teachers from differing training centers. Further qualitative work conducted on assessor training could usefully gather perspectives on the reasoning behind assessment points. This could help shed light on the reasons assessors diverge from benchmarks and may provide valuable information for further adapting the instrument or the benchmarks. Further work could involve colleagues working with different populations, contexts, and countries with different MBP curricula, engaging collaboratively in selecting examples of teaching and developing benchmarks. For training purposes, this could include qualitative narratives giving detailed rationales, anchored to the MBI:TAC key features, for why particular ratings were given. Overall, there is the opportunity going forward for this process to be a vehicle for international collaboration and consensus building on the elements of MBP teaching integrity.
Retention in the training was excellent, with all participants completing the 7-week course, and 30/31 completing the homework for the last class. Of the 27 who were asked to complete final benchmark ratings, only 3 failed to do so, citing primarily lack of time. The 3 who did not complete the benchmark ratings did not differ from the other raters on any demographic, teaching, or meditation history characteristic.
Implications for Future Practice
This research informs 2 practice-related issues. First, developing greater understanding about how to effectively train practitioners for the range of uses of the MBI:TAC. Second, developing greater clarity on good practice and governance issues needed to enable assessment of MBP teaching practice in teacher training and research contexts. The latter is addressed in a separate paper in this special issue (Crane RS, Koerbel L, Sansom S, Yiangou A. Assessing mindfulness-based teaching competency: Good practice guidance.
Practitioners currently seek MBI:TAC training for 2 main purposes:
To enable recognition and certification of their teaching competence at key points in the MBP teacher training journey, particularly at the end to enable graduation with a Certificate of Competence;
Within research trials, to enable a check on the integrity of the teaching or where fidelity is the subject of the research question.
The MBI:TAC training for this assessment function therefore focuses on how to skillfully use the tool to enable the practitioner to reliably assess teaching practice in others.
Like any tool, there are skillful and unskillful ways of using the MBI:TAC. For both the functions named above, the user of the tool needs to build appropriate skills, alongside understanding of limitations, cautions, “edges,” and good practice when implementing the MBI:TAC (see also Evans et al., under review and Griffith et al., under review). Theme 6 in the content analysis highlighted the need for training in using the MBI:TAC to support giving skilled narrative feedback. Following the PROMPT research, a collaboration between 3 university mindfulness-training centers engaged in the study has considered the outcomes from the research along with experience in the UK of delivering MBI:TAC training over the last 5 years. From these prototypes, a 3-level training process which progressively builds skills in the range of applications of the MBI:TAC has been developed and is continually evolving. Details of the training process, including learning outcomes and training methodologies, are available online. 15
It has also become clear that the process of assessing MBP teaching is difficult and vulnerable-making work for both assessor and assessed. Theme 4 in the content analysis demonstrates the markers’ need for ongoing mentoring. Theme 2 reflects the need for further opportunity to share peer reflections and dialogue in small groups, and to do so with others working within a range of contexts and cultural settings. Ongoing collaboration between markers would also serve to moderate discrepancies in ratings, support interrater reliability, and reduce the potential for “drift” over time. Regular attendance at practice and peer reflection groups, such as those held by the “Support for Integrity in Teaching and Training” (SiTT) 16 Community, is recommended good practice for MBI:TAC assessors.
Summary practice learnings that are informing next steps include:
Some trainees are only interested in using the tool as a framework for reflection and therefore only need a short training (level 1 training);
For those using the tool to assess, more time is needed to enable reflection on the assessments, and so to progressively build skills in the use of the tool, particularly to build reliability in assessment (level 2 training);
On completion of the training, a check on participant reliability in using the tool is needed prior to using it in the field (level 3 training);
Ensure that when using benchmarked assessments in the training, they include written comments offering a rationale for each score so the trainees can see how they were arrived at. Preliminary evidence from other, as yet unpublished, data in the PROMPT trial also underlines the importance of benchmarks being agreed by 2 or more assessors;
The research was not designed to enable us to understand whether reliability is stronger when assessors are matched to the curriculums they are experienced in delivering and to the language and culture. However, intuitively this seems to be best practice;
Develop understanding of how the MBI:TAC can skillfully contribute to the sustainable development of MBPs in the international context through aligning adaptations of the MBI:TAC with culturally sensitive MBP adaptations and expanding capacity of assessors who can use translations of the MBI:TAC in their local language; and
All participants appreciated the opportunity to learn together and dialogue about the issue of teaching and program quality. Therefore, it is vital to embed the principle in future trainings that the “how” of learning is as important as the content.
Conclusion
The MBP field is at an early stage in addressing the challenge of quality MBP implementation internationally. There is the risk in the transition from research to practice of an “implementation cliff” such that interventions are almost universally more effective in the context of research trials compared to “real-world” settings.17,18 One concern is that the promise of the research evidence does not become available to the public due to lack of internationally agreed systems for assuring acceptable levels of MBP teaching consistency and quality. This training was the first to bring together senior MBP teachers from a range of countries to examine the issue of MBP teaching standards, competence, and fidelity. The process underlined the ways that international collaboration on these issues informs a globally joined-up approach, which in turn enables the sustainable development of the field. Examination of these issues is at a germinal stage. The effort to reach interrater reliability is considerable, which poses significant challenges in implementing methods such as these. It may be that, over time, research studies such as this one provide stepping stones toward discovering other ways to assess teaching integrity which are less resource intensive and more reliable. For example, we are currently developing a version of the MBI:TAC which is completed by course participants. In many ways, this study raises as many questions as it answers: questions at the interface between MBP research and practice.
The MBI:TAC is recognized by the international community of MBP trainers as accurately representing the elements that make up the MBP teaching process. These MBP teaching characteristics laid out in the MBI:TAC also align well with the defining elements of MBP programs. 8 These rubrics that support the building of alignment around defining anchor points for MBP programs and teaching characteristics are critical ingredients for supporting the field in the next phase of development. It is important to note, however, that the MBI:TAC only assesses one element of mindfulness-based teaching—that which is visible in the teaching space. There are many other important elements that inform integrity and quality including professional ethics and practice, theoretical knowledge, and teacher reflective skills. Therefore, its implementation needs to be part of a multifaceted approach to MBP teaching professional standards.
Given the popularity and expansion of MBP practice, it is important that there are ways of ensuring integrity and standards. Assessment of teaching quality is one part of ensuring quality in research and practice settings. In turn, the quality of the assessment process needs attention. Whatever assessment tool is employed, it is important that the user is trained to use it reliably. This research evidenced that it is possible to train a cohort of senior MBP teachers from a range of regions worldwide to align toward shared agreement about the elements of competent MBP teaching. The project informs next steps about how to deliver training for future MBI:TAC assessors. The training can be delivered online, making it accessible and enabling the outcomes of this work to contribute to MBP teaching quality globally.
