Abstract
Keywords
Introduction
Coding text is one of the most common methodological approaches for qualitative data analysis across a wide range of academic disciplines. As texts available for qualitative research expand in form and volume (e.g., social media posts and digitized text repositories), there is an increasing need for techniques that enable coding qualitative data
Team-based coding is one approach that enables researchers to code qualitative data at higher volumes and with increased speed (Burla et al., 2008; Cascio et al., 2019; Campbell et al., 2013; Giesen and Roeser 2020; Hruschka et al., 2004; MacQueen et al., 1998). Simply put, more human coders facilitate data analysis in a shorter period of time by sharing the labor cooperatively.
Yet, team-based coding also presents many challenges, and these challenges are only amplified as more team members are brought on to code larger volumes of data. In this paper, we outline unique challenges to working with large coder teams based on 18 projects in the last decade. Each of these projects engaged a team of 12 (minimum) to 54 (maximum) individuals to code a large-scale qualitative data set. For each challenge identified, we detail examples of problems we encountered and solutions we devised so that other researchers can more easily mobilize large coder teams for the analysis of large-scale qualitative data sets in academia, government and non-profit sectors, or industry. The same techniques can be applied to smaller-scale datasets in cases where rapid processing is needed, such as piloting for time-sensitive intervention and/or development projects.
Team-based approaches to qualitative data analysis
Over the past 25 years, the methodological literature on team-based coding has grown substantially as more scholars recognize the benefits of using multiple coders to analyze qualitative data. Team-based coding typically involves, first, developing a codebook and assessing intercoder consensus or reliability in some way, and then splitting up the data among multiple coders so that each coder applies the codes to a portion of the dataset (Burla et al., 2008; Campbell et al., 2013; Carey et al., 1996; Giesen and Roeser 2020; Hruschka et al., 2004; Kurasaki, 2000; MacQueen et al., 1998). Most coder teams discussed in the methods literature consist of two to four coders (Campbell et al., 2013; Carey et al., 1996; Giesen and Roeser 2020; Hruschka et al., 2004; Kurasaki, 2000). The number of coders needed for a project, however, depends on multiple variables, including the size and/or complexity of the dataset; the training, ability, and experience of the coders; language and cultural expertise required; the dispersion of the analytically significant themes in the dataset; the number of times the themes of interest appear in the dataset; the difficulty of detecting the theme in the text; and the levels of specificity the researcher wishes to achieve (Ryan 1999; Bernard et al., 2016). While most scholars cite increased speed and efficiency as their primary motivation for team-based coding (Burla et al., 2008; Cascio et al., 2019; Giesen and Roeser 2020; Hruschka et al., 2004; Lichtenstein & Rucks-Ahidiana, 2021; MacQueen et al., 1998), using multiple coders also provides several other key advantages.
First, team-based coding encourages analytical precision by forcing researchers to clarify exactly what a theme means so that everyone on the team can code consistently (Hruschka et al., 2004; MacQueen et al., 1998). For example, differing interpretations or understandings of themes that arise during codebook development can help the research team to refine thematic codes and establish inclusion and exclusion criteria for codes through iterative discussion and resolution (Cascio et al., 2019; Hruschka et al., 2004; MacQueen et al., 1998).
Second, team-based coding helps to ensure coding reliability (Benoit et al., 2016; Burla et al., 2008; Carey et al., 1996; Hruschka et al., 2004; Krippendorff, 2018; MacQueen et al., 1998). Ensuring agreement among coders allows researchers to demonstrate that different people are able to apply the codebook in the same way and, by extension, that individual coders are more likely to use the codebook in a consistent way over time (Cascio et al., 2019; Hruschka et al., 2004; MacQueen et al., 1998).
Third, multiple coders can help establish the validity (Hruschka et al., 2004; Kurasaki, 2000; Moret et al., 2007) or credibility (Tracy, 2010) of the analysis process. Agreement among multiple coders indicates the themes identified are recognizable by multiple people and not simply figments of one researcher’s imagination (Bernard et al., 2016). Moreover, the emic validity (Whitehead, 2005) of the coding system can be enhanced if the coding team includes participants who possess cultural, linguistic, or local expertise relevant to the phenomenon being studied (Bernard et al., 2016).
Finally, using multiple coders can help researchers to identify typicality among coded data segments (Ryan 1999). For example, passages that all coders code consistently for a particular theme capture that theme’s core, while passages with less coder agreement generally represent atypical exemplars, or the edges of, a theme (
Despite these significant advantages, team-based coding also presents many challenges. Team-based coding is prone to communication difficulties, especially among coders with many differences in perspective, opinion, personality, or workstyle (Bozeman et al., 1999; Hall et al., 2005). Good communication in a coder-team requires effective management to ensure team members work efficiently, cooperatively, and on time, with minimal duplication and error (Bozeman et al., 1999; Hall et al., 2005; Richards, 1999).
Training coders—especially novice coders—is time consuming (Cascio et al., 2019; Hall et al., 2005; Hruschka et al., 2004; MacQueen et al., 1998). The amount of time required to train coders
Difficulties often arise in determining compensation, attributing contributions, and managing the various competing interests and goals of team members (Bozeman et al., 1999; Hall et al., 2005; Liggett et al., 1994; Richards, 1999). Such problems can be resolved through close communication and a clear articulation and understanding of the project goals among team members (Bozeman et al., 1999; Hall et al., 2005; Giesen & Roeser, 2020). However, all of these challenges and complexities of team-based qualitative analysis are amplified as more team members join the process (Giesen & Roeser, 2020). Solutions that work for managing a team of two to four coders may not work well with 10+ coders.
A research-informed approach to defining challenges and guidance for large coder teams
18 Studies Using Large Coder Teams for the Analysis of Qualitative Data.
As reflected in Table 1, our lab projects primarily focus on cross-cultural research. The special conditions that drive our theoretical frameworks and code and codebook development are outlined in Wutich and Brewis (2019) and Wutich et al. (2021). However, in some cases we conduct more conventional single-site projects. In those cases, the theoretical frameworks and codes are determined by the lead PI of the project (e.g., Brewis et al. 2019; Roque et al., 2021; Ruth et al., 2021; Trainer et al., 2021). IRB oversight for all these projects was provided by Arizona State University.
Based on our experiences leading and training coders for these 18 studies, we identify four key recurring challenges for large coder teams: (1) recruiting and training coders, (2) providing coder compensation and incentives, (3) maintaining data quality and ensuring reliability at scale, and (4) building team cohesion and morale. We consider these four challenges to be the most salient challenges for large coder-teams that are not presently discussed in the methodological literature. Our identification of these challenges occurred through inductive, iterative reflection and analysis of training manuals and documents we have developed over the past 15 years and lab notes we have taken on the processes and procedures of past projects. We conclude with our collective observations on the unique advantages of employing large coder teams despite these challenges, and we highlight three notes of caution based on the problems we have yet to solve.
Challenge 1: Recruiting and training coders.
Strategies and Examples for Recruiting and Training Coders.
Guidance for recruiting and training coders
Recognize potential pools of coders and target recruitment
Advertising for paid research assistants may be the most obvious choice for compiling a research team, but we have found it useful to think broadly about other potential pools of coders that may be available to join a project. Potential pools of coders may include undergraduate and/or graduate students who are eager to gain hands-on research experience; engaged community members with a stake or vested interest in the research outcomes; or interdisciplinary research collaborators who are untrained but interested in qualitative research.
As university-based researchers, we most frequently recruit undergraduate students for our coder teams. We do this in two ways: (1) through lab-based internships and (2) through practicum course experiences. When recruiting for lab-based internships, we put out a general call to student email listservs advertising our lab internship and describing our research studies. Generally, over the course of an academic year, our lab houses two to four projects, and we typically assign lab interns to one of these ongoing projects (i.e., students work on the same project over the course of the semester or academic year).
Practicum course experiences involve structuring a university course around a specific research project and turning the whole class into a coder team. For example, for a study on children’s perceptions of water futures in the United States (Vins et al., 2014), we crafted the data analysis schedule around the learning goals of an upper-division course. The 54 students enrolled in the course became the coder team, refining the codebook and coding a data set of 3,120 pieces of children’s art over the course of the semester. By structuring a university course around coding and analyzing data, we were able to process far more data than would be possible on a small research team. Importantly, this enabled students to obtain an unparalleled hands-on, collaborative research experience in order to learn the social science research process. In fact, the lead author of our academic publication was an undergraduate student enrolled in the practicum.
Clearly articulate coder benefits and incentives
With planning, we find it is possible to align the coders’ needs with our project’s research goals, learning outcomes, and compensation. For example, if recruiting students, PIs should highlight the types of research skills and experiences coders will gain. If recruiting engaged community members, it may be more important to highlight the broader impacts of the research and competitive pay rates.
When recruiting student coders to join either our lab as interns on multiple ongoing projects, or our practicum courses for a specific project, we outline pertinent details of the project(s), including the project goals, community partners, and general research strategy. We highlight the concrete skills students gain upon completion of the course/project, the credits students would earn toward their degree, the curricular requirements that the course fills in the students’ degree program, and the amount of time outside of class students need to dedicate to the project (e.g., class homework). This information allows students to make an informed decision as to whether or not they wish to join the course/project.
Target coder training according to both project needs and coder incentives
We recognize three major strategies for training a coder team, based on the types of coders hired: (a) expert: hire technical experts with experience in qualitative coding and give project-specific training; (b) targeted training: hire coders who are technical novices and give targeted methodological training; and (c) full training: hire coders who are technical novices and make a major methodological investment to train them as full collaborators.
The expert strategy (a) typically involves hiring paid assistants who have technical expertise. This strategy is financially costly and not always feasible at scale. The full training strategy (c) requires significant time and resources, and normally represents the process of training a graduate student over a number of years or training a community partner who collaborates on a long-term project or a series of projects. Due to the high costs of both strategies, they may not be feasible for large-scale coder teams.
The targeted training strategy (b) is most common for the purposes of compiling a large coder team for a specific project. This means that training should be targeted to the specific project,
Challenge 2: Providing coder compensation and incentives
Strategies and Examples for Providing Coder Compensation and Incentives.
Guidance for providing coder compensation and incentives
Appropriate compensation
If compensating coders with pay, we study local salary ranges and compensation practices for professionals and assistants. This includes expectations that people may have for paid or unpaid time off, as well as local cultural practices like bonus pay during certain times of year (e.g.,
Whether we compensate coders through pay or through course-credit arrangements, it is necessary to let coders know what compensation they can expect. In paid positions, this is straightforward. In our practicum courses and lab-based internships, students earn course credit that counts towards graduation and degree requirements. In our research lab, student interns receive course credits based on the amount of time they dedicate to the lab each week. In practicum courses, all enrolled students receive a set amount of course credits for completing the course. The provided course credits can meet degree requirements, but students have alternative options to meet those requirements. For instance, students who do not want to participate in the advertised project or internships can choose a different course to fulfill those requirements. The key is to be clear and upfront about (a) the amount of expected work involved, including expected hours, and (b) what student coders can expect in return for that work.
We use a variety of monetary and non-monetary means of compensation to ensure that the value of coding work is recognized and compensated. When budgets allow, we pay coders. When budgets are constrained, we provide compensation for student coders in the form of course credit for the number of hours worked on the project. We also foster an atmosphere in which students are aware that professional support, including networking advice, job and graduate school application preparation, and letters of recommendation, is a benefit of joining a coding team.
Offer career mentoring
Compensation alone, in money or course credit, is not sufficient to create a sense of investment and dedication to a team and project. Being part of a coder team is almost always a temporary job. We find coders are more invested when their training and experience help them achieve their career or educational goals. We try to create opportunities for educational and professional development for all coders, so that their duties align with their long-term goals.
In Coders’ Own Words: The most valuable lesson or skill coders say they gained working on a coding team
Create an incentive structure for promotions and increased responsibilities
Large coding teams will inevitably consist of coders with a range of competencies, interests, and goals. We train all coders so they meet a basic level of competency to accurately code the specific project data set (Campbell et al., 2013; Carey et al., 1996; Hruschka et al., 2004; Krippendorff, 2018; MacQueen et al., 1998). Many coders who join a team have busy lives and other interests, and they wish to be involved in the project only to this baseline extent. But often, a number of coders on any one team demonstrate interests and abilities that exceed this baseline standard. We recognize and reward these interests and abilities through promotions to higher-level tasks and supervisory roles, paired with appropriate mentorship for higher-level positions.
One approach we use for large coder teams is to promote such coders to a “coding supervisor” position in which they help to supervise and mentor other coders on the team, are charged with higher-level tasks such as setting up and calculating intercoder reliability tests, and help research leads to choose typical exemplars for project reporting and publication. Along with these increased responsibilities, promoted coders receive increased compensation. For coders in paid positions, this means raising their pay. For student coders earning course credit, this often means promoting them to paid positions or recommending them for paid fellowships. When possible, we nominate student coders for prestigious awards that allow them to develop their own independent projects, with the support of our lab research infrastructure. Such incentive structures not only create opportunities for coders to be further invested in the project (if they wish), but also provide an added benefit of ensuring data quality (discussed further below).
Challenge 3: Maintaining data quality and ensuring reliability at scale
Strategies and Examples for Maintaining Data Quality and Ensuring Reliability at Scale.
Guidance for maintaining data quality and ensuring reliability at scale
Build barriers to original data access
Maintaining data integrity is a key concern for all researchers. If too many people have access to the original primary data, the data can be compromised. Such access can lead to data being changed, re-arranged, or deleted due to oversight and general confusion around who is accessing what data and when. To address this, we build barriers to original data access using various technological tools and clear procedures. Regardless of the specific software that we use (see Note 1) to apply and keep track of codes, only the project PIs and research leads can access original data files. One technique that has worked particularly well for us is to have coders enter their coding into web-based forms (e.g., Google Forms, Qualtrics, SurveyMonkey) that are set up and collated by research team leads. The web-based form lists each unit of analysis and either a drop-down menu or box for coders to enter their codes. These procedures limit the number of people with access to original data files, which drastically lowers the likelihood of accidental data tampering or loss.
Research team leads provide digital copies of data files to coders; the coders edit and/or code data on the duplicate copies.
We use a variety of software tools to help coders record and keep track of codes. For projects that have a smaller group of coders (typically fewer than 10), we use VERBI Software MAXQDA to tag and keep track of codes in text because we find it is the easiest QDA program on which to train novice coders quickly. In the past, we have also used Atlas.ti and NVivo for this purpose. However, for large coder teams, we do not find it cost effective or practical to have a large number of QDA software licenses. In this case, we use a modified software strategy in which we segment or unitize texts in Microsoft Excel and ask coders to record the presence or absence of codes for each segment of text either in a new column in the Excel document or via entry into a web-based form (as described in Challenge 3).
Establish pilot periods and re-assign coders when necessary
Pilot periods allow PIs to assess coder reliability, attitude, flexibility, and ability to work on a team. We establish a pilot period immediately following the training period, and allow coders to complete a full task arc (i.e., all of the duties they are expected to perform). For example, we often work first with one code from our codebook: building, refining, and testing intercoder reliability, and applying that one code to a selected subset of documents before moving on to repeat these procedures for the rest of the codes in our codebook. This process allows us to (a) ensure that all coders on the team are performing tasks at the basic level of competency the project requires, (b) promote coders who show interest in and the ability to take on higher-level and mentorship roles (i.e., as coding supervisors, as mentioned above), and (c) re-assign coders who do not meet the basic level of competence to other tasks. We lay out clear ground rules for the pilot period, including the duration of the period, the compensation structure of the period, and the performance expectations and incentive structure. We also establish evaluation procedures for the pilot period, including whether evaluation of the pilot period will be formal or informal. We clearly communicate with coders about all the possible outcomes after the pilot period (e.g., assigned as coder, promoted to coding supervisor, reassigned to other tasks, as described in Table 5).
Use a “lead coder” approach for assessing intercoder reliability
Debates over ways to measure intercoder agreement or reliability are discussed in the voluminous literature on the topic (e.g., Armstrong et al., 1997; Campbell et al., 2013; Guba, 1981; LeCompte and Goetz 1982; MacPhail et al., 2016; Schwandt et al., 2007; Tracy, 2010). Common methods include using statistical measures such as Cohen’s kappa (Hruschka et al., 2004) or Krippendorff’s alpha (Krippendorff, 2018) and coming to intercoder consensus through repeated dialog and discussion over coding disagreements (Bernard et al., 2016; Cascio et al., 2019; Campbell et al., 2013). These strategies typically use teams of two to four coders. Calculating intercoder agreement and managing the process for achieving intercoder reliability becomes a tremendous challenge when working with teams of 10+ coders. While there are merits and drawbacks to all techniques for calculating intercoder reliability, we find quantitative measures useful when working on very large coder teams because they serve as an efficient baseline from which we can enter conversations about agreements and disagreements around coding (see Hruschka et al., 2004). On a team of 2–4 coders, it is much easier to detect agreement and have consensus-based conversations (Cascio et al., 2019), but these methods can become burdensome and unproductive on a team of 20+ coders. To navigate this, we employ a “lead coder” approach to measure and come to intercoder agreement with a large coder team.
In the lead coder approach, the project leadership team constructs the initial version of the project codebook following MacQueen and colleagues’ (1998) method to create detailed and structured codebook definitions for each code. After extensive pre-testing and refinement, we distribute this initial version of the codebook to the whole coding team to review. The “lead coder” (usually one person on the project leadership team) then samples the data set to create a coding test for the purposes of coder training and codebook refinement (usually about 25 text coding units from the data set). The lead coder, working with another member of the leadership team, codes the test set of data, following the initial codebook. Agreement is assessed, and any differences (typically rare and minor at this point) are resolved through discussion. A final “test set” is then produced and used to onboard new coders to the project. Each new coder uses the initial codebook to code the test set independently. Each coder then measures their coding agreement for each code with the “lead coder” (we most often use Cohen’s kappa to assess intercoder reliability, but when this measure is inappropriate to the data set and research question, we employ alternative techniques [see Barbour, 2001; Krippendorff, 2018; Tracy, 2010]). Depending on the amount of disagreement, the size of the team, and coders’ topic/site expertise, the whole coding team (including the lead coder) may collectively discuss coding agreements and disagreements and revise the codebook accordingly, following the process outlined by Campbell et al. (2013). In this scenario, the lead coder then re-samples the data set to create a new coding test and the process is repeated until coders achieve an acceptable level of agreement for each code with the lead coder.
Key to our process here is our commitment that the “lead coder” does not have the authority to mandate that their coding is “correct” in the initial rounds of codebook development. Coding disagreements are discussed and mutually rectified between the lead coder, project leadership team, and coders with relevant topic/site expertise. In this way, the lead coder and each team member can be considered a dyad of independent coders who come to acceptable levels of negotiated agreement (Campbell et al., 2013).
This process of using a lead coder has several advantages. (1) An experienced lead coder with detailed knowledge of the data set and the codebook provides hands-on training and imparts conceptual knowledge to trainee coders through the codebook refinement process. (2) Open discussion enables novice coders to develop more nuanced conceptual understandings of the code by hearing the ways that other coders had thought through and applied the codes (including the lead coder). (3) Points of disagreement among coders and the lead coder help to refine the codebook as coders and the lead coder collectively discuss and reconcile their disagreements. (4) This process provides a test of coder competence. Coders who are not able to achieve an acceptable level of intercoder agreement with the lead coder after multiple rounds of codebook refinement are re-assigned to other tasks.
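The agreement check at the center of the lead coder approach can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the lab's actual tooling: it assumes each code is recorded as a present/absent label per test unit, the function names `cohens_kappa` and `coders_needing_review` are hypothetical, and the 0.7 threshold is a placeholder, since the acceptable level of agreement is a project-specific decision.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of units where both coders gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each coder's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is undefined,
        # so this sketch reports full agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def coders_needing_review(lead_labels, coder_labels, threshold=0.7):
    """Return {coder: kappa} for coders whose kappa with the lead coder
    falls below `threshold` (a hypothetical, project-specific cutoff)."""
    flagged = {}
    for coder, labels in coder_labels.items():
        kappa = cohens_kappa(lead_labels, labels)
        if kappa < threshold:
            flagged[coder] = kappa
    return flagged


if __name__ == "__main__":
    # Presence (1) / absence (0) of one code across eight test units.
    lead = [1, 1, 1, 1, 0, 0, 0, 0]
    team = {
        "coder_a": [1, 1, 1, 1, 0, 0, 0, 0],
        "coder_b": [1, 1, 1, 0, 0, 0, 0, 1],
    }
    print(coders_needing_review(lead, team))
```

In practice the check would be repeated for every code in the codebook, and a low kappa would prompt discussion and codebook revision rather than an automatic decision about the coder.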
Create 100% redundancy in coding procedures
In addition to evaluations of coder competence, we build 100% coding redundancy into our coding procedures when working with large coder teams. For example, on the project that consisted of a team of 54 student coders who were tasked with coding 3,120 drawings, we assigned each of the 54 coders two sets of 58 drawings to code (116 drawings total, per coder). This occurred after 10+ rounds of codebook revision and refinement and all coders reaching acceptable levels of intercoder reliability with the lead coder, as described above. By assigning each coder two sets of drawings to code, each drawing in the data set (
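One way to produce such an assignment is sketched below. It is an illustrative scheme only, assuming that 100% redundancy means every unit is coded independently by two different coders; the function name `assign_with_redundancy` and the round-robin split are hypothetical and are not a record of how the sets were actually built for the project described above.

```python
def assign_with_redundancy(unit_ids, coder_ids):
    """Give each coder two sets of units so that every unit is coded
    by exactly two different coders (hypothetical scheme)."""
    k = len(coder_ids)
    if k < 2:
        raise ValueError("double coding needs at least two coders")
    # Deal the units round-robin into one set per coder.
    sets = [unit_ids[i::k] for i in range(k)]
    # Each coder codes their own set plus the next coder's set,
    # wrapping around so the last coder shares with the first.
    return {
        coder: sets[i] + sets[(i + 1) % k]
        for i, coder in enumerate(coder_ids)
    }


if __name__ == "__main__":
    units = list(range(12))
    coders = ["coder_1", "coder_2", "coder_3"]
    for coder, assigned in assign_with_redundancy(units, coders).items():
        print(coder, assigned)
```

Because each set is shared by a fixed pair of neighboring coders, disagreements on a unit can be traced to a specific pair; a randomized pairing would be an alternative if systematic pair effects are a concern.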
Foster a culture that prioritizes data quality and ethical social science
Strategies and procedures for ensuring data quality on large coder teams work best, we find, when they operate in a work culture that prioritizes data quality and ethical social science. We build such a culture very intentionally through formal and informal norms: affirming ethical commitments on all project and team documents, training periods dedicated to social science research ethics, reiterating the role of data quality in ethical social science research, and setting aside time to discuss data quality and ethical research procedures collectively as a team. When problems arise in the research process, we problem-solve by centering approaches that uphold our ethical commitments and the quality of our data. A culture that prioritizes data quality and ethical research creates countless informal reminders to coders to maintain and protect data quality as an ethical responsibility at all times.
We find there is not one main technique to ensure data quality when working with large coder teams. We layer data protections by using technological tools, providing intensive coder training, creating redundancy in our coding procedures, and fostering an atmosphere that prioritizes data quality and ethical social science research. Different means of data protection will be more or less appropriate for different types of teams, but ensuring data quality through multiple angles has been key to our success.
Challenge 4: Building team cohesion and morale
Strategies and Examples for Building Team Cohesion and Morale.
Guidance for building team cohesion and morale
Prioritize and schedule social connection as part of team coding efforts
Coding qualitative data is arduous work that requires great focus and can take a long time. We set realistic timeframes for coders to complete their assigned coding (that allow for coders to take breaks and do not require too much coding in any one day), but also plan time for coders to decompress, socialize, and talk about their ongoing work. These periods of informal social time—in which coders may bring up the curiosities they have encountered in texts, or the examples they see over and over—can assist with team bonding and fostering team spirit. We have found that they also can lead to important new insights in the data or potential new spinoff projects. We build in times for team bonding, such as having coffee hours or allowing 10–15 minutes of free chat before a team meeting formally begins.
There are many different ways to build team cohesion and morale, but we find that deliberately fostering a strong sense of team spirit in the research process is key to navigating the inevitable setbacks, pitfalls, and communication difficulties of team-based research. We cultivate this through clear communication, established procedures, and repeated emphasis on (and recognition of) all team contributions to the research process.
Emphasize the team-based nature of research as a whole
We ensure that all team members know upfront the team rules, procedures, and expectations, including the baseline levels of competency we expect for coding a particular project. If a coder does not meet those competency levels for a particular project and needs to be reassigned (as described in Table 5), we maintain morale by emphasizing the importance of all research tasks and explaining that coding is only one element of the research team’s work.
Institute consistent communication (in-person and virtual) with well-established procedures, values, expectations, codes of conduct around scholarly contributions
We maintain a consistent commitment to appropriately recognizing the contributions of coders in our lab work. Recognition can take many forms (e.g., authorship credit, author order, acknowledgements, etc.). Since we founded our lab, norms around recognition of work have shifted in the academy. Lab work by undergraduates is now increasingly likely to be credited with authorship—or expected to be.
We rely heavily on external guides for how to assign credit, in part to maintain consistency through time when dealing with multiple undergraduate collaborators, while also being able to change as norms shift. For some years, we have used the International Committee of Medical Journal Editors (ICMJE)’s roles and responsibilities guidance (ICMJE, 2021) as our baseline for explaining and sharing transparent expectations around co-authorship. We follow HWISE guidelines for forming consortia to share co-authorship across large international collaborative teams (Jepson et al., 2020).
Given the centrality of anti-racism work to our lab philosophy, we also consider the Civic Laboratory for Environmental Action Research (CLEAR) guidelines (Liboiron et al., 2017). These guidelines help us prioritize junior and marginalized scholars in the recognition and ordering of co-authorship contributions. Being able to share and follow external guidelines, particularly as they are updated, allows adjustments without creating confusion or inconsistencies between lab members and through time.
At the time of this writing, we are 18 months into the COVID-19 global pandemic. While the guidance provided here has been developed from our fully-completed projects (which we consider as those where the analyses have appeared in at least one peer-reviewed publication), we have continued to conduct these same lab activities on projects throughout 2020 and 2021 through lockdowns and other disruptions, switching in March 2020 to a synchronous online-only modality. All meetings occurred over Zoom. In August 2021, we switched back to in-person lab activities, but the online modality was sufficiently successful that we continue to offer parallel synchronous online options for students. In adapting these processes to an online format, we used a “work alongside” strategy in which coders worked remotely but met over Zoom with a faculty member or lab manager in prescribed two-hour blocks through the week. This way, someone was always available to answer questions as coders worked in real time, and coders had a sense of access to and engagement with others in the lab. Overall, this worked well for all involved, particularly for students unable to be physically on campus. Given that disruption is inevitable in global collaboration, it is crucial to have strategies in place for shifting large coder team management online.
Research benefits of large coder teams
Despite the significant challenges posed by large coder teams, we find that they offer tremendous advantages for qualitative data analysis. If effective, efficient, and equitable procedures for recruitment and training are in place, large coder teams enable researchers to process qualitative data in much less time than a smaller team. In an era of big data, well-run large coder teams open up possibilities to analyze new research questions and work with new data sets that may not have been possible for a small team of coders to tackle. But, perhaps more significantly, large coder teams also have the potential to bring a far greater diversity of insights into the analysis process, especially through procedures for codebook refinement. More coders often mean more perspectives and ideas that can be incorporated into the process of refining codes, especially if researchers make a concerted effort to recruit coders from a diversity of backgrounds, cultures, language expertise, and experiences. This process often translates into deeper and more nuanced codes and into the ability to explain analytical constructs in more concrete ways when reporting research results.
Notes of caution
Despite the significant advantages of large coder teams, we highlight three notes of caution for researchers looking to mobilize large coder teams in their own research.
First, we find that large coder teams are best suited for highly structured coding. In our experience, analysis with a large coder team works best when a smaller team of researchers develops the initial version of a codebook (inductively or deductively) and then brings on a larger team of coders to refine the codebook via initial coding tests and discussion. During the initial process of codebook development, too many team members can generate too many ideas, which end up creating overly complex codes and codebooks. Thus, highly inductive projects, such as grounded theory or schema analysis projects, are not well suited to large coding teams.
Second, large coder teams require time and resources. While one of the primary goals of employing a large coding team to process data is to
Third, in our experience, burdensome oversight structures are typically necessary to ensure high-quality analysis with large coder teams. We have found hierarchical leadership structures to be necessary in order to establish chains of supervision, reporting, and accountability as well as to ensure such vital functions as safety, consistency, follow-through, and record-keeping. The reputation of the research group depends on its ability to consistently deliver high-quality coding and analysis. While such hierarchies can be burdensome for all involved, oversight is crucial to ensuring that coding errors are identified and corrected in a timely manner. That said, overly rigid hierarchies are unhelpful to building collaborative teams, and we encourage avoiding leadership hubris and being open to the contribution of ideas, feedback, and criticism from every team member.
Conclusion
Human coding of text, especially large volumes of text or when using many codes, is a laborious and often boring task. In qualitative projects, it can represent a large percentage of the cost of conducting research, in time or funds. While machine-based coding helps with challenges of volume, many researchers recognize that the insights they seek require the subtleties only trained coders can provide. We have provided some perspectives, based on our collective experience in a large qualitative lab leading 18 cross-cultural studies, on how large coding teams can be activated, managed, and sustained to move coding forward faster and at greater scale. These solutions are imperfect and evolving, and will not suit everyone, but we hope they provide researchers an additional option to consider as they expand the reach and scale of their research.
