Introduction
Survey data are an essential tool for collecting information that can inform health research, interventions, and policies. They are often the main source of data about populations’ behaviours, attitudes, and health outcomes that are not captured in routine clinical records. When designing surveys, researchers must weigh the benefits and costs associated with closed and open-form questions. Closed-form questions are efficient for analyzing large volumes of data but require assumptions about what the appropriate response options should be, based on previous surveys (De Leeuw, 2012; Reja et al., 2003). Open-ended questions may offer more nuanced responses, but can be time-consuming to analyze in large volumes and can discourage participation in surveys (De Leeuw, 2012; Newcomer et al., 2015; Reja et al., 2003). At the same time, researchers continue to struggle to identify best practices for engaging individuals in surveys, particularly individuals from underrepresented groups, who may not see their experiences represented in closed-form response options (George et al., 2014; Savard & Kilpatrick, 2022; Schonlau & Couper, 2016).
In response to these challenges, this study uses a combination of computational and manual methods to analyze qualitative feedback from a large-scale survey developed by the Alberta’s Tomorrow Project (ATP), a longitudinal research project (Ye et al., 2017), to generate new insights into best practices for participant engagement with surveys. The ATP surveys draw on questions from several validated survey instruments and include open-ended questions about different aspects of the survey process. These questions were answered by over 15,000 individuals.
To process such a large volume of qualitative data, we use a combination of computational text mining and traditional manual qualitative analysis to examine responses to a large-scale survey administered to understand participant experiences with the survey process. Text mining methods offer new approaches for researchers to efficiently analyze large volumes of unstructured text-based data from open-ended responses and have been applied in health-related research (e.g., Forsgren et al., 2023; Nitiéma, 2023; Ramon-Gonan et al., 2023) and other social sciences (e.g., Derksen et al., 2025; Ferrario & Stantcheva, 2022).
As illustrated by the review by Rouhani et al. (2024), there have been many different approaches used to incorporate text mining methods in research involving textual data. Previous methodological studies have compared the underlying preconceptions that motivate computational approaches to textual analysis with manual methods, highlighting that there are many epistemological similarities (Alla et al., 2018; Aureli, 2016; Guetterman et al., 2018; Marcolin et al., 2023; Yu et al., 2011). This is particularly true when comparing computational text analysis approaches with approaches such as grounded theory and content analysis, which require researchers to adopt an openness to emerging findings and involve iterating upon conclusions as findings emerge.
This kind of triangulation to compare the findings from computational and manual analytical approaches has been adopted in several settings. This includes analyses of Korean social media posts (Lee et al., 2022), academic papers related to sustainable manufacturing practices (Zhou et al., 2022), philosophical texts (Forest & Meunier, 2005), corporate reporting (Aureli, 2016), and product reviews (Anastasiei & Georgescu, 2020). However, these differ from the current study in several ways.
First, none of these applications focus on data from large-scale surveys. Survey data are a critical source of information for researchers and practitioners across many fields. Understanding how computational and manual approaches compare at scale may, therefore, provide critical insights into how open-ended survey questions can be efficiently and appropriately analyzed. Second, most of these applications do not directly compare the consistency of manual and computational approaches. For example, while Zhou et al. (2022) use some manual intervention to identify papers that are not relevant to the review and to interpret the themes from the computational text analysis methods used, there is no attempt to manually code the themes in the papers directly. Similarly, Ferrario and Stantcheva (2022) use text mining to analyze open-ended text from several large surveys; however, their analysis relies solely on a computational approach. These differences highlight a gap in the literature regarding the comparability of manual and computational approaches in analyses of data from large surveys.
This is particularly true within social science or health contexts, where most studies focus exclusively on manual thematic analysis of small samples (e.g., Corr et al., 2015) or text-mining analyses of large volumes of data (e.g., Nitiéma, 2023; Ramon-Gonan et al., 2023; Wright et al., 2022). This was further emphasized by a recent scoping review of nursing research, which found that most articles in this literature involving computational text mining relied on purely computational approaches (Wang et al., 2024). Furthermore, the review found that most articles used only one computational text mining approach, rather than a combination of computational methods. A handful of notable exceptions come from recent work by Hacking et al. (2023), Rutkowski et al. (2022), and Syyrilä (2021), which compare the findings from text mining and manual coding. However, these all focus on relatively small samples of interviews or survey responses, which limits the ability to understand how these approaches compare at scale, where computational approaches are most likely to be adopted to analyze large volumes of textual data.
The present study addresses these gaps in the literature by using both text mining and manual thematic analysis to identify themes in descriptions of participant experiences with the ATP survey. This may have important practical implications for the design of large-scale longitudinal surveys. Methodologically, the triangulation approach employed in this research also allows the study to assess how comparable the findings from computational text mining and manual content analysis are when they are applied to responses in large-scale longitudinal surveys. Including both approaches ensures the voices of all ATP survey participants are represented, while also providing an opportunity to reflect on the appropriateness of the findings from the more computational approach in a very large sample of responses.
Data and Methods
Our research design uses a combination of computational text analysis and inductive thematic analysis. This combination of methods is used to identify patterns in open-ended responses to questions about participant experiences with the administration of a survey conducted by the ATP. Combining analytical methods also allows the findings of each method to be triangulated, while generating complementary insights grounded in the ontological and epistemological assumptions embedded in each analytical approach (Rohleder & Lyons, 2017). Specifically, the computational text mining methods are informed by a post-positivist orientation, based on the assumption that there are patterns in the responses that can be identified and measured. The thematic analysis, in contrast, is primarily informed by a constructivist orientation that assumes participant responses are context-dependent and that multiple values and perspectives are present within responses. The research team integrates these approaches to capture the complementary insights associated with each of them. The study also adopts some aspects of a pragmatic orientation, reflecting the aim of generating actionable insights to improve survey design.
Researchers have been using a combination of open and closed-ended questions to leverage the benefits of both for decades. However, processing large volumes of open-ended responses presents a challenge, as traditional approaches to analyzing open-ended responses involve manual review. To overcome this, we use several text mining approaches (“computational text analysis”) to detect the tone and topics in responses (Grimmer et al., 2022). However, these methods can overlook nuances in language, contextual meaning, and the diversity of individual perspectives. Popular topic models may overgeneralize or misclassify themes, especially when responses contain multiple overlapping ideas. Similarly, out-of-the-box sentiment models may misinterpret tone, misinterpret responses with mixed sentiments, or overlook culturally specific expressions. We, therefore, combine these approaches with manual thematic analysis.
Survey Data
The data for this analysis come from the Alberta’s Tomorrow Project (ATP), a longitudinal research project. ATP was established in 2000 and recruited 55,000 participants between 2000 and 2015. ATP has administered regular surveys to its participants to collect information on their health and lifestyle (Ye et al., 2017). ATP is designed to support research investigating why some people develop cancer and chronic disease while others do not. The 2023 iteration of the survey included six optional questions about the survey administration process, which are the focus of this analysis. Before the survey was administered, participant advisors were consulted to review and test it alongside program staff. ATP has established a panel of participant advisors (30 participants with diverse backgrounds from the cohort) to provide participants’ voices and feedback on the surveys it develops and the activities it engages in. This consultation allowed advisors to confirm that questions are easily understood and answered, that the flow of the survey makes sense, that communications used in implementing the survey are easily understood and accessible, and that the value of what is being collected is clear. Facilitated discussions were held to ensure all feedback was captured and taken into consideration.
Even with such extensive consultation and testing, there will still be questions or concerns from participants, as this is a large cohort with many perspectives. This motivated the inclusion of the six optional evaluation questions that are the focus of this research, which enabled ATP to collect participant feedback intended to improve future surveys. Five of the six questions included a multiple-choice component, where participants rated their experience on a scale from “excellent” to “very poor,” each followed by an optional open-ended explanation for the rating. A sixth open-ended question regarding suggestions for future engagement was also asked, with no accompanying multiple-choice question. Table 1 summarizes the number of responses for each question. For questions EV01 and EV02, which centered on participant experiences with the survey invitation and the administration process, respectively, and tended to elicit longer responses, more than half of all participants who responded to a given multiple-choice question also provided an open-ended response.
There are also more responses to the multiple-choice portion than to the open-ended portion. For question EV02, which is used for the combined analysis, we find that 10% of respondents to the open-ended portion of the survey question did not respond to the multiple-choice component. This suggests that these open-ended responses were not simply a substitute for the multiple-choice component. EV06, which asks about participants’ suggestions for future surveys, is also used for the combined analysis but does not have a corresponding multiple-choice component. Of the remaining questions, which are used only for the computational text analysis, question EV01 has similar numbers of multiple-choice and open-ended responses to EV02. As with EV02, only 10% of open-ended responses to EV01 had no corresponding multiple-choice response. For questions EV03 and EV04, which are only analyzed using computational methods, significantly fewer participants responded to any part of the question, including the multiple-choice component. For these questions, 63% and 52% of participants, respectively, responded to the open-ended portion without responding to the multiple-choice component. This may suggest that for questions with lower response rates, the open-ended questions were treated as a substitute, rather than an add-on.
Data Analysis
To account for the large sample size, the study combines manual thematic analysis with computational text analysis. The computational text analysis included sentiment analysis and topic modelling. Sentiment analysis refers to a broad category of methods used to evaluate the emotional tone of phrases. The sentiment model used in this study is a RoBERTa transformer-based deep learning model, developed by Barbieri et al. (2020). Unlike traditional dictionary-based sentiment analysis, this approach captures contextual sentiment shifts, allowing for a more nuanced interpretation of the emotional tone of responses (Barbieri et al., 2020).
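To illustrate how such a transformer-based sentiment step can be applied, the sketch below scores a response using the publicly released TweetEval RoBERTa checkpoint from Barbieri et al. (2020); the checkpoint name and the p(positive) − p(negative) convention for producing a single score between −1 and +1 are assumptions for illustration, not a record of the study’s exact pipeline.

```python
# A minimal sketch, assuming the cardiffnlp/twitter-roberta-base-sentiment
# checkpoint from TweetEval (Barbieri et al., 2020); the p_pos - p_neg score
# is an illustrative convention, not necessarily the study's exact pipeline.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "cardiffnlp/twitter-roberta-base-sentiment"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def sentiment_score(text: str) -> float:
    """Map a response to a score in [-1, 1]: p(positive) - p(negative)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Label order for this checkpoint: 0 = negative, 1 = neutral, 2 = positive.
    p_neg, p_neu, p_pos = torch.softmax(logits, dim=-1).squeeze().tolist()
    return p_pos - p_neg

print(sentiment_score("The survey was easy to complete and I am glad to contribute."))
```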
To examine the content of the open-text responses, we also employed a topic model. Topic modelling refers to a set of methods that identify clusters of terms (“topics”) that are used together within a set of texts (Hvitfeldt & Silge, 2021). We employ BERTopic, introduced by Grootendorst (2022), which clusters transformer-based document embeddings to identify topics within text data. This allows us to identify patterns in participant experiences and suggestions for improving the survey design moving forward.
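A rough sketch of the topic-modelling step with BERTopic is shown below; the load_responses helper and the parameter settings are hypothetical, since the exact configuration used in the study is not reported here.

```python
# A minimal BERTopic sketch (Grootendorst, 2022). load_responses is a
# hypothetical helper; min_topic_size is an illustrative setting. BERTopic
# needs a reasonably large corpus to form stable clusters.
from bertopic import BERTopic

responses = load_responses("EV02")  # hypothetical: list of open-ended answer strings

topic_model = BERTopic(language="english", min_topic_size=25)
topics, probabilities = topic_model.fit_transform(responses)

# Inspect the clusters of co-occurring terms that characterize each topic,
# e.g. topics built around "easy" or "too long".
print(topic_model.get_topic_info().head(10))
```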
These methods are used at two points in the analysis. First, they are used to select a purposive sample for the manual thematic analysis of two of the questions: EV02 and EV06. These two questions focus on participants’ experiences completing the survey and suggestions for future engagement, respectively. Given the potential of these questions to inform future design recommendations, they were prioritized for analysis using a combination of computational and manual analysis, based on a subsample of the full set of responses. Once the purposive sample was selected, the computational and manual analyses were conducted simultaneously, with multiple iterations in which the findings of each approach were compared and discussed by members of the research team to make iterative changes to the manual coding and to consult additional details from the computational text analysis. Additional analysis of the other survey questions using only computational methods is presented in Online Supplemental Appendix 2.
For EV02 and EV06, a purposive sample was chosen for manual analysis. To identify this sample, we drew a stratified random sample of the full set of respondents, with strata defined by the sentiment of the open-text responses predicted by the computational text analysis. The predicted sentiment of a response is a continuous value between −1 and +1, with responses assigned a value of −1 being the most negative and responses assigned a value of +1 being the most positive (Barbieri et al., 2020). The purposive sample was identified as follows. First, to ensure the most polarized perspectives were captured, we oversampled the tails, adding the individuals with the 32 most positive and 32 most negative responses to the purposive sample. Then, a random sample of 394 individuals with positive and negative responses, respectively (788 total), was selected. This yielded a purposive sample of 852 respondents for each question. A check confirmed that the distribution of topics represented in the purposive sample was similar to the distribution of topics in the overall sample, even though this was not specifically targeted by the selection criteria; the random selection by sentiment was sufficient to ensure a representative distribution of topics within the purposive sample.
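The selection logic can be summarized in a short sketch, assuming a pandas DataFrame with one row per respondent and a score column holding the −1 to +1 sentiment value; the column names, random seed, and the decision to place exactly neutral scores in the positive stratum are illustrative assumptions.

```python
# A sketch of the purposive sample construction, under the assumptions noted
# above; 2*32 tail responses plus 2*394 stratified draws = 852 respondents.
import pandas as pd

def purposive_sample(df: pd.DataFrame, n_tail: int = 32,
                     n_stratum: int = 394, seed: int = 0) -> pd.DataFrame:
    ranked = df.sort_values("score")
    # Oversample the tails: the most negative and most positive responses.
    tails = pd.concat([ranked.head(n_tail), ranked.tail(n_tail)])
    remaining = df.drop(tails.index)
    # Stratified random draws from the negative and positive strata.
    negative = remaining[remaining["score"] < 0].sample(n_stratum, random_state=seed)
    positive = remaining[remaining["score"] >= 0].sample(n_stratum, random_state=seed)
    return pd.concat([tails, negative, positive])
```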
Purposive sampling is intended to identify the set of respondents that are expected to provide insights into the specific topics that researchers are seeking answers to (Kelly et al., 2010). In this study, the purposive sample plays two key roles.
First, the purposive sample is intended to validate the findings of the computational text analysis. This is why the purposive sample has been selected to include a stratified random sample of responses with both positive and negative sentiment. Second, the purposive sample is also intended to provide insights into themes that may not be identified by the topic model. After the purposive sample was selected, the computational text analysis methods were used to analyze responses in both the purposive and the full sample. The computational text analysis was used specifically to provide insights into the sentiment and topics discussed in all responses, using the methods described above.
The computational text analysis of the purposive sample was conducted alongside a manual thematic analysis. An initial codebook (Online Supplemental Appendix 1) was developed using the twenty-six topics identified by the computational text analysis. Using these topics as a starting point, a manual coding process was conducted to refine the categories and capture more nuanced themes not identified by the topic model. The analysis employed an inductive analytical process that involved adding new emergent codes that were not identified by the computational approach, combining topics where there was repetition in the original topics, and removing topics when they were not identified as relevant to the manual analysis. The responses were then systematically coded by assigning one or more of the developed codes to each response. Following Castleberry and Nolen (2018), this was done by multiple coders to ensure consistency in the identification and interpretation of emergent themes.
Following best practices in thematic analysis, each code was applied according to its relevance to the content of the response, and responses that did not fit any of the predefined categories were assigned to new emerging codes (Aurini et al., 2021). During the manual review process, several codes were identified as conceptually equivalent based on consistent patterns in participant language and underlying meaning. For instance, the topics “long duration,” “too long,” and “very long” were consolidated into a single code representing concerns related to time burden when assessing the comparability of the computational and manual approaches. The coding process was iterative, with the team continually refining the codes as new insights emerged. This ensured that the final set of thematic codes was comprehensive and represented the full range of participant experiences.
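Mechanically, this consolidation amounts to mapping equivalent labels onto a single code before comparing methods; the sketch below shows only the time-burden example from the text, since the full mapping lives in the codebook (Online Supplemental Appendix 1).

```python
# Illustrative consolidation of conceptually equivalent codes; only the
# time-burden example from the text is shown here.
CODE_MAP = {
    "long duration": "time burden",
    "too long": "time burden",
    "very long": "time burden",
}

def consolidate(codes: set[str]) -> set[str]:
    """Replace each code with its consolidated equivalent, where one exists."""
    return {CODE_MAP.get(code, code) for code in codes}

print(consolidate({"too long", "easy"}))  # {'time burden', 'easy'}
```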
The computational text analysis and thematic analysis were completed simultaneously. As findings emerged from each approach, they were compared to triangulate the findings across methods. This continuous comparison also led to the development of new codes and the adaptation of emerging codes in the thematic analysis. The findings from each approach were then compared to determine how consistent the results were across methods. This follows previous work on small-scale surveys by Guetterman et al. (2018), which compared the findings from qualitative coding to a computational text analysis that measured word-pair similarity.
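One simple way to operationalize this consistency check, assuming hypothetical column names and a hand-built alignment between topic labels and consolidated manual codes, is to compute the share of responses whose model-assigned topic matches one of their manual codes:

```python
# A sketch of the cross-method consistency check; the 'topic' and
# 'manual_codes' columns and the topic_to_code alignment are hypothetical.
import pandas as pd

def consistency_rate(coded: pd.DataFrame, topic_to_code: dict[str, str]) -> float:
    """Share of responses whose topic-model label maps onto one of the
    consolidated manual codes assigned to the same response."""
    matches = coded.apply(
        lambda row: topic_to_code.get(row["topic"]) in row["manual_codes"],
        axis=1,
    )
    return float(matches.mean())
```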
Results
Experiences Participating in the Survey
Participants were asked, “How would you rate your experience completing this survey?” with multiple-choice options, as well as an open-ended follow-up question asking, “Why did you give this rating?” As a first stage, Table 2 shows that non-response to the open-ended question was heavily concentrated among participants who selected “Excellent” or “Good” in the first-stage multiple-choice question, indicating that people who had a poor experience were more likely to provide open-ended responses. This is consistent with evidence that respondents who are dissatisfied or face barriers during a survey are more likely to provide detailed comments in open-ended fields, introducing a form of nonresponse bias tied to satisfaction (Galesic & Bosnjak, 2009).
Figure 1 shows the distribution of topics identified by the computational text analysis of the open-ended responses to this question, and how often the tone of these responses was positive, neutral, or negative. The most common topics indicate that people found the survey easy to fill out, with several topics referencing “easy” having relatively more positive responses. However, many people found certain question types challenging. There were several recurring themes related to challenges in remembering past events. “Health conditions in the past,” “difficulty remembering,” and “COVID vaccination dates hard to find” were among the most common topics, typically with negative or neutral sentiment. People also had trouble remembering answers related to tanning beds, vaccination dates, and sun exposure, and highlighted technological issues related to usability on different device types.
Similar conclusions were drawn from the thematic analysis. Ease-of-use themes were the most prominent, reflecting the survey’s accessibility for most respondents. Participants frequently described the survey as “easy,” “straightforward,” or “smooth,” as illustrated in the following quote: “The survey ran smoothly (no technology challenges) and was easy to understand.”
These terms consistently emerged from the feedback, underscoring a generally positive impression of the survey’s ease of use. Participants also frequently referenced long survey duration, using phrases such as “too long,” “very long,” or simply “long.” These were grouped under a single code when assessing consistency across methods, since they reflect a shared concern about time commitment (see the codebook in the Online Supplemental Appendix).
A common theme identified by the manual analysis that the computational analysis did not capture was that many respondents experienced feelings of fulfillment, gratitude, and a general sense of accomplishment from participating in the ATP survey. Many participants expressed appreciation for the opportunity to contribute, as illustrated in the following participant quotes: “I am happy to take the time to fill in the survey, and am honoured to be part of it.” “I love having the opportunity to participate in the research.”
These findings suggest that most individuals were keen to participate and had a generally positive experience completing the survey. However, both analyses identified several common issues related to the survey program’s functionality. System logouts and restart issues were among the most frequently mentioned problems: “The Program functionality hung up a couple of times and I had to restart. When you restart, it starts at the beginning, question one, and you have to scroll through all of the answers. This was especially painful at 90% completion!” Another participant shared, “It is frustrating when you restart - continue, then you have to start over at the sign-on page, then re-enter the last page you were on.”
This is consistent with broader evidence in the literature, which has shown that program functionality can significantly impact participant experiences with online surveys, and ultimately affect participation rates (Eynon et al., 2017; Gideon, 2012).
The thematic analysis also identified that the design of some questions presented issues. Memory-intensive questions presented significant challenges, as captured in the themes “Information Recall” and “5 Years.” Participants frequently remarked that the five-year restriction excluded relevant information, such as diagnoses or events that occurred slightly outside the period the survey focused on. These challenges are consistent with cognitive response theories, which highlight the difficulty participants face when retrieving memories in survey contexts, especially when the specified time frames do not align with how individuals naturally recall events (Tourangeau et al., 2000).
Other participants expressed confusion about how to answer questions related to parental health if their parents were deceased or if they were adopted, as the survey did not provide a “Not Applicable” option. This left respondents uncertain about how to proceed and raised concerns about the accuracy of their answers. This supports longstanding critiques of fixed-response survey design, which suggest that the absence of appropriate “Not Applicable” or inclusive response options can compromise data accuracy and contribute to participant frustration (Reja et al., 2003; Schonlau & Couper, 2016). Several participants reported being forced to select answers that did not reflect their situation. For example, one participant noted: “You asked questions about when my parents died and then asked me what their diseases were in the last 5 years with no option to indicate that not applicable as they died 10 or more years ago.”
Another participant noted: “Just an [sic] bit of a glitch when entering family health history as I was adopted and it wouldn’t accept my ‘Do Not Know’ response. Then there were further questions, where me being adopted didn’t quite fit in and I had to enter - Do Not Know - Adopted”
This confusion was also related to the five-year timeline discussed above.
Looking at the computational text analysis presented in Figure 1, we find that the responses assigned to topics related to “difficulty remembering” and “health conditions in the past” often had a negative tone. This suggests that the question wording and response options may be an area where the survey can be adapted to be more user-friendly.
While the different analytical approaches arrived at similar conclusions overall, there are some exceptions. Figure 2 shows where the computational text analysis and thematic analysis identified consistent themes. There are four distinct topics associated with the survey being “easy,” each associated with largely positive or neutral responses. While some topics labelled as “easy” were consistent across analysis methods, other related topics, such as “easy survey,” were not. This was often the case when responses included multiple themes. For example, the topic model labelled the following quote as “easy survey”: “Survey is not compatible with screen reader usage. Probably not compliant with WCAG 2.0 AA web guidelines for web accessibility. However, call centre staff member was extremely accomodating [sic] with capturing the survey over the phone.”
This quote describes the call centre interaction positively, but also describes other, more prominent issues with the program’s functionality. This form of misclassification aligns with critiques of sentiment models, which often struggle to differentiate between conflicting emotional tones or to capture contextual nuance within individual responses (Venkit et al., 2023). This response was therefore re-coded during the manual thematic analysis.
Future Engagement Suggestions
This section describes patterns in participant responses to the question “In your opinion, how can we continue to keep you engaged in future data collection?” There is no multiple-choice version of this question; however, we see in Figure 3 that nearly all topics have higher proportions of positive or neutral responses.
Figure 3 also describes the most common topics identified by the computational text analysis. The most common responses included “suggesting more surveys,” “keep doing a great job,” “texts and emails,” and “interested in learning the results.” This is consistent with the generally positive attitudes people expressed towards the survey process and their expressions of feeling connected to the impact of the ATP research. Most negative responses were related to the survey being too long.
Participants’ enthusiasm for the survey and interest in continued participation were echoed in the inductive thematic analysis. Many respondents expressed continued commitment to the project and conveyed satisfaction with existing communication practices, as illustrated by the following quote: “Keep up the good work! I’m glad to be part of such an important research project.”
This corresponds with the widely expressed view that people were motivated to participate because they felt there was a lot of value in completing the survey, as demonstrated by the quote: “Continue to make me feel that filling out the form is helpful for research purposes. So long as I feel it will help others, I will try to continue doing it.”
Beyond describing their motivation for participating, many responses focused on the recruitment methods, noting that the current methods (especially email-based invitations and periodic updates) were effective in maintaining engagement. This preference for email invitations is described by one participant as follows: “I like the invitation by email approach the best. Thank you!!!”
The responses also pointed to areas for improvement for future engagement. Excessive or poorly targeted reminders, particularly those delivered via telephone, were perceived as intrusive or “nagging,” including by one participant who noted: “The number of reminders by email, text and telephone were much too frequent. I felt more annoyed than engaged.”
These differing preferences regarding reminder frequency and method underscore the importance of tailored engagement strategies, particularly when engaging with heterogeneous populations (Savard & Kilpatrick, 2022).
Several participants also noted a disconnect between the original focus of the study and the evolving survey content. This was described by one participant as follows: “When I first signed up for this years ago, it was about cancer research. Now it’s about COVID. How did that happen? It’s not what I signed up for.”
This illustrates one participant’s frustration that the focus of the surveys had shifted away from cancer research. This observation, along with the “Change Focus of Health Questions” and “COVID-19” codes, captures participant concerns about the changing focus of the survey content. These were less common than other themes; however, the increased emphasis on COVID-19 appeared to detract from long-term motivation among participants who were motivated by the study’s focus on cancer research.
Figure 4 shows the proportion of the codes that were consistent across the topic model and the inductive thematic analysis. Again, the two analytical approaches led to similar conclusions about strategies for future engagement with the ATP survey. However, there are some notable exceptions. Responses that fell into the “reminders” and “keep doing a great job” topics in the computational text analysis were also assigned to the corresponding themes in the manual analysis. In general, however, the topics for this question had lower consistency, often because the topic model did not capture the more nuanced themes added through the thematic analysis.
Other Survey Responses
Here, we discuss the findings of the computational text analysis of the three remaining open-ended question responses. These questions asked “Why did you give this rating?” after participants were asked “How would you rate the invitation asking you to participate in Survey 2023?”, “How would you rate your experience completing the occupational history section using the link to the Government of Canada National Occupation Classification (NOC) website?” and “If you opted in to receive text messages, how would you rate the text message reminder for Survey 2023?”. Table 3 indicates that open-text responses are predominantly provided by participants who reported less favorable experiences. This bias should be considered when interpreting the tone and topics in open-text responses.
Figure 5 presents the topics people discussed in the responses to each of these questions, along with the tone they used to discuss each topic. Panels (a) and (b) show that responses related to the invitation process were particularly positive and, like the responses about future engagement and the survey experience, emphasized the importance of the ATP. Sentiment associated with the completion time stated in the invitation was more favorable. Considered alongside earlier findings about survey length, this may indicate some mismatch between the expected and actual survey time.
Panels (c) and (d) indicate that the general sentiment associated with the NOC codes was more negative. The instructions were the most common topic discussed, and while 29% of respondents used a negative tone in these responses, 34% also adopted a positive tone, suggesting the instructions were not well-suited to everyone. Less common topics pointed to more negative responses related to “job or occupation,” “finding code,” and other similar topics, which suggests that respondents may have been frustrated when they could not find their occupation in the NOC code system.
Panels (e) and (f) indicate that respondents’ preferences for communication methods differed widely. Specifically, respondents appeared to generally like the reminders but strongly disliked receiving text messages. This is generally consistent with suggestions people had for future engagement.
Discussion
This study has several important implications for qualitative research. First, it highlights design features for qualitative survey questions that are critical to supporting participant engagement. Respondents consistently noted that the survey was easy to complete. Respondents also regularly reported feeling grateful for the opportunity to participate in the survey and felt that, by completing the surveys, they were contributing to something important. This sentiment, combined with the project’s regular participation reminders, particularly reminders sent by email, led many to feel satisfied with their current and future participation in the survey. This is consistent with other research in the survey methods literature, which has found that survey participation is higher for some people when they feel their involvement will have a positive impact (Singer, 2002). This consideration is especially important for maintaining participation in longitudinal studies, where perceived alignment with the study’s original goals and anticipated community benefit may help sustain response rates over time (George et al., 2014). Some participants also expressed concern that the focus of the survey appeared to be drifting from its original purpose. Together, these findings emphasize the importance of informing participants about the purpose and benefits of a survey to increase participation rates in health surveys.
Furthermore, by combining text analysis and manual review, the research design employed here allowed the analysis to uncover several key challenges that could not be captured by the associated multiple-choice questions alone. Many participants reported challenges remembering health events in the distant past, particularly events that had happened more than five years ago. This aligns with previous research related to cognitive burdens in retrospective surveys (Tourangeau et al., 2000). Suggestions for more inclusive response options to questions about participants’ parents highlighted that alternative options may be needed for individuals who were adopted or whose parents are deceased. There were also several complaints about the survey program’s functionality, including multiple reports of system logouts and challenges when the survey had to be restarted. Previous research has shown that these kinds of technical issues can significantly reduce completion rates for online surveys (Eynon et al., 2017; Galesic & Bosnjak, 2009; Gideon, 2012), which suggests that addressing these issues could increase completion rates in future survey iterations. Participant feedback was incorporated at multiple points in the survey design process. These findings highlight how, even with participant input incorporated early in the process, this kind of analysis can identify areas for additional improvement in survey design.
This study makes several important methodological contributions. The findings suggest that the manual and computational text analyses generated similar overarching conclusions, with a third of responses being coded with the same topic or theme across methods. In general, consistency across methods was greater when the topic itself was simple and direct. For example, responses categorized as “easy” by the original topic model were also classified as easy by the thematic analysis. This is consistent with past comparative work by Lee et al. (2022) and Alla et al. (2018), which showed that topic modelling and manual thematic analysis were both able to consistently identify major themes. Similar consistencies across methods have also been observed between computational text mining methods measuring word similarity and manual qualitative coding (Guetterman et al., 2018). However, previous comparative work has focused on small-scale surveys or small samples of qualitative interviews, where the benefits of computational approaches are more limited.
However, our findings indicate that even major themes were less likely to be consistently coded across methods when the topic was less direct or contained multiple distinct concepts. For example, responses labelled as “easy survey” by the original topic model were classified this way less than 20% of the time in the thematic analysis. Similarly, when asked how to improve future surveys, the manual review identified that under the label “suggest more surveys,” the topic model was broadly grouping together suggestions to shorten, restructure, and leave the surveys unchanged, suggesting the topic model over-generalized these responses. This observation is consistent with critiques of topic modelling methods, which frequently encounter difficulties in identifying overlapping themes and may conflate semantically distinct ideas under a single topic label (Ferrario & Stantcheva, 2022; Grimmer et al., 2022). This builds upon previous studies showing that manual content analysis can add important contextual information that computational approaches overlooked or missed in small samples (Alla et al., 2018; Guetterman et al., 2018).
These kinds of discrepancies across analysis methods may be due to responses containing multiple themes that the topic model could not adequately capture. Figures A3.1 and A3.2 in the Online Supplemental Appendix suggest that, in general, the analytical methods yielded more similar conclusions when the responses were of “moderate” length. For the shortest and longest responses, the methods were more likely to yield inconsistent conclusions. These discrepancies highlight the limitations of computational approaches alone. Recent literature points to the integration of human coding with computational tools, noting that semi-automated approaches can improve efficiency while maintaining the nuance necessary to address ambiguity in responses (Grimmer et al., 2022; Schonlau & Couper, 2016).
Together, these findings illustrate the benefits of using computational text analysis to process large volumes of open-ended responses from surveys. For large cohort studies like ATP, using these methods could allow for broader use of open-ended response questions in surveys. Indeed, we find that both approaches can capture major themes within open-ended survey responses, which should make practitioners more confident in the main findings from computational text analyses. At the same time, these findings suggest that computational approaches cannot fully capture the nuance of all responses. Using a small purposive sample may be a helpful first step for practitioners interested in identifying topics that are more frequently inconsistent across methods. This can help determine which topics should be automatically subject to manual review to ensure the perspectives of participants assigned to these topics are adequately understood. Further research is needed to systematically identify criteria for when each method is appropriate based on the characteristics of a response or the topic a response is assigned to by a computational first step.
This study was not without limitations. The study is based on individuals who voluntarily participated in and completed an optional portion of the ATP survey, and these individuals may not be representative of the broader population that the survey targets, which limits the generalizability of the findings. This is particularly true because the ATP survey is longitudinal: participants have chosen to remain in the cohort and have a greater incentive to provide feedback about the survey process, since they expect to be invited to participate in future surveys. The findings may also not generalize to other kinds of open-ended survey questions that are not specifically seeking feedback and instead involve broader themes. Additionally, the sentiment analysis used for the computational text analysis depends on models developed using qualitative data from social media posts, which may have a different structure than open-ended survey responses (Venkit et al., 2023). For example, while transformer-based models such as those trained on TweetEval (Barbieri et al., 2020) are powerful, they were trained on short social media posts and may have limited generalizability to structured research contexts without domain-specific adaptation (Venkit et al., 2023).
Conclusion
This study combines text mining with traditional thematic analysis to analyze open-ended survey responses from large-scale surveys. The study findings suggest that most participants had positive experiences with the ATP’s survey process. These responses highlighted the importance of ensuring a survey was easy for respondents to complete and that respondents received reminders to participate using their preferred communication method. The findings suggest that addressing challenges related to memory recall, question relevance, and program functionality could improve user experience and data quality. The triangulation approach employed for the study highlights the benefits of using computational methods to efficiently process large volumes of data while allowing for human validation to capture nuanced perspectives. These results suggest that the findings of both computational text mining and manual thematic analysis yield similar conclusions overall. However, the computational text mining methods can overgeneralize responses with multiple themes and miscategorize informationally sparse responses. This suggests that a combination of analytical approaches can be beneficial when analyzing large volumes of qualitative survey responses.