Access to research data supports a central tenet of open research, that “access to scientific knowledge should be as open as possible” (UNESCO, 2021, p. 36). Data availability enables the verification of past findings and accelerates the discovery of new findings through reanalysis and evidence synthesis (Fecher et al., 2015; Hardwicke et al., 2018; J. N. Towse, Ellis, & Towse, 2020). Accordingly, data availability is the basis of transparent, effective research systems that create credible conclusions, democratize access to knowledge, and underpin equitable innovation (Concordat on Open Research Data, 2016; G7 Science and Technology Ministers, 2023; UNESCO, 2021). The research community is increasingly recognizing the value of data sharing in these pursuits. Most significantly, the recent UNESCO Recommendation on Open Science positions open research and data as a global research priority that can improve the reliability of evidence needed for decision-making and policy (UNESCO, 2021). However, despite the role of shared data in addressing global environmental, economic, and social issues (UNESCO, 2021), many researchers are not yet engaging meaningfully with such behaviors (see Gabelica et al., 2022; Hardwicke et al., 2018; J. N. Towse, Ellis, & Towse, 2020). In the present qualitative research, we use a behavior-change framework to determine the barriers and enablers that researchers experience when (considering) engaging with data-sharing behaviors with a view to informing the design of future interventions.
Although formal data sharing has existed for more than 100 years (Branney et al., 2019; Karhulahti, 2023; Sieber, 2015), it was the digital age and electronic access to data that created the conditions for widespread sharing. The broad recognition of the value of data sharing has occurred simultaneously across the sector, and funders, journals, societies, universities, and researchers have all advocated for data sharing and created top-down initiatives. In the UK, the largest national funding agency, the UKRI (formerly RCUK), has had its Common Principles on Data policy since 2011 (UKRI, personal communication, June 6, 2023). Likewise, the country’s largest charity funder, Wellcome, launched its policy in 2007, the current iteration of which actively encourages data-management and data-sharing costs to be included in grant applications (Wellcome, 2017). A diverse group of stakeholders, including funders and publishers, developed the FAIR (Findable, Accessible, Interoperable, and Reusable) Data Principles, a set of guidelines for enhancing the reusability of data (Wilkinson et al., 2016). Many publishers have their own data-sharing policies (e.g., Bloom et al., 2014), and data sharing is also a key component of the Transparency and Openness Promotion (TOP) guidelines (Nosek et al., 2015), a tool to support the implementation of open-research practices at the journal level. Simultaneously, universities and other organizations have institutional-level policies and are providing support for data storage through managed external data archives (e.g., UK Data Service), institutional data repositories, or general-purpose services (e.g., Zenodo).
Although researchers are only part of this wider data-sharing ecosystem (Borgman & Bourne, 2022), ultimately, it is individual researchers who are responsible for the act of data sharing (Bezuidenhout & Chakauya, 2018). Research has consistently shown that overall, researchers view data sharing as positive and important (e.g., Cheah et al., 2015; Digital Science et al., 2022; Farran et al., 2020; Fleming et al., 2022; Soeharjono & Roche, 2021; Van den Eynden et al., 2016) and that lack of access to data is an impediment to research progress (Tenopir et al., 2011). Measures of engagement show progress, as illustrated by the global 2022 State of Open Data survey, in which 35% of respondents reported being familiar with FAIR principles, 1 up from 28% the previous year and the highest percentage since the question was first asked in 2018. Yet despite this positive momentum, implementation is often low (e.g., Farran et al., 2020; Fleming et al., 2022; Rowhani-Farid & Barnett, 2016) and may fall short of accepted standards. For example, when authors of articles with data-availability statements (indicating that data are available on request) were asked to share their data, 93% failed to reply or declined to share their data, and only 6.8% shared the requested data (Gabelica et al., 2022). This attitude-behavior gap raises questions about the barriers preventing researchers from sharing their research data (Fecher et al., 2015).
In the present research, we use UNESCO’s (2021) definition of open-research data as data that include, among others, digital and analogue data, both raw and processed, and the accompanying metadata, as well as numerical scores, textual records, images and sounds, protocols, analysis code and workflows that can be openly used, reused, retained and redistributed by anyone, subject to acknowledgement. Open research data are available in a timely and user-friendly, human- and machine-readable and actionable format, in accordance with principles of good data governance and stewardship, notably the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, supported by regular curation and maintenance. (p. 9)
This broad definition allows us to consider data as the evidence that underlies research publications and therefore applies across a range of academic disciplines. In the present research, we use the term “data sharing” 2 to also include data that by necessity (i.e., for various legal, ethical, or commercial reasons) are not openly available but that are made accessible to specific users according to defined access criteria.
Benefits of Data Sharing
There are many potential benefits of data sharing at the individual-researcher (C. Allen & Mehler, 2019; McKiernan et al., 2016), research-community (e.g., Milham et al., 2018), and societal levels (e.g., Besançon et al., 2021). Access to data leads to a more equitable distribution of opportunities and promotes inclusion (Digital Science et al., 2022; UNESCO, 2021). Reuse of data facilitates greater efficiency, effectiveness, and innovation by using the same resources multiple times to create new knowledge (Burgelman et al., 2019; DuBois et al., 2018) rather than duplicating research efforts. Increased transparency and a greater focus on reproducibility enable verification of findings and reanalysis when improved methods are developed. Specifically at the researcher level, sharing data enhances the visibility of research, which can lead to a citation advantage (Piwowar et al., 2007; Piwowar & Vision, 2013) and more opportunities to collaborate (McKiernan et al., 2016). However, many of these potential benefits are distal compared with the more proximal challenges posed by sharing data and the ever-present pressure to publish frequently and thereby increase the chances of employment, promotion, and funding (Munafò et al., 2017).
Concerns About Data Sharing
Debates about data sharing commonly focus on qualitative human data (Karhulahti, 2023) and point particularly to concerns over epistemology, informed consent, and privacy (e.g., Parry & Mauthner, 2004). Issues of epistemology relate to the reflexive, subjective, and contextually bound nature of qualitative research that suggests that reuse could lead to misinterpretation (e.g., Broom et al., 2009). The key concerns raised about informed consent are whether researchers are less willing to be candid about sensitive topics (MacLean et al., 2019) and whether participants truly understand the implications of consent (Parry & Mauthner, 2004). Relatedly, concerns have been raised about ensuring anonymization of qualitative data, particularly for sensitive data or small, potentially reidentifiable communities (Broom et al., 2009; Parry & Mauthner, 2004). However, it is possible for these issues to be overcome with careful planning and sufficient resources (for proposed solutions, see Bishop, 2005; Branney et al., 2019, 2023; DuBois et al., 2018; Karhulahti, 2023). Furthermore, the majority of participants consent to share their deidentified data (e.g., Mozersky et al., 2020), even for research on sensitive topics such as abortion (VandeVusse et al., 2022) and general-practitioner–patient conversations (Amelung et al., 2020; discussed in Whitaker, 2021), citing helping others as their primary motivation (VandeVusse et al., 2022).
Data Sharing as Behavior
The term “data sharing” encompasses a range of behaviors that occur across the research life cycle, taking place before (e.g., preparing consent forms), during (e.g., recording exclusions), and after the research (e.g., depositing the data in a repository). Behaviors do not occur in isolation but in systems of behaviors that interact with and depend on one another (Michie et al., 2014). This interdependence means that if one data-sharing behavior does not occur, this may ultimately prevent data from being shared (see Norris & O’Connor, 2019). For example, omitting information about future data sharing from participant-consent forms or failing to secure suitable funding for data archiving may preclude the data from being shared.
For the purposes of the present research, we are interested in individual researchers’ data-sharing behaviors. Here, we provide a synthesized list of the key behaviors that constitute an idealized data-sharing process at the individual-researcher level. 3 Not all behaviors listed are required to meet the overarching behavior of data sharing (e.g., ethics approval is not required for all research); essential behaviors are noted. We used our interviews to explore this list of behaviors and check that we had not missed any behaviors:
Seek out skills and resources: seeking out and engaging with educational resources and/or participating in training to learn about what constitutes “data”; the benefits of sharing; how to share data within ethical, intellectual property (IP), and commercial constraints; and how to handle sensitive data. Reading and complying with university and funder mandates. Seeking practical, financial, or motivational support from peers, colleagues, ethics committees, prebid teams, funders, and other facilitators. For example, applying for funding to support data preparation and storage.
Create a data-management plan: creating a data-management plan that outlines what types of data will be collected and how researchers will handle the data during and after the study. The plan should address all stages of the research life cycle from planning through sharing. Data-management plans are required for some funding applications.
Obtain ethics approval: submitting an ethics application that includes plans to share data and details of how this will be done. For example, anticipating terms of access.
Precursor behaviors: carrying out data-sharing precursor behaviors throughout study design and the active project phase. For example, preparing participant information sheets and consent forms to gain consent from participants to share their data or acquiring agreement from other stakeholders to share the project data. Then during the active project phase, collecting and analyzing data with reuse in mind.
Prepare and manage data (essential behavior): preparing data for sharing by following relevant standards (e.g., FAIR) and disciplinary norms to ensure that data will be findable, accessible, interoperable, and reusable. This behavior includes storing, naming, and versioning the data in a format that can be shared and creating documentation and metadata. For personal sensitive data, this would include anonymizing it (i.e., removing identifying information to protect participants’ identities), or for commercial data/IP protection, this might include aggregation.
Deposit data (essential behavior): depositing the data and metadata in a repository and providing reuse guidance by adding a license. For sensitive data, shielding may be required in the form of access control, that is, specifying the conditions under which the data can be accessed. The data may be placed under a reasonable embargo, for example, to delay the release of the data to coincide with a publication or end of project or to protect first-use rights.
Ultimately, the aim of data sharing is to facilitate reusability and subsequent new knowledge. To enhance the value and reusability of data (A. S. Towse et al., 2021), they should comply with the FAIR data principles (Wilkinson et al., 2016). Therefore, the core data-sharing steps—preparing and depositing data (the fifth and sixth behaviors in the list above)—should be carried out with reuse in mind: ensuring that data are stored in a suitable permanent repository, with rich metadata, clearly labeled and described to ensure they can be independently understood, in a future-proof and ideally nonproprietary format, with a global persistent identifier and an appropriate, preferably open, license (e.g., CC BY). Without these provisions, data have limited reusability (J. N. Towse, Ellis, & Towse, 2020).
Whether researchers decide to adopt data-sharing behaviors is a behavioral question (Norris & O’Connor, 2019; Osborne & Norris, 2022), and behavior-change theory has the potential to help understand and improve the adoption and maintenance of such behaviors (Norris & O’Connor, 2019). The present research was developed using the capability, opportunity, motivation–behavior (COM-B) model from the behavior-change wheel (BCW; see Fig. 1; Michie et al., 2011, 2014). The BCW is a layered framework designed to guide the development of theory-based behavior change from analysis to intervention design (Michie et al., 2014). We selected this framework because it can be applied to behavior across different fields and contexts and was developed to overcome the limitations of 19 multidisciplinary frameworks (Michie et al., 2011). It has recently been applied in the domain of open research to develop interventions to increase the uptake of preregistration among researchers (Osborne & Norris, 2022) and to investigate the barriers and enablers to implementing the TOP guidelines (Naaman et al., 2023).

The behavior-change wheel from Michie et al. (2014). The green ring shows influences on behavior, the red ring shows intervention types, and the gray ring represents policy options.
The COM-B model is at the center of the BCW (Fig. 1, green ring) and is used to perform a behavioral diagnosis. This process involves identifying a target behavior; investigating individual, sociocultural, and environmental influences (i.e., barriers that decrease the likelihood of the behavior occurring and enablers that increase the likelihood); and assessing what needs to change in terms of capability, opportunity, and motivation. These three components are part of an interacting system and must be present in sufficient amounts for the behavior to occur: Capability is the individual’s physical and psychological ability to enact a behavior, opportunity refers to the physical and social environment that enables behavior, and motivation constitutes the reflective (i.e., rational choice) and automatic (i.e., feelings, habits) mechanisms that activate or inhibit behavior (Michie et al., 2011, 2014). To change behavior, one or more of the components must change to reconfigure the system. The choice of behavior-change intervention should be evidence-based and informed by the factors that influence current behavior so that the intervention is likely to be most effective in the specific setting (Hulscher & Prins, 2017).
In addition to COM-B, the theoretical-domains framework (TDF; Atkins et al., 2017; Cane et al., 2012) was used in the current study for the development of the interview schedule and analysis. This validated integrative theoretical framework (Cane et al., 2012) comprises 14 domains (knowledge; skills; memory, attention, and decision processes; behavioral regulation; social/professional role and identity; beliefs about capabilities; optimism; beliefs about consequences; intentions; goals; reinforcement; emotions; environmental context and resources; and social influences; Cane et al., 2012), which map to the three COM-B components (see Fig. 2) and can provide a granular understanding of behavior (Michie et al., 2014).

The theoretical-domains framework (TDF) mapped to the subconstructs of capability, opportunity, and motivation from the capability, opportunity, motivation–behavior (COM-B) model. Reproduced from Chater et al. (2022).
Barriers and Enablers to Data Sharing
Despite important reasons to share data, including individual career-based reasons (C. Allen & Mehler, 2019; Markowetz, 2015; McKiernan et al., 2016), many researchers do not share their data because of perceived costs (Abele-Brehm et al., 2019; Miyakawa, 2020) and a lack of incentives (Adimoelja & Athreya, 2022; Chawinga & Zinn, 2019). With data sharing becoming an increasing priority across the sector, the determinants of researchers’ attitudes and behaviors toward data sharing have received some scholarly interest. Existing research, spanning various disciplines and geographical areas, has largely focused on real and perceived barriers and has used survey formats. Below, we discuss current evidence categorized according to the three COM-B components.
Barriers
Opportunity
Lack of resources is regularly reported as a barrier to data sharing (Fecher et al., 2015). For example, in a survey of more than 13,000 scientists conducted in 2009–2010, insufficient time and funding were the most frequently named barriers to data sharing, cited by 55% and 40% of respondents, respectively (Tenopir et al., 2011). The fact that time is a frequently highlighted barrier (Astell et al., 2018; Chawinga & Zinn, 2019; Cheah et al., 2015; Farran et al., 2020; Houtkoop et al., 2018; Van den Eynden et al., 2016) is unsurprising because it is well acknowledged that academics have increasingly untenable workloads (Hostler, 2023; Long et al., 2020). Data sharing has the potential to increase research efficiency in the medium to long term at a systems level, but in the short term and at the individual level, such behaviors increase workload and require more time and effort compared with “closed” research (Gomes et al., 2022; Hostler, 2023). Other opportunity-related barriers relate to physical resources: In low- to middle-income countries, lack of specialized data-management expertise (Cheah et al., 2015) and infrastructure issues, such as lack of current hardware, software, and suitable internet access (Bezuidenhout & Chakauya, 2018), also pose a challenge.
Capability
Acknowledged barriers also include a lack of knowledge and skills (Chawinga & Zinn, 2019), resulting in researchers not feeling fully equipped to complete data-sharing tasks (Tenopir et al., 2015). Participants report that they have not learned how to share data (Houtkoop et al., 2018) and lack knowledge about how to share data in a useful way (Astell et al., 2018). The variety of available repositories and the lack of integration between them also pose a challenge in terms of selecting the most suitable storage (Astell et al., 2018). Researchers report a lack of knowledge about copyright, licensing (Astell et al., 2018; Farran et al., 2020), ethics, and confidentiality issues that can affect data sharing (Gownaris et al., 2022).
Motivation
In a survey of 600 psychologists asked about 15 barriers, data sharing being uncommon in their field was selected as the most relevant reason for not sharing data (Houtkoop et al., 2018). Other studies have shown that researchers might not share data because of fear of the implications, for example, the possibility of compromising confidentiality and harming research participants if they can be identified, particularly for sensitive data or stigmatized communities (Cheah et al., 2015). Researchers are also concerned that their research reputation could be harmed (Cheah et al., 2015) if they are scooped (Bezuidenhout & Chakauya, 2018; Soeharjono & Roche, 2021), if others who have insufficient information and context to understand the data misinterpret or misuse them (Bezuidenhout & Chakauya, 2018; Gomes et al., 2022; Sayogo & Pardo, 2013; Soeharjono & Roche, 2021; Tenopir et al., 2015; Van den Eynden et al., 2016), or if others find errors in the data (Gomes et al., 2022). Furthermore, previous research has found that a lack of credit and appropriate attribution when others reuse data is a barrier (Cheah et al., 2015; Farran et al., 2020; Gownaris et al., 2022).
Enablers
Opportunity
To be able to share data, researchers require opportunities, including suitable infrastructure, that is, technical, legal, financial, and time-allocation support from institutions and funders (European Commission, 2017). For example, the availability of a data repository has a significant influence on STEM researchers sharing data (Kim & Zhang, 2015), and Wellcome-funded researchers cited funding to cover the costs of data preparation as their biggest motivator (Van den Eynden et al., 2016). Researchers who work solely on research and do not have time-consuming teaching obligations are more likely to share their data (Tenopir et al., 2011). Likewise, researchers were more likely to share their data if minimal effort was required (Wallis et al., 2013). Opportunity also includes social opportunities, such as institutions providing a positive research culture in which data sharing is recognized and rewarded (Huang et al., 2012).
Capability
Researchers must have the necessary skills to carry out the various subbehaviors that comprise data sharing. This includes not just knowledge and skills about how to share data but also planning during study-design phases. Reanalysis of data from Tenopir et al. (2011) found that having data-management skills increased the likelihood of data sharing (Sayogo & Pardo, 2013).
Motivation
Researchers who perceive career benefits to data sharing are more likely to have positive attitudes toward it and to engage in more data-sharing behaviors (Kim & Zhang, 2015). Direct personal benefits, such as data sharing being looked on favorably in funding and promotion decisions, and enhanced reputation are also motivating factors (Van den Eynden et al., 2016). In the aforementioned survey of psychologists, mandates to share data from funders or institutions were ranked top of the conditions most likely to encourage data sharing (Houtkoop et al., 2018). Increased impact, visibility, and opportunities for collaboration are cited as incentives to share data (Digital Science et al., 2022; Farran et al., 2020; Van den Eynden & Bishop, 2014). When their data are reused, researchers consider acknowledgment or citation to be essential (Digital Science et al., 2022; Sayogo & Pardo, 2013; Tenopir et al., 2015). But researchers also recognize the broader incentives of public benefit, transparency, and reuse (Farran et al., 2020).
Research Questions
The majority of research on factors influencing researchers’ data-sharing behaviors is based on survey data and focuses on barriers; a more comprehensive and nuanced understanding is missing. For example, in survey responses, one cannot disentangle the often-cited barrier “lack of time” from a lack of motivation to prioritize data sharing because it is not incentivized. Like other behaviors, data sharing is not stable within an individual and may vary across time (Corker, 2018; Norris & O’Connor, 2019) based on internal factors, such as motivation and habit, and external factors, including resources and project priorities (Kwasnicka et al., 2016; Norris & O’Connor, 2019). Therefore, for researchers who are currently engaging or have engaged with data-sharing behaviors, we are interested in understanding what facilitated these behaviors and what needs to change in the system to ensure maintenance and adoption by others.
Given the centrality of shared data in accelerating knowledge and solving global social issues (UNESCO, 2021), more thorough insight into the barriers and enablers to data sharing is important. Such an understanding can help facilitate the future development of effective behavior-change interventions. From this perspective, we are particularly interested in participants from one university because the insights from this study will be used by the university to develop future interventions to encourage data sharing. The overall aim of this study is to draw on the COM-B model and TDF to explore the factors that help and hinder researchers in sharing their research data. To do so, we conducted qualitative interviews and analyzed them using thematic template analysis. Our research question is as follows: What are the barriers and enablers that researchers experience when engaging, or considering engaging, with data-sharing behaviors?
The results are presented in written format and synthesized visually in the form of a behavioral map that plots data-sharing behaviors and their dependencies within the broader university system and shows relationships between actors, behaviors, and influences (barriers and enablers).
Method
Design
The study is a qualitative Registered Report (Henderson et al., 2023). It consists of semistructured qualitative interviews with researchers carried out during November and December 2023. An interview design was selected to allow an in-depth exploration of the topic that extends beyond the strictures of quantitative surveys and enables participants to talk about their individual experiences and the barriers and enablers that are particularly pertinent for them. Interviews help to ensure that voices across different disciplines and career levels are given equal opportunity to be heard, and a semistructured approach allows for prompts to help obtain further details. Furthermore, because open-research-related terminology differs between disciplines, a one-to-one approach would help minimize misunderstandings that might have occurred in a focus-group setting or via a survey.
We supplemented the methodological details below by completing the Consolidated Criteria for Reporting Qualitative Research (COREQ; Tong et al., 2007), a 32-item checklist for reporting key aspects of qualitative research (see “Materials & Procedures” component on OSF, https://osf.io/w3sfq/files/osfstorage). We note that the COREQ is controversial; criticisms include the inability to replicate the development of COREQ (Buus & Perron, 2020) and a focus on data saturation (Braun & Clarke, 2021c). In the present study, the COREQ checklist did not guide our decisions but provides a quick summary of the research. In addition, we supplement interviewer characteristics by providing positionality statements (see “Positionality” component on OSF, https://osf.io/d4sjk/files/osfstorage).
The research received a favorable opinion from E. L. Henderson and R. Abrams’s university’s Research Ethics Committee (FHMS 22-23 072 EGA).
Recruitment and participants
Purposive sampling was used to recruit research-active staff and PhD students working at a university in the south of England. We deliberately recruited only researchers who are aware of or practice open research to ensure that participants could talk about their experiences of barriers and enablers to data sharing. Inclusion criteria were as follows: researchers who produce potentially shareable data in their research or work in a team that does so and who self-report one or more of the following: (a) have shared data once or more; (b) have experience using one or more of the following open-research practices: open software/code, preregistration or Registered Reports, preprints, open monographs, open educational resources; or (c) are aware of two or more of the aforementioned open-research practices and have considered data sharing but have not yet engaged with it.
Statistical generalizability is not the goal of qualitative research; rather, we aimed to provide rich knowledge that reveals the breadth of participant experiences (Smith, 2018). To maximize diversity in our target group, we recruited participants to include a range of the following characteristics: career stages, genders, disciplines, and experience with data sharing (the latter as per the inclusion criteria above). As a minimum, we ensured that our final sample included one female and one male participant from each of the four career stages (see Table 1), one participant from each of the three broad research disciplines (STEM, social sciences, and humanities), two participants from ethnic groups other than White British, and two participants who had not shared data.
Participant Demographics
The open-research practices we consider relevant are open software/code, preregistration or Registered Reports, preprints, open monographs, and open educational resources.
The first round of recruitment was conducted before submitting the Stage 1 Registered Report because an apt opportunity occurred for people to express interest in taking part in the study: Initially, potential participants were identified based on their contribution to a prior survey, led by the UK Reproducibility Network (UKRN), that ran in early 2023 and investigated attitudes toward and experience in open research. After completing the UKRN survey, if the potential participants were interested, they were directed to a short, separate sign-up survey in which they were asked, “How important do you believe open research is to your field?” and “Thinking about one of your recent research projects, did you/do you plan to make your research data open (i.e., information you collect, observe, generate or create as part of your research)?” Twenty people indicated their interest in being interviewed (one of whom did not work with data and was therefore not eligible). To ensure diversity on the characteristics mentioned above, we recruited additional participants by advertising the study internally at the university via email (see “Materials & Procedures” component on OSF, https://osf.io/w3sfq/files/osfstorage). This round of recruitment was conducted after in-principle acceptance of the Stage 1 Registered Report. All potential participants (including individuals who had already shown interest) completed a short screening survey to assess them against the inclusion criteria and to collect demographic information relating to our characteristics of interest: career stage, gender, discipline, and additional demographics, that is, age and ethnicity (see “Materials & Procedures” component on OSF, https://osf.io/w3sfq/files/osfstorage). Answers were assessed against the inclusion criteria. If all criteria were met, participants were invited for interview. To ensure that pseudonyms were allocated respectfully, the survey asked participants to provide their own pseudonym (R. E. S. Allen & Wiles, 2016).
Personal data from the recruitment and screening survey were password protected and stored in a separate folder to the pseudonymized participant interviews.
Sample-size justification
A priori, we set a minimum sample size and a maximum stopping rule. As described in the Data Analysis section below, our use of template analysis sits on the spectrum between codebook and reflexive thematic analysis, and therefore, data saturation is theoretically incoherent (see Braun & Clarke, 2021c).
Information power
Information power proposes that the more relevant information a sample holds, the fewer participants are required (Malterud et al., 2015). Five dimensions affect information power: (a) study aim—information power increases with a narrower research question and decreases with a broader question; (b) sample specificity—a sample comprising participants with characteristics and knowledge highly relevant to the research has high information power; (c) established theory—applying established theories increases information power; (d) quality of dialogue—if the data are rich, fewer participants are required; and (e) analysis strategy—single-case or cross-case analysis decreases information power (Malterud et al., 2015). In summary, studies with focused research questions, participants specific to the study aim, and rich data that are supported by theory and analyzed using in-depth exploration of narratives have higher information power and require smaller samples (Malterud et al., 2015).
In this study, sample specificity was dense because participants were purposively recruited based on their knowledge and/or experience of data sharing; the semistructured interview format promoted good quality of dialogue; we used established theory to design and interpret the study; and we did not use single-case or cross-case analysis. However, our research question was neither broad nor narrow: Although the topic, data sharing, is narrow, we asked about it across disciplines and career stages. Overall, information power considerations suggest a smaller sample size.
Previous research on qualitative sample sizes
Braun and Clarke (2013) recommended a typical sample size of 10 to 20 for a medium thematic analysis project using interviews. Notwithstanding our comments above about data saturation, we note that a recent systematic review of qualitative sample sizes found that, on average, 12 to 13 interviews reached saturation (Hennink & Kaiser, 2022), confirming previous work that also reported saturation at 12 interviews (Guest et al., 2006).
Pragmatic resource constraints
We also considered pragmatic constraints related to funding (limited internal funding) and time (Emma L. Henderson’s (ELH) temporary contract and the time pressure that researchers, our participants, are under). Because of these resource constraints, we set the maximum number of interviews to 20.
Sample size
Our aim was to capture the depth and nuances of the topic in relation to the research questions while avoiding research waste in terms of funding and participant time. Based on the above three considerations, we set an anticipated lower sample size of 12 and an upper sample size of 20. The final sample size was decided in situ via discussion with the research team, who considered "the adequacy (richness, complexity) of the data for addressing the research question" (Braun & Clarke, 2021c, p. 211). R. Abrams led this discussion after 14 interviews had been collected. This deviated from the original Stage 1 plan, which intended for the discussion to take place after 12 interviews; however, because 14 interviews were necessary to fulfill our sampling criteria, the discussion took place after 14. At this point, the team agreed that the sample size was adequate based on patterns in the data demonstrating a range of perspectives and similarities.
Participant demographics of our final sample are presented in Table 1. In total, we interviewed 14 participants (10 from our original recruitment method and four from additional recruitment) ages 32 to 83 years. Eleven participants had previously shared their data, and three had not.
Materials
A 26-item interview schedule was used to identify the barriers and enablers to data-sharing behavior (see Table 2). Interview questions were informed by the COM-B model (Michie et al., 2011, 2014) and the TDF (Atkins et al., 2017; Cane et al., 2012) and developed to extend previous work suggesting that opportunity-related factors, such as time and resources; capability-related factors, such as knowledge and skills; and motivation-related factors, such as incentives, are barriers and enablers to data sharing. The schedule covered all COM-B constructs and TDF domains apart from physical capability because we assumed that if researchers are physically capable of conducting research, they are also capable of sharing data. The interview schedule was piloted in May 2023 with a participant who is familiar with open-research practices, and the questions were subsequently modified to ensure clarity. For details of how the interview was introduced and closed, see the "Materials & Procedures" component on OSF, https://osf.io/w3sfq/files/osfstorage.
Interview Schedule Informed by COM-B and the TDF
Note: COM-B = capability, opportunity, motivation–behavior; TDF = theoretical-domains framework.
Procedure
One-to-one semistructured interviews were conducted by R. Abrams online via Microsoft Teams. Participants were provided with the information sheet and consent form (see "Materials & Procedures" component on OSF, https://osf.io/w3sfq/files/osfstorage) via email a minimum of 3 days before the interview. Participants were advised that they could withdraw their data at any point, and up to 1 month after interview completion, without providing a reason. The information sheet explained that pseudonymized transcriptions of the interviews would be made openly available.
Interviews lasted approximately 1 hr, during which both participants and the interviewer had their cameras on. At the start of the interview, the researcher explained the purpose of the research, mentioned that participants could ask for a break or withdraw at any time, and reminded them that the interview was being recorded. Questions were mostly asked in the same fixed order for all participants (Table 2). However, given the semistructured nature of the interviews and the need to respond to participants, questions were sometimes asked at different points, and earlier questions were returned to if they had not been covered in the intended order. After completion of the interview, participants were thanked and provided with a debrief (see "Materials & Procedures" component on OSF, https://osf.io/w3sfq/files/osfstorage). They were offered the opportunity to review their pseudonymized transcripts for the purpose of highlighting any parts that they did not wish to share; four participants took up this opportunity, but none wished for any redactions. Participants received a £50 Amazon voucher via email in return for participation.
Data analysis
Interviews were video recorded, transcribed using Otter.ai and Word's automated audio transcription, and stored in the university's research folder. Transcriptions were checked by R. Abrams against the recordings to ensure verbatim accuracy of all verbal utterances and that punctuation preserved the original meaning (Braun & Clarke, 2006). The shortest interview lasted 38.54 min, the longest 65.47 min, and the average duration was 50.13 min. The recordings were deleted at the point that the Stage 2 manuscript was accepted.
Pseudonymization
Pseudonymization was carried out by R. Abrams and followed the UK Data Service's guidance for qualitative data (UK Data Service, n.d.). We used the following steps: (a) When possible, we did not collect disclosive data. For example, we did not ask for names of people, departments, universities, or companies. When a participant volunteered this information, we deidentified it in the transcript and indicated that we had done so. (b) We had intended to use the UK Data Service's Text Anonymization Helper Tool (UK Data Service, n.d.), which runs MS Word macros to help find any disclosive information. However, information-technology security prevented this tool from being downloaded, so three members of the team (R. Abrams, E. L. Henderson, and E. K. Farran) reviewed all transcripts to identify any disclosive information instead. (c) Pseudonymization occurred once transcription was complete; the original, unedited version of the transcription was kept for use within the research team. (d) Finally, we replaced any identifying information rather than blanking it out, with replacements clearly indicated using brackets. For example, "My colleague Indiana Jones" would have been edited to "My colleague [name]." We kept a pseudonymization log of any edits and an identifying key, stored separately from the pseudonymized transcripts.
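The replace-and-log logic of step (d) can be sketched in a few lines of code. This is a hypothetical illustration only, assuming a simple string-to-category mapping; the study team worked manually in MS Word, and the function name, the log format, and the "Marshall College" identifier are our own inventions.

```python
import re

# Hypothetical sketch of step (d): replace identifying strings with
# bracketed category placeholders and keep a pseudonymization log.
# The function and its names are illustrative, not the authors' tooling.
def pseudonymize(text, replacements):
    """Return (edited_text, log).

    `replacements` maps an identifying string volunteered in an interview
    (e.g., a person or institution name) to a category label such as
    "name" or "university". The log records each edit so it can be stored
    separately from the pseudonymized transcripts, as described above.
    """
    log = []
    for identifier, category in replacements.items():
        placeholder = f"[{category}]"
        # Replace every literal occurrence and count how many were made.
        text, n = re.subn(re.escape(identifier), placeholder, text)
        if n:
            log.append((identifier, placeholder, n))
    return text, log

# Example mirroring the paper's own "Indiana Jones" illustration:
edited, log = pseudonymize(
    "My colleague Indiana Jones works at Marshall College.",
    {"Indiana Jones": "name", "Marshall College": "university"},
)
# edited == "My colleague [name] works at [university]."
```

In practice the log and the identifying key would be written to a separate, access-controlled location, keeping the mapping apart from the shared transcripts.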
Thematic template analysis
Broadly, the purpose of thematic analysis is to develop themes in the data set in relation to the research question (Braun & Clarke, n.d.). There are three main types of thematic analysis (Braun & Clarke, 2019, 2023), which sit on a spectrum from "coding reliability," which prioritizes coding accuracy; through "codebook," in which the coding structure is developed based on both the data and a priori theory; to "reflexive," which emphasizes "the inescapable subjectivity of data interpretation" (Braun & Clarke, 2021a, p. 37). Template analysis is a flexible form of thematic analysis that uses a hierarchical coding structure (Brooks et al., 2015). This style of thematic analysis was selected because it allows the theoretical underpinnings of the research, in this case COM-B and the TDF, to be used to develop a priori themes while those themes remain flexible. The coding template is further developed based on a subset of the data and then refined and advanced as it is applied to the full data set (Brooks et al., 2015).
Where template analysis sits on the spectrum of thematic analysis depends on researchers’ epistemological position. Because in this research, we aimed to explore what factors influence researchers’ data-sharing behaviors, we adopted a critical-realist ontology assuming that a meaningful reality exists but that one’s experience of it is subjective and socially influenced (Braun & Clarke, 2013). The analysis was also underpinned by a contextualist epistemology: Contextualism aims to understand truth but views knowledge as contextually located and influenced by the researcher’s position, and therefore, truth is bound to the context in which data are collected and analyzed (Madill et al., 2000). This position is consistent with a data-focused approach to thematic analysis that acknowledges the active role of the researcher (Brooks et al., 2015). From this philosophical position, template analysis sits on the spectrum between codebook and reflexive thematic analysis and on the continuum between deductive (initial themes are established before coding) and inductive (themes are developed and refined through engagement with the data) thematic analysis.
The pseudonymized transcribed data were coded using the software package NVivo (Version 12). The template analysis followed Brooks et al.’s (2015) and King et al.’s (2018) guiding framework. Initially, template analysis is typically carried out on a subset of the data. The subset should capture the variety of experiences covered in the full data set; therefore, the precise number cannot be determined in advance. We anticipated that it would be a nonrandom sample of around five interviews; in actuality, we analyzed four. We used King et al.’s understanding of “codes” as comments linked to extracts of text, indicating that they are relevant to the research question. Codes develop into themes, and “coding” is the process of assigning codes and themes to the text.
Stages 1 through 4 below were carried out independently by R. Abrams and A. Marcu. Both R. Abrams and A. Marcu are experienced qualitative researchers. However, both would typically engage in inductive coding, and neither had experience working with COM-B and TDF. Therefore, both researchers discussed the analysis in two meetings and involved the wider team in shaping the findings write-up. Throughout the process, coding was discussed with all authors for the purpose of developing a richer understanding of the data.
In Stage 1, we anticipated an iterative approach to the development of the coding template based on the data. However, because the interview questions had been mapped to COM-B and the TDF, no refinement was required. To see specific details of the original coding plan, see the Stage 1 document (Henderson et al., 2023). For deviations from our original Stage 1, see Table 3.
Deviations From Our Original Stage 1
Note: COM-B = capability, opportunity, motivation–behavior; TDF = theoretical-domains framework.
Step 1: familiarization with the data
Familiarization was a key step because template analysis requires that extracts of text are interpreted in the context of their meaning within the participant's complete account. The coders became immersed in the data by listening to the interview recordings and reading the transcripts while looking for meaning and patterns. Informal notes were made, for example, on quirks and connections in the data and on broadly what was going on.
Step 2: preliminary coding
Preliminary coding was carried out based on what appeared interesting in the data in relation to Research Question 1. We used a coding template of initial themes (Version 1) developed a priori based on the COM-B constructs and TDF domains (see "Materials & Procedures" component on OSF, https://osf.io/w3sfq/files/osfstorage). Although at Stage 1 we had anticipated removing or modifying a priori themes, we found that because the interview topic guide had already mapped questions onto COM-B and the TDF, the data were characterized against this framework from the outset, leaving little room for inductive interpretation.
Step 3: clustering
As mentioned above, despite anticipating at Stage 1 the grouping of codes and a priori themes into meaningful themes, in actuality, the COM-B constructs became the themes and the TDF domains became the subthemes. Thus, the process became one of checking that data did not overlap, repeat, or duplicate across themes, similar to the intended purpose of sorting, collating, and combining similar codes into clusters of meaning to capture significant patterns in the data set.
Step 4: developing the coding template
Having identified clusters, themes, and their relationships, the coding template was applied across all data. Because we did not change the a priori template, there was not a Version 2.
Step 5: apply and modify the coding template
R. Abrams applied the Version 1 coding template to all remaining interviews and considered whether the themes captured the meaning of all interviews. No changes were made to the coding template during this process. Because of the nature of the interview-topic guide, the COM-B and TDF structure remained intact throughout the coding process. For cases in which data did not map directly onto the a priori codes, the codes were left as intended until all analyses were completed.
Step 6: finalize the coding template
Steps 5 and 6 were not two distinct stages in the analysis because the coding framework left little to no opportunity for inductive coding. Therefore, the coding template was considered final at Step 4 and applied to all data.
Step 7: writing up
Findings are presented theme by theme. We had anticipated presenting the theme and subtheme in a table at the start of each thematic section; instead, we report all themes, subthemes, and the corresponding barriers and enablers in a single holistic table. In the write-up, we focus on the (sub)themes most relevant to the research question, illustrated with vivid examples that capture each theme's core meaning.
Credibility strategies
As described above, coding was led by R. Abrams. A. Marcu coded a subset of four transcripts, and these were discussed with R. Abrams before the remaining data set was analyzed. We did not use consensus coding or interrater reliability because these methods are inconsistent with the philosophical assumptions that underlie more reflexive or codebook types of thematic analysis (Braun & Clarke, 2021b). For example, interrater reliability assumes that there is a single accurate reality that should be coded in the data, whereas reflexive thematic analysis holds that the researcher is an active participant in meaning making and that codes are derived via a situated interaction between the researcher and the data (Braun & Clarke, 2013; Braun et al., 2019). The researcher's subjectivity is embraced as a resource that "sculpts the knowledge produced, rather than a must-be-contained threat to credibility" (Braun & Clarke, 2021b, p. 334). Explicating a researcher's motives, background, and perspectives via a positionality statement allows the reader to consider the researcher's influence on data collection and analysis, thereby increasing transparency and rigor (Steltenpohl et al., 2023). We have provided prestudy positionality statements (see "Positionality" component on OSF, https://osf.io/d4sjk/files/osfstorage). A second positionality statement, in which R. Abrams reflected on how their assumptions and position might have shaped the coding process, was completed once the data had been analyzed and written up. In addition, to establish the rigor and dependability of the work, we shared the raw transcripts (of all 14 consenting participants).
Data availability
The study materials are available in the OSF repository, https://osf.io/w3sfq/. To help maximize adherence to FAIR principles, the pseudonymized transcripts were archived with the UK Data Service (Abrams, 2025). For further details on how we ensured adherence to FAIR principles, see the "Data" component on OSF, https://osf.io/ejcp5/files/osfstorage.
Results
Following our coding template, we mapped themes to five of the six COM-B components (physical capability was not present in the interview schedule) and 12 of the 14 TDF domains (behavioral regulation and emotions were not present). Themes are presented below, supported with participant quotes, and summarized in Table 4.
Combined COM-B and TDF Analysis of the Influences on Data-Sharing Behaviors.
Note: COM-B = capability, opportunity, motivation–behavior; TDF = theoretical-domains framework.
Capability
Psychological capability
Capability is internal to individuals and reflects their ability to engage in a behavior (Michie et al., 2011). In our data, psychological capability was evident, reflected in the TDF domains of knowledge; skills; and memory, attention, and decision processes (behavioral regulation was not evident). We did not ask participants about physical capability because we assumed that if they were physically capable of collecting data, they were capable of sharing it.
Researchers in STEM disciplines who worked with quantitative data were typically familiar with data-sharing expectations from funders, journals, and research communities. This familiarity appeared to enable sharing behaviors and, in turn, facilitated the knowledge of how to share data effectively. However, in disciplines such as the social sciences and humanities, and/or for researchers working with qualitative data, expectations about data sharing were less established, encouraged, and recognized; sharing therefore occurred less frequently or was harder to enact. For participants already sharing data, decisions about when and how to share were largely driven by the need for shared data to be both useful and usable.
Knowledge
The TDF domain knowledge refers to awareness of something’s existence (i.e., data sharing) and procedural knowledge—understanding how to do it (Atkins et al., 2017). In cases in which data sharing was reported as being more embedded (e.g., researchers do it as part and parcel of their role), researchers reported more knowledge about which repositories were available to them and what processes they needed to follow. This appeared to be a crucial step in enabling data sharing. For example, knowing where to go to gain the knowledge required and gaining this knowledge early in the research cycle enabled researchers to tailor their outputs and format them as part of the research process rather than trying to do it retrospectively or once the funding period had ended: “For us, a lot of it’s very clear, we know exactly what [repository] everyone uses,” said Jennifer.
When researchers perceived data sharing as a normal and integral part of the research process, it was not seen as an additional activity but one that was incorporated into what they do as part of a project's life cycle:
So I guess I just see it as part of the normal publication process. So given that I have all this data anyway, it's not that much of an extra step really, to you know, once you know where you're sharing it or which repository you're submitting it to, or whether it's just a journal table, it doesn't take too long. I think we're pretty well set up to get all that sorted. (Amelia)
My PhD students don't even question it, it's not even something that they'd have to think about. . . . As we develop code, I'm going to put it on GitHub. It's part of that pipeline. (Jennifer)
We don't even consider open research. No, like we don't even have that conversation. . . . It's just how we do it. You publish your paper, you put your code and add the data in the repository. (Rick)
For individuals in social sciences/humanities and/or working with qualitative data, knowing where to share data and what their first step should be presented the initial barrier that was hard to work around. In these cases, researchers reported needing to think much harder about what to share and how because they did not have the background knowledge on which to act:
I think there's something that I have to actively think about and seek out and kind of think about what is the data that I could share and how could I make it available? I don't feel that something that's naturally in the process of my work at all. . . . I have no idea how to share it. Yeah. I'm still finding out how to do the first step and then hopefully, I'll find out about the sharing but I know basically, nothing. (Lara)
And then where do I put it? Am I creating my own archive? Where does that go? (Sophie)
How do you develop the right Research Data Management Plan, and there's sort of a muddle to me in that perspective, and it's taking on a lot of my initiative as a researcher to go and find. (Beckham)
Broadly speaking, researchers across all disciplines felt that it was the combination of both knowledge and skill that supported data sharing. Therefore, in the theme below, we discuss associated skills that enable or prevent data sharing.
Skills (cognitive and interpersonal)
"Skills" refers to the ability or competence to perform a behavior, developed through practice (Atkins et al., 2017). Researchers predominantly reported finding it easy to acquire the skills needed for data sharing. They also indicated that related guidance on how to share data had increased over time, which had consequently enhanced their skill set:
It's probably quite simple to do. It's just a matter of, you know, finding out how to do it and finding the time and pages to kind of go through the process. I wouldn't necessarily say it's, you know, it's a tricky thing to do, that requires sort of complex skills. And, but it's a matter of finding out how and putting the time into it. (Zainab)
There wasn't any guidance in terms of what to do with the data. So I just literally plugged it into my spreadsheet and it was fine. Which is interesting, because then 2 years later, I was submitting my second paper to the same journal, it was a completely different story. Because they had loads of guidelines. Again, the same thing, I plugged my dataset into whatever it was and then they go back to me saying like, "Oh, this is wrong. This is wrong. This is wrong. You should do the ABC," and I was like, oh, okay, well, this has changed. . . . I can now describe my data in a way that makes it reusable. So I've put together like a README file where I describe my variables. (Zainab)
Taken together with findings in the knowledge theme, the majority of participants felt that should they need to, they had the appropriate skills to increase or enhance their ability to share data.
Memory, attention, and decision processes
This TDF domain refers to the ability to retain information, focus attention, and make decisions or choices (Atkins et al., 2017). Researchers reported that decisions about what data to share and when were often made at the outset of a project, largely driven by a compulsory section in a funding application about how data would be shared once the project was finished. This requirement focused researchers' attention on data sharing at the beginning of the research process and encouraged them to prioritize data sharing in their project plans:
It all has to come in the beginning. The plan that we're writing for a bid right, which is what funders now [want], if you look I think most will have a section on and it will fall with ethics. (Jennifer)
However, participants working with qualitative data reported that data sharing would sometimes be a retrospective decision. This might be because of aspects that arose or changed throughout the course of the project or because of circumstances that enabled data sharing, such as available funding or the importance of the data found:
So it's almost not until the end that you kind of go wow, okay, this is really important. Oh my God, I wish we'd done X, which was exactly where we found ourselves . . . and I think that is not something I would have anticipated at the beginning . . . in the messy reality of research, things change. . . . The data you think you want to share is different to the data you're actually sharing all those things can morph but maybe you know what I know now I do the next project differently. (Sophie)
In some cases, data sharing was reported as being considered a lesser priority, one that did not drive the work but might be considered at a later point to leverage project data:
So the first and foremost priority is the integration of research, as you know. So this is the most important thing and the completion of it correctly, ethically, successfully. These are important priorities to be considered, sharing data comes second to that. (Beckham)
But it's like if I can publish a dataset, how can I do it and what would the data be? It doesn't drive, normally how I start the work . . . at some point when you're thinking about okay, this is the experiment that we're going to run and then it's like, well, okay, what could we maybe do to leverage that data for a wider purpose? (Frisby)
Ultimately, when a decision was made to share data, it was made with the aim of making the data both useful and usable:
When we think of a shareable format, I guess the main thing is that just because you share the data, doesn't mean that it's usable by the people. So if you're using headings, you know, what are those headings? When we have code, like how are we annotating that code, right? So I can share code very easily, but that's not usable for people. . . . But if you're going to put in the repository, then you have to clean it up a little bit. You're going to put some comments on it. And actually, you know, you can always read almost out of self-interest. It doesn't need to be because of other people. (Rick)
Having data that were both useful and usable required attention to detail and careful planning from the outset.
Motivation
In the COM-B model, motivation has two subcomponents. Reflective motivation involves conscious processes that influence behavior (Michie et al., 2011); in our data, this was demonstrated through all six associated TDF domains: beliefs about capabilities, beliefs about consequences, optimism, intentions, goals, and social/professional role and identity. Automatic motivation, in contrast, refers to unconscious processes that drive behavior; in our data, this was demonstrated by the domain reinforcement, and the TDF domain emotions was not evident.
Several researchers discussed whether data sharing was driven by self-interest or altruism. Many acknowledged that regardless of whether they engaged in data sharing, there was not much reward or recognition, which at times made it harder to prioritize or meant that researchers were left with a feeling of “could do better” at data sharing. Many also acknowledged that although confident in their existing skills, it was important to get data sharing “right,” especially when the need to deidentify data was involved.
Reflective motivation
Beliefs about capabilities
Beliefs about capabilities relates to self-confidence, perceived competence, self-esteem, and professional confidence (Atkins et al., 2017). Researchers largely believed that their capabilities for data sharing could be enhanced with practical support, including training. However, researchers mostly believed that if they were engaging in data-sharing activities, this would either be on top of existing workload or undertaken in a researcher's own personal time. Thus, although the wider organizational culture might encourage data sharing, applying it in practice might present a conflict because it is not an activity that is typically prioritized:
Data sharing is not how researchers are judged, there's no route to promotion, if you will, or even assessment of how a lab is organized. That isn't given the priority, it's just purely day to day to get the publication and that, sadly, is how academic research is predominantly judged. And so therefore, the stuff on the side you either do it out of hours on your own, or it doesn't happen, which I think is terrible. I'm constantly battling it, especially because I do believe in it strongly. But I have to do it largely in my own time. (Michael)
To this end, several researchers believed data sharing to be an administrative burden that did not have any support channeled into it. Some felt that for it to be taken seriously across the board, it would need to be mandated:
I guess it would need to come from above. And it would either have to be something that's mandated. Everybody has to do it, therefore everyone does it. Or it will have to be strongly encouraged by you know, [principal investigator's] leadership, etc. Obviously, you know, it could be that it becomes part of the culture as such, but I think there are other pressures on researchers in terms of you know, progressing their careers, etc. And I don't think that data sharing gives us every one of those things that are going to win unless it gets mandated. (Zainab)
Most researchers reported feeling confident that data sharing was a doable part of any project with the right resources in place, especially if considered and implemented at the start of a project (Eric: "It's not a particularly hard thing to do, but you'd have to do it at the outset"). Researchers who had been sharing data for a while felt confident in their skill set, and those who had not felt confident in their ability to find out which steps to take and whom they might need to approach for help.
However, they also acknowledged that it was an activity that they could also be better at and one that may suffer when it competes with other academic priorities: “I think we probably do more than the bare minimum, but not the most,” said Jennifer. Michael said, “Do I do enough of it? No, I don’t, I’m afraid. I’m kind of under constant pressure to continue evolving the research.”
Overall, the beliefs that researchers held about their capability to engage in data sharing were often a product of their working environment (see opportunity: physical opportunity theme).
Beliefs about consequences
This TDF domain relates to accepting the truth about the outcomes of a behavior, including outcome expectancies and potential regret (Atkins et al., 2017). Across the board, researchers felt that what data they shared and how they shared it required careful consideration because of the potential consequences of sharing. For example, several participants decided not to share data that they considered either messy (e.g., self-taught coding) or high risk (e.g., potentially distressing in vivo images):
I didn't share some of it because it's all very basic. So you'd be just clogging up your GitHub with like, how to do a normal plot or something. So I think some of it needs more thought in terms of which parts of my code are really needed for people to reproduce what I've done and which parts of my code is something they could do in 3 seconds and much better. (Amelia)
So if I'm doing in vivo work, and there's been animals used to collect it. The anti-vivisection movement is something that is quite scary. And to do any in vivo work, you do have to be a bit careful . . . not everyone believes in the benefits of in vivo research. (Michael)
Typically, researchers wanted data sharing to contribute something novel and/or helpful to the discipline at hand.
In addition, participants working with qualitative data reported feeling very cautious about deciding to share data because of its very nature (i.e., needing to make decisions to protect the anonymity of their research participants and consider the consequences of not doing this properly):

So I’m . . . not that 100% open to the sharing of data for the sensitivity of data that we have, or for the confidentiality on participants, their ideas, their views, you know, personal information, personal, personally identifiable information, which is really important for some not to be shared. (Beckham)

So we, I, work in a very psychologically unsafe world for people who talked to me. So there was huge caution about the data being made public in any way because it might come back to bite them, and they might lose their jobs. Given that we want them, I want them, to tell the truth. Yeah. And they have to trust me. And they have to trust me a lot, and the other interviewers to give that data and trust that I will do the right thing with it and not throw them under the bus by not deidentifying it correctly. (Sophie)
All researchers working with data capable of identifying the participant reported holding concerns about how to share while still protecting participants’ or patients’ identities. Some researchers felt concern about aspects of their data (i.e., it being imperfect, too basic, or containing a bug or mistakes) and about needing to manage risk (i.e., checking the data for identifying details). Others were concerned about the time and resources sharing might require. Still others had concerns about what others might do with their data (i.e., the risk of data being sold, e.g., to Google), about data-hosting repositories disappearing as a result of lack of funds, about having their ideas stolen, or about data being misinterpreted:

A bit cautious I would say probably sums it up. I think. I know, it is seen as good practice and indeed, highly desirable for publicly funded projects. And of course, I do agree with that where you’ve had public money to generate evidence. I would be fully behind that being available to others. And of course, we want open research. We don’t want data hidden where it can’t be seen and where it can’t be interrogated by others. That’s where kind of perhaps mistakes get made or even worse, you know, people might draw erroneous conclusions from data and falsify data. So broadly, I think having data open is a good idea. But and I guess there’s a big but for me, from my perspective is that that comes with quite a lot of both responsibility and what I’d probably call administrative work, although it’s not only administrative work, there’s quite a bit of ethical thinking and work that has to go on there. (Sophie)

I would say with data sharing I’m afraid of the biggest trouble with sharing is ensuring that I’m not unintentionally leaking personal data, this would be a huge trouble. (Leonid)
Data sharing was therefore seen as a vulnerable activity because it might expose researchers or errors in their work if not archived correctly.
Optimism, intentions, goals, social/professional role and identity
In this theme, we report on four TDF domains together because they are interrelated. “Optimism” refers to confidence (or lack thereof) that things will work out well or that goals will be achieved. Intentions involve conscious decisions to perform a behavior or act in a certain way. Goals are mental representations of the desired outcome a person aims to achieve. Social/professional role and identity encompass the behaviors a person adopts in social or work settings (Atkins et al., 2017).
Researchers felt that data sharing and certainly the ethos of sharing was something they considered part of their identity as a researcher. Although data sharing was not an activity they engaged in every day as part of their role, the guiding principle of being open and willing to share was something all researchers considered integral to their work. Researchers also felt that data sharing was not the only way to be open and that ensuring their articles were open access was another way in which they upheld this principle.
Researchers talked about having the goal of sharing all their data. They discussed wanting their intentions for sharing their data to be driven by morality rather than self-serving goals. They also discussed including data sharing in future grants when they had not previously while also acknowledging that it was either something they just had to do or wanted to do even if the wider system (i.e., appraisals, recognition, rewards) did not always acknowledge it:

It wasn’t something that was prioritized by anyone. So I would like this to become part of my practice as kind of being a good scientist doing the right thing. If you know, I mean, I don’t know whether my internal motivation would follow that. But when it comes to kind of being a good scientist, I would like to tick that box and share data because you know, it’s the right thing to do. (Zainab)
Even when researchers believed that sharing data was self-serving and they operated within a discipline in which data sharing was embedded, this was generally under the premise that it still helped to further the field or led to increased opportunities to collaborate on other research projects. Thus, there existed a degree of optimism that the consequences of data sharing could be good for both a researcher and the wider community if done well:

So I do think that leads to you know, higher citation rate and more opportunities to collaborate with people and work with them on different projects. I think I get invited to more . . . proposals or conferences or, you know, people ask me if I can share something with them, . . . but I think that in general, yeah. It’s very positive. (Amelia)
However, individuals who did not have their data in the public domain but had stated in their publications that data were available on reasonable request mentioned they had rarely, if ever, been asked to share the data. Only one participant mentioned being contacted for data sets. Most researchers stated that users of their open data would most likely be colleagues within the wider research community but that in reality, it was unlikely their data would be used by anyone else. This assumption did not affect their motivations to share their data. Thus, it may be that data sharing as a behavior signals to the wider research community the type of researcher one is (i.e., open and transparent) and the values the researcher holds.
Automatic motivation
Reinforcement
“Reinforcement” refers to rewards, incentives, and punishments that increase or decrease the probability of a behavior occurring (Atkins et al., 2017). Several researchers felt that despite data sharing being encouraged and supported in some fields (i.e., by funders and journals), there was not always external reinforcement before or once data had been shared. Some researchers felt that they could write data sharing into a data-management plan and then not action it because no one else followed it up or checked it. Others felt that there could be only personal gain from sharing data (i.e., it was self-interest to boost citations and reach). Others still felt that if data sharing were to be taken seriously, then it would need incentivizing through financial support; that it relied heavily on the goodwill of scientists or was altruistic/the right thing to do; and that it was an activity that competed with other, more pressing priorities and so would drop farther down the list because of not being enforced:

Well, I don’t think there are any incentives for it. You do it because you want to do it, or you’re forced to do it, you know, because of X, Y, Z right? I mean, if your funders force you or whatever, then people have to do it. For me we just do it and that’s it. Nobody incentivizes it, but it is taking from your time isn’t it? But it is the right way to do it. And as I said before, it might be self-serving too, because when you go back to your code a few years down the line, then if everything was clean and annotated, then that’s got to be better for you. (Rick)

It’s the stick rather than a carrot. . . . I guess my internal motivation could be an incentive. But when it comes to anything offered by the institution, or the school, there isn’t really anything that would incentivize me to share data, and if it wasn’t . . . for the journals requiring it, I would probably not be knowing about data sharing at all. (Zainab)
Regardless of whether there was reinforcement or incentivization to share data, researchers engaged with data sharing primarily because of their reflective motivation as opposed to automatic motivation.
Opportunity
Opportunity is external to the individual and encompasses the physical environment and social systems (Michie et al., 2011). In our data, both the TDF domains of social influences and environmental context and resources were present. Typically, researchers felt that data sharing was encouraged in principle but that in reality, very little institutional support was offered in terms of resources to facilitate a broader culture shift. Researchers did not make any specific references to how social opportunities facilitated or hindered their data-sharing behaviors above and beyond there being clear messages from heads of departments or schools that data sharing and indeed open research more generally was a practice all researchers should embrace.
Social opportunity: social influences
“Social influences” refers to interpersonal interactions that may lead people to modify their thoughts, feelings, or behaviors and includes social norms, group conformity, power, and modeling (Atkins et al., 2017). When referring to social influences, participants rarely talked about the influence of specific people (i.e., team members or colleagues). Some researchers referred to more nebulous examples, such as the expectations within any given research community. This included reference to an assumption that certain research communities expected data sharing to be undertaken and that therefore, researchers who shared their data were perceived as more trustworthy:

There are some people who still keep their proprietary data for a very, very long time, but I think it’s so frowned upon . . . people are much more willing to trust you and work with you if you’re more open about what you’re doing, I think. (Amelia)
However, in other fields in which there were fewer expectations around data sharing or it was less common (e.g., the social sciences), some researchers felt that there was very little point in sharing data because no one expects it or checks it:

Sometimes it’s probably the carrot or the stick and I suspect a bit more stick, a bit less carrot would actually do the job quite honestly. I do follow the rules as much as I humanly can and so if somebody said, “No, you’ve got to do this and we expect you to do this,” then I’ll do it. But if it’s less work, and not a requirement, then I’ve got to see something in it for me. (Frisby)
There were also instances in which data were either commercially or confidentially sensitive, which presented a barrier to being able to share them:

The employers actually came back and said, no, sorry, they’re working for this company and they are a direct competitor and that is our IP so no. So actually, in that case, I was kind of handcuffed because it was a company funded position and we were doing research for them and they were not comfortable sharing that knowledge. (Michael)
Overall, researchers spoke more about the external factors that influenced their ability to share data or not, as discussed below.
Physical opportunity: environmental context and resources
This TDF domain relates to circumstances in a person’s environment that either encourage or discourage development of skills and adaptive behaviors, such as (material) resources or organizational culture (Atkins et al., 2017). The lack of necessary infrastructure or resources led to challenges when trying to engage in data-sharing behaviors. This most commonly arose in conversation with researchers working with large data sets, for which hosting data was often prohibitively expensive for universities and not always supported by journals either:

It’s very expensive to host and maintain on the university server. So usually it ends up with us hosting them elsewhere because frankly, we don’t have the funds to buy that kind of backup and storage space from the university itself. (Amelia)
Several researchers acknowledged that cost for large data sets or indeed, cost for support (i.e., researcher time) was not always considered by individuals preparing grants:

They won’t put in a bid and forget to buy the mass spectrometer or the lab instrument they need. They won’t forget to charge you know, for the consumables and oratory, they won’t forget to charge for what they need to go out and do interviews. I think they can forget to charge for data management, open data management and the resources and expertise they need. They can forget to charge somebody’s time to do that. (Eric)

For example, you’ll put some money aside putting it onto a certain safe data resource database that has a charge for doing so. But you wouldn’t count, consider having to pay someone to do that. And you don’t always have someone in your team who knows how to deal with that . . . and it’s not until the day that you have to upload but you realize no one on the team knows how to use this platform. (Jennifer)
Time, or rather the lack of it, was also often seen as a barrier. Researchers across all disciplines felt that resources (i.e., cost, time, staff members) would need to be specifically allocated to the activity by the university, funders, and journals to truly embed data sharing.
Concrete examples of how data-sharing behaviors were enabled physically included the implementation of funding guidelines and reporting templates specifically relating to data sharing, data-sharing policies from funders and journals to set out expectations, data-sharing plans in funding applications, specific requirements to make data open (i.e., biological data), public bodies (e.g., observatories) putting time limits on how long data could remain proprietary to the team/researcher who collected them, and providing specific spaces to host data:

Definitely in the last 10 years, I’d say there has been more of a focus on demand from journals and our funders to make data available. Not all data, but some. And it’s no longer a question of whether we want to or not, it’s a requirement. It’s a requirement from funders as well. Some more than others. Charities, for example, don’t tend to necessarily make that requirement but the UK is moving into it. . . . It’s become more common practice. (Jennifer)
Thus, data sharing and the behaviors associated with it appear to be context- and discipline-specific, which may be an important consideration when implementing policies, guidelines, and support for researchers as a physical opportunity to foster data sharing.
Discussion
We interviewed a range of researchers across disciplines and career stages about their experiences of data sharing. Findings indicate that quantitative data-sharing behaviors were performed differently from qualitative ones, which affected the required skills. For example, researchers in STEM had noticed a definite culture shift toward data sharing among funders, journals, and their research communities. This was enabled through guidelines, specific sections on funding grants requiring data sharing, and journals requiring data. The culture shift had increased knowledge, awareness, and skills for this group of researchers, allowing data-sharing behaviors to become routine regardless of motivation and opportunity. However, this was not the same for researchers working with qualitative data; these researchers felt they lacked the knowledge about how, where, and indeed why they might share their data.
Findings indicated that the motivation of researchers to carry out data-sharing activities could be both self-serving and altruistic. Although many researchers felt that the data they had already shared or could/might share would not necessarily be used by others, the motivation to contribute to open research was an enabling factor. This could be because although potentially vulnerable and exposing, it signaled a certain identity and associated values about them as researchers (i.e., as someone who values transparency and openness). However, some researchers held concerns about being too open and the consequent risk of ideas being scooped or mistakes being found.
Findings also indicated that data sharing is context-dependent, that is, there are physical, environmental, and social opportunities that can both help and hinder it. The barriers most commonly associated with the physical or social opportunity to share data were similar across all disciplines. These barriers included a lack of time to undertake data-sharing activities, concerns over General Data Protection Regulation/correct deidentification of data, and limited infrastructure to host large data sets and the expense associated with this.
Of the six key data-sharing behaviors described in the introduction, researchers felt they had the capability to seek out skills and resources but that seeking out, preparing, managing, and depositing the data were all constrained by the physical opportunity to do so (i.e., lack of time and resources). Data-management plans were known and implemented by some but not all, and this was the same for securing ethics with a view to sharing data—not all researchers carried out behaviors to facilitate data sharing (i.e., preparing participant information sheets with a view to sharing data). This was because it was not always expected or common. Our findings accord with existing research in this respect—a lack of resources, including physical opportunity (time, funding, and in our case, infrastructure), is the most frequently reported barrier to data sharing (Astell et al., 2018; Chawinga & Zinn, 2019; Cheah et al., 2015; Farran et al., 2020; Hostler, 2023; Houtkoop et al., 2018; Long et al., 2020; Tenopir et al., 2011; Van den Eynden et al., 2016). Not sharing data because it is not expected or is uncommon within the field was also a finding previously identified (Houtkoop et al., 2018), and in our study, this was particularly the case for researchers with qualitative data. All researchers discussed their concerns or fears, including compromising confidentiality, working with sensitive data, reputational harm or risk (i.e., messy data, being scooped), or the misinterpretation or misuse of their data (Bezuidenhout & Chakauya, 2018; Cheah et al., 2015; Gomes et al., 2022; Sayogo & Pardo, 2013; Soeharjono & Roche, 2021; Tenopir et al., 2015; Van den Eynden et al., 2016). Although these concerns did not always prevent data sharing, they influenced how, what, and why researchers shared their data.
The ability to share data and the factors that enabled it included available guidance; access to infrastructure, including a repository; factoring in funding allocations; and having the necessary knowledge and skills. These factors were present in our data set and in the existing literature (Kim & Zhang, 2015; Sayogo & Pardo, 2013; Van den Eynden et al., 2016). However, researchers did not feel that data-sharing behaviors were recognized or incentivized. This aspect appeared to be overshadowed by a finding less discussed in previous literature and one that our study identifies: Researchers are driven to be seen as open researchers. This identity matters to them, both for the good of research and their discipline and for what it signals about them. It is a key enabling factor, potentially driving behavior even in the absence of other factors. This could be an interesting finding to expand on not only with participants without relevant knowledge of data sharing, who were excluded from our study, but also with participants for whom barriers and enablers might present differently.
In this qualitative Registered Report, we used the COM-B and TDF frameworks to identify data-sharing behaviors and determine these deductively. Although our methods also supported inductive findings being identified throughout analysis, it is likely that because the interview schedule was organized around COM-B and TDF, deductive analysis took precedence. Furthermore, although the interview schedule covered almost all COM-B constructs and TDF domains, physical capability was not included because it was assumed that if participants had the physical capability to collect data, they also had the capability to share it. This may have precluded physical capability from emerging as a finding and may warrant further research in this area using a more inductive approach.
Overall, participants believed that data-sharing activities could be mandated by institutions to enable more widespread behaviors. However, these activities need to be both discipline-specific and supported by institutions providing adequate resources (e.g., time, recognition, infrastructure, and support). For researchers working with qualitative data, energy could be invested into raising awareness of the benefits and practicalities through appropriate training and upskilling. Researchers themselves could consider embedding data-sharing behaviors from the start of a project (e.g., in data-management plans, consent forms, and research proposals) rather than treating them as an afterthought. However, data sharing should not be done without careful consideration of the implications for participants, researchers, and universities.
