Introduction
In the past, researchers have relied on self-reports to measure social media usage (Griffioen et al. 2020). This approach has been criticized due to the cognitive burden that complex survey questions about social media usage impose on respondents, which can lead to unreliable answers (Haenschen 2020). Several studies confirm that users' estimates of the frequency of use of and time spent on social media do not align with more objective, direct behavioral measures (Ernala et al. 2020; Junco 2013; Mahalingham et al. 2023; Parry et al. 2021). These behavioral measures come from so-called digital traces, byproducts of individuals' interactions with digital systems, such as web browsers and smartphone apps (Keusch and Kreuter 2021; Stier et al. 2020).
Different methods are available for collecting individual-level digital trace data, with most of them falling into one of three categories (Breuer et al. 2020; Ohme et al. 2023). First, application programming interfaces (APIs) afford researchers direct access to data from social media platforms. However, platforms are increasingly restricting API access, where such access existed in the first place. Access to the Facebook API was discontinued after the Cambridge Analytica scandal in 2018 (Bruns 2019). Recent changes to the APIs of Twitter and Reddit have shut down free access to social media data for most researchers (Calma 2023; Ledford 2023; Sarraf 2023).
A second option for collecting individual information on social media usage is to ask users to install a tracker on their device(s) that continuously logs what URLs are visited and what apps are used (Christner et al. 2021). While this approach allows for an unobtrusive, continuous collection of digital trace data, many users express privacy concerns and report low willingness to install trackers (Keusch et al. 2019, 2021; Makhortykh et al. 2021; Revilla et al. 2019; Wenz et al. 2019) with systematic nonparticipation leading to biased samples (Gil-López et al. 2023; Keusch et al. 2022).
Recently, researchers have asked users to download their retrospective digital trace data from social media platforms and share them with the researchers, an approach now commonly referred to as data donation (Boeschoten et al. 2022a, 2022b; Halavais 2019; van Driel et al. 2022). Little is known about how to best implement data donation and how willing users are to share their data in this way. We contribute to the growing methodological literature on data donation with empirical evidence from Germany, answering research questions about the willingness to donate Facebook data, the success of the donation, and the bias that stems from not donating data.
Data Donation as a Novel Form of Digital Trace Data Collection
Data donation takes advantage of legislation such as the European Union’s General Data Protection Regulation (GDPR) (EU 2016).
This kind of legislation grants individuals the general right to receive the information held by a
For the researcher, the advantage of data donation lies in not having to rely on a commercial platform providing an API for structured access to data. Instead, they can go directly to the users to access rich individual data, including information that typically cannot be accessed via trackers (e.g., behavior within an app), and that can be further linked to information participants provide in a survey (Ohme et al. 2023). Compared to trackers that need to be installed on all devices to receive a complete picture of a user’s behavior on that platform (Bosch and Revilla 2022), the data download packages (DDPs) that users obtain from a platform include information on all activities regardless of the device used (Breuer et al. 2022). For the participant, data donation provides more control over what data are shared with the researcher (Ohme et al. 2023). Unlike a tracker, which, once installed, continuously collects data in the background of a user’s device, data donation has participants actively donate their existing, retrospective data. Given the explicit informed consent, this approach is considered a more ethical way of collecting digital trace data (Halavais 2019).
Nevertheless, data donation also faces challenges as it requires participants to perform several steps that can vary by data source and research project (see Boeschoten et al. (2022a, 2022b) for a general data donation framework description). Depending on the complexity of the task and the participants' technical savviness, researchers need to thoroughly guide participants through this process (Ohme et al. 2023; van Driel et al. 2022).
First, participants need to request their data from a platform, usually by specifying the type of DDP as well as the format (e.g., HTML, JSON, CSV) and the timeframe. The requested DDPs are then made available for users to download to their computer. van Driel et al. (2022) describe this process for data from Instagram, and Silber et al. (2022) report how to access health data from various smartphone apps. In a next step, study participants need to share the data with the researchers. Different approaches have been used for this step. For example, Silber et al. (2022) asked participants to share their DDPs via a commercial file transfer platform. Other researchers have commissioned market research companies to collect social media DDPs in person (Breuer et al. 2022; Kmetty and Németh 2022). Recently developed data donation platforms streamline the process and incorporate additional measures to preserve the participants' as well as any third parties' privacy. Platforms such as OSD2F (Araujo et al. 2022) and PORT (Boeschoten et al. 2022a, 2022b) allow participants to review and curate their data before donation, and they automatically strip all personally identifiable information from the data in a process locally executed on the participant’s computer before the data are shared with the researchers.
Willingness to Donate Digital Trace Data for Research
Concerns about the potential sensitivity of digital trace data and the rather cumbersome process involved in data donation raise questions about the feasibility of this approach, willingness to donate data for research, and systematic nonparticipation bias. So far, little research exists on these questions. Ohme et al. (2021) asked respondents in a Dutch web survey to take screenshots of the Screen Time function on their phone and upload them in a web survey. Out of 404 survey participants, 76% agreed to do so. van Driel et al. (2022) recruited adolescent Instagram users in the Netherlands out of a larger study. Of the initial 388 participants, 74% obtained parental assent for Instagram data donation and 38% gave informed assent. Silber et al. (2022) asked 2,040 smartphone owners in a web survey in a German online access panel to donate their health app data. Twelve percent of iPhone owners and 14% of Samsung smartphone owners consented to data donation. Most recently, Pfiffner and Friemel (2023) asked 833 members of a Swiss online access panel about their hypothetical willingness to donate data from various platforms. The mean willingness to donate data was rather low (between 2.9 and 4.4 on a 7-point scale). With our first research question, we extend the existing research on data donation by providing new evidence on:
RQ1: How willing are Facebook users to donate their data in a web survey?
As part of the investigation into the willingness to donate Facebook data, we are also interested in the effect of the framing of the data donation request. Building on Prospect Theory (Kahneman and Tversky 1979, 1984), some studies found that survey respondents were more likely to provide additional data if the request was framed as a loss (i.e., the survey data would be less valuable to the researchers without the additional data) rather than as a gain (i.e., the survey data would become more valuable to the researchers with the additional data) (Kreuter et al. 2016; Tourangeau and Ye 2009). We investigate the effect of the framing used in the data donation request:
RQ1a: What effect does the framing of the data donation request have on willingness to donate?
While willingness to donate might be driven by factors such as concerns about privacy and attitudes toward the entity that requests the data donation, the success of the donation process itself is likely influenced by how cumbersome the process is. The empirical studies using data donation approaches show a stark difference between stated willingness to donate and actual donation behavior. Ohme et al. (2021) found that out of the 307 respondents who indicated willingness to donate smartphone screen time information, only 15% successfully donated their data. In the study by van Driel et al. (2022), out of 148 who had assented, 102 participants (69%) eventually uploaded 110 usable Instagram DDPs to a protected server at the researchers' university. Silber et al. (2022) showed that only 3% of iPhone owners and less than 1% of Samsung smartphone owners successfully donated their health app data to the researchers through a commercial file transfer platform. We thus also investigate the question:
RQ2: How successful are Facebook users in donating their data?
The share of people who donate data is only one of two components determining whether the results of a study suffer from bias. The other factor is whether participants and nonparticipants differ systematically from each other in the variables of interest (Groves and Peytcheva 2008; Tourangeau 2017). So far, studies using data donation have found mixed results on differences between donors and non-donors on socio-demographics, user behavior, and privacy-related concerns. Ohme et al. (2021) found that mobile phone savviness was a significant predictor of the stated willingness to donate smartphone data, but not gender, age, education, privacy concerns, and privacy literacy. Mobile privacy literacy was correlated with actual data donation. van Driel et al. (2022) showed that those who donated Instagram DDPs were more likely to be female and following a higher educational track compared to those who did not. Silber et al. (2022) report that older age, lower education, higher health app usage, and experience with privacy intrusion reduced the likelihood to donate smartphone health data. In the hypothetical willingness study by Pfiffner and Friemel (2023) positive attitudes toward data donation, the donation purpose, and its perceived relevance were related to greater willingness. The lower the perceived sensitivity of the data and the higher the perceived behavioral control over the data donation process, the higher the willingness to donate. While younger respondents expressed higher willingness to donate, gender, education, and frequency of use of a platform did not influence willingness.
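The two components named at the start of this paragraph can be written out with the standard deterministic expression for nonresponse bias in a respondent mean. The notation is introduced here for illustration and is not taken from the study: n is the full sample size, m the number of non-donors, and the three means refer to donors (r), non-donors (m), and the full sample (n).

```latex
\operatorname{Bias}(\bar{y}_r) \;=\; \bar{y}_r - \bar{y}_n \;=\; \frac{m}{n}\,\bigl(\bar{y}_r - \bar{y}_m\bigr)
```

The bias vanishes either when everyone donates (m = 0) or when donors and non-donors do not differ on the variable of interest, which is why a low donation rate alone does not imply biased estimates.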
With our study, we expand the current literature on data donation by not only exploring bias regarding socio-demographics and privacy concerns but also substantive Facebook use measures:
RQ3: What bias arises from selective willingness to donate and successful donation of Facebook data?
Methods and Data
Survey
We implemented a data donation request in a web survey among Facebook users in a non-probability online panel in Germany in December 2021. To be eligible to participate, panel members had to report using Facebook at least several times a month in one of the three earlier waves of the study. All participants in the longitudinal study had a meter installed on at least one laptop/desktop or one mobile device.
In the web survey, respondents were asked questions about Facebook usage behavior, generalized trust, trust in university researchers and Facebook, and general privacy concerns (see Online Appendix A). Respondents received 0.5 Euros worth in panel points on completing the web survey.
At the end of the survey, the 1,092 respondents were asked whether they would be willing to continue with another task as part of the study. While we did not immediately reveal what the additional task would be, we informed respondents that it would take five more minutes and that they would receive an additional 2 Euros worth in panel points. To continue with the task, participants had to have access to a desktop or laptop computer. A total of 913 participants proceeded with the task, constituting the analysis sample for this study (see Table B1 in the Online Appendix for descriptive statistics of the sample).
Data Donation
Those who proceeded were then informed about the Facebook data donation task. We explained to participants that we would like them to download two DDPs from their Facebook account containing the following information from the past three months: (1)
All other participants were guided through the data donation process via multiple pages including screen shots and detailed instructions (see Online Appendix A). Respondents were instructed to go to https://facebook.com/dyi and download two separate DDPs (“Security and login information” and “Your themes”) for the past three months. After downloading the two zip files, participants needed to temporarily store them on their computer. To be able to link the Facebook data back to the survey data, participants received a randomly generated four-digit, alphanumeric code that they had to manually append to the file name of both DDPs. The participants were then instructed to go to https://facebook-data-donation.de/upload and share the two DDPs with us. The upload site was built on the OSD2F platform (Araujo et al. 2022) and allowed participants to review the donated data and delete specific entries before uploading.
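The linking step described above (a random four-character code appended by hand to each file name) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the character set, the underscore separator, the function names, and the example file name are all hypothetical.

```python
import secrets
import string
from pathlib import Path
from typing import Optional

# Assumed character set for the code; the study only says "alphanumeric".
ALPHABET = string.ascii_uppercase + string.digits


def make_link_code(length: int = 4) -> str:
    """Draw a random alphanumeric code of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def append_code(filename: str, code: str) -> str:
    """Append the code to the file name, keeping the extension."""
    path = Path(filename)
    return f"{path.stem}_{code}{path.suffix}"


def extract_code(filename: str, length: int = 4) -> Optional[str]:
    """Recover the code from a donated file name, or None if it is missing."""
    stem = Path(filename).stem
    _, separator, candidate = stem.rpartition("_")
    if separator and len(candidate) == length and all(c in ALPHABET for c in candidate):
        return candidate
    return None


# Hypothetical file name, for illustration only.
renamed = append_code("facebook-export.zip", "A1B2")
```

Because participants rename the files manually, `extract_code` returns `None` whenever the code is absent or malformed, which corresponds to the packages that later have to be linked by other means.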
Back in the web survey, participants were asked whether they had successfully donated both data packages and, if not, why not. Participants who reported trying to donate the data packages without success received 0.50 Euros in panel points.
All procedures of the study were approved by the Ethics Commission of the University of Mannheim (EK Mannheim 49_2021).
Analysis Strategy
To answer RQ1, we calculate the share of respondents willing to donate their Facebook data over all respondents who started the data donation module of the study (i.e.,
Results
Regarding RQ1, we see that out of 913 eligible survey respondents, 725 (79%) indicated that they were willing to donate their Facebook data (see Figure C1 in the Online Appendix for the participant flow). The type of framing (RQ1a: gain vs. loss) had no significant effect on the willingness to donate (χ² = .024, df = 1,
Among the reasons for not being willing to donate (
We received 722 individual data packages. Not all donated data packages included a valid ID, thus we had to link the data based on a combination of the time stamps available from the web survey and the data donation platform and available IDs (see Online Appendix E for our linking strategy). A total of 684 donated data packages could be linked to 345 survey participants (remember that participants were instructed to donate two DDPs), that is, 48% of those who were willing to donate successfully donated at least one data package (RQ2). We could not link the remaining 38 donated data packages to individual respondents.
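The rates in this section follow directly from the reported counts. The short sketch below recomputes them; the counts come from the text, while the variable names and the helper function are introduced here for illustration.

```python
def share(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts as reported in the text.
SURVEY_RESPONDENTS = 1092   # completed the web survey
ELIGIBLE = 913              # proceeded to the donation task
WILLING = 725               # stated willingness to donate
LINKED_DONORS = 345         # participants with at least one linked package
PACKAGES_RECEIVED = 722
PACKAGES_LINKED = 684

willingness_rate = share(WILLING, ELIGIBLE)                 # RQ1
success_given_willing = share(LINKED_DONORS, WILLING)       # RQ2
success_among_eligible = share(LINKED_DONORS, ELIGIBLE)
success_among_respondents = share(LINKED_DONORS, SURVEY_RESPONDENTS)
unlinked_packages = PACKAGES_RECEIVED - PACKAGES_LINKED

print(willingness_rate, success_given_willing,
      success_among_eligible, success_among_respondents, unlinked_packages)
```

The choice of denominator matters for interpretation: the success rate is about 48% among those who stated willingness, but it is noticeably lower when computed over everyone who started the donation task or over all survey respondents.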
Out of the 41 participants who answered the open-ended question on why they did not donate, 83% reported some type of technical problem when downloading the DDPs, extracting them on their computer, or uploading them to the donation portal (see Table D2 in the Online Appendix).
Figure 1 visualizes the results of two logistic regression models predicting stated willingness to donate Facebook data and successful donation given willingness (RQ3; see Table F1 in the Online Appendix for full regression results). Women had a seven percentage point (p.p.) lower average predicted willingness to donate than men. Respondents with higher trust in researchers were significantly more willing to donate than respondents with lower trust (+1 p.p. for each point on the 11-point trust scale). We found no significant correlation between willingness and general trust, trust in Facebook, privacy concerns, frequency of Facebook use, age, education, and framing (
Figure 1. Average marginal effects and 95% confidence intervals for estimates predicting willingness to donate and successful donation.
The probability of successful donation, given willingness to donate, increased by two percentage points for every additional point on the 11-point trust in researchers scale. At the same time, donation was significantly less likely to be successful for respondents with higher trust in Facebook (−3 p.p. for each point on the 11-point trust scale). We found no significant effects of general trust, privacy concerns, age, gender, education, frequency of Facebook use, and framing on donation success (
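The percentage point effects reported above are average marginal effects (AMEs) from logistic regression models. As a rough sketch of how such effects can be computed from a fitted model, the functions below average the change in predicted probability over all observations. The function names and the treatment of the trust scales as continuous predictors are assumptions made for illustration; the software and specification the authors actually used are not described in this section.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(row: Sequence[float], intercept: float, coefs: Sequence[float]) -> float:
    """Predicted probability for one observation."""
    return logistic(intercept + sum(b * x for b, x in zip(coefs, row)))


def ame_continuous(rows: List[Sequence[float]], intercept: float,
                   coefs: Sequence[float], k: int) -> float:
    """Average of the derivative dP/dx_k = beta_k * p * (1 - p) over all rows."""
    effects = []
    for row in rows:
        p = predict(row, intercept, coefs)
        effects.append(coefs[k] * p * (1.0 - p))
    return sum(effects) / len(effects)


def ame_binary(rows: List[Sequence[float]], intercept: float,
               coefs: Sequence[float], k: int) -> float:
    """Average difference in predicted probability with x_k set to 1 versus 0."""
    diffs = []
    for row in rows:
        at_one = list(row)
        at_zero = list(row)
        at_one[k] = 1.0
        at_zero[k] = 0.0
        diffs.append(predict(at_one, intercept, coefs) - predict(at_zero, intercept, coefs))
    return sum(diffs) / len(diffs)
```

Multiplying an AME by 100 gives the effect in percentage points, so a value of 0.02 for a trust scale corresponds to the "+2 p.p. per scale point" style of reporting used in the text.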
Discussion
In summary, our study on the willingness, success, and bias in Facebook data donation reveals the potential but also the current challenges of this novel approach to digital trace data collection. Eighty percent of the German web survey respondents in our study indicated willingness to donate Facebook data, which is in line with earlier research showing relatively high initial willingness (Ohme et al. 2021; van Driel et al. 2022). Whether we used a gain or a loss framing in the data donation request did not make a difference. While general privacy concerns were not predictive of willingness to donate, the answers to an open-ended question reveal that privacy was a major driver for not being willing. Along the same lines, the higher the trust in researchers, the more likely individuals were to be willing to donate. Thus, clearly communicating to potential participants how their data will be used, giving them agency to review and, if necessary, delete certain data points before the donation, and generally increasing trust seem key for researchers who want to implement data donation. We strongly recommend that researchers use designated data donation platforms that allow for a privacy-preserving participation flow (Araujo et al. 2022; Boeschoten et al. 2022a, 2022b).
The generally high stated willingness stands in stark contrast to the lower success rate of donation: in the end, only about one third of the web survey respondents in our study donated their data. Once the decision to donate is made, privacy concerns seem to become less of an issue. Participants who initially indicated their willingness but then did not donate reported that they mainly had technical issues with the data donation process. These results are in line with earlier studies that used slightly different, but similarly complex approaches for data donation (Ohme et al. 2021; Silber et al. 2022; van Driel et al. 2022). Apparently, the process of requesting the data from the Facebook platform, storing them locally, unpacking and renaming the files, and then donating them via another platform was too cumbersome for many individuals. We also failed to link some of the donated data back to the survey responses, because some respondents did not provide their assigned ID when donating. For data donation to be useful for answering substantive research questions, researchers need to make the process of accessing and donating data as seamless as possible. We are hopeful that the participation flow in data donation studies will become less cumbersome over time. Recent approaches to better technical integration of data donation into web surveys are promising (Haim et al. 2023).
One particularly interesting finding is that individuals who expressed lower trust in Facebook were more successful in donating their data, but not more willing to donate. A possible explanation for this finding is that users who are skeptical about Facebook were especially motivated to go through the data donation process to learn more about what information Facebook had about them. The data donation task could thus provide intrinsic motivation for people who did not even know that they can access their own data, and researchers might use this knowledge in future studies to motivate respondents. Whether this finding can be generalized to other social media platforms needs to be further tested.
Another promising finding of our study is that donors and non-donors did not differ in self-reported frequency of Facebook use, indicating no bias in this substantive measure. Data donation seems like a promising approach to collecting digital trace data both for methodological research, for example, when the donated data are used to study bias in metered data (Cernat et al. 2023), and for studies with substantive research questions on social media use. Since bias depends on the concept that is to be measured with donated data, future research should investigate the effect of nonparticipation in data donation on other measures of Facebook use (e.g., engagement with certain topics).
Notwithstanding the interesting findings, our study has limitations. First, we requested data donations from members of an online access panel, people who regularly provide their data for research, in this case not only via web surveys but also through passive data collection using a meter. We thus have to assume that the willingness to donate data in our sample is an upper bound and will likely be lower in the general population. Consequently, the effects of trust in researchers and concerns about privacy on willingness might be underestimated in our study. Second, for practical reasons, we limited the data donation request to two data packages from one specific platform. Whether data from other platforms (e.g., Instagram, WhatsApp) and with other content (e.g., personal interactions) yield similar results needs to be studied further.
Supplemental Material
Supplemental Material - Do You Have Two Minutes to Talk About Your Data? Willingness to Participate and Nonparticipation Bias in Facebook Data Donation
Supplemental Material for Do You Have Two Minutes to Talk About Your Data? Willingness to Participate and Nonparticipation Bias in Facebook Data Donation by Florian Keusch, Paulina K. Pankowska, Alexandru Cernat, and Ruben L. Bach in Field Methods
