Abstract
Media use, particularly among younger people, is increasingly shifting to mobile phones (e.g., Anderson & Jiang, 2018; Silver, 2019), with almost half of adolescents stating that they are online almost constantly (Anderson & Jiang, 2018). For researchers interested in understanding smartphone-usage patterns, as well as the causes and effects of smartphone use, it is crucial to obtain accurate estimates of smartphone and app usage. This is particularly important because self-reports have been shown to be unreliable (e.g., Araujo et al., 2017; Araujo & Neijens, 2020; Naab et al., 2019; Ohme et al., 2020; Parry et al., 2021). For example, in a recent meta-analysis, Parry et al. (2021) found only moderate associations between measures based on self-reports and logged media use data, questioning the validity of using self-reports for assessing media use.
Although a plethora of apps has been developed in recent years to track smartphone use unobtrusively, these apps have three shortcomings for use in social scientific research. First, most apps that allow tracking app usage data are limited to Android phones (or provide only limited data for iPhones). This is problematic as approximately 25% of smartphone users worldwide, and more than 50% in North America, use smartphones that run on iOS rather than Android (statcounter, 2020). Second, these apps are created by third parties, which limits researchers’ control over the measurement, storage, management, and protection of the data (Breuer et al., 2020). Finally, participants in studies employing these apps have limited control over the type and amount of data that is logged from their phones, and have limited knowledge of the exact amount and type of information they are sharing during their research participation. This might infringe on participants’ decision rights concerning their own data (Boonstra et al., 2018; Harari et al., 2016; Schneble et al., 2020).
To circumvent these problems, there have been recent attempts to request data donations from participants based on their iOS devices.
More granular information on app usage is provided by the iOS Battery Section.
Digital Trace Data in the Social Sciences
Due to the increasing digitalization of society, a wealth of data is tracked by and stored on the personal devices people use. The capabilities for logging and storing data on personal devices grow exponentially, with chips becoming smaller and multiple sensors being developed. In many cases, mobile phones are the hub of unobtrusive data collection. As a result, for every individual who takes part in social scientific research, there is potentially already an abundance of data available (e.g., Fukazawa et al., 2019; Montag et al., 2020; Trifan et al., 2019). These data are typically referred to as digital trace data.
Digital trace data provide researchers in a wide range of fields—including psychology, sociology, and communication science—with incredible opportunities (e.g., Choi, 2020; Rafaeli et al., 2019; Stachl et al., 2020; Stier et al., 2020). Digital trace data are celebrated because they offer researchers the opportunity to collect media use data in accurate and unobtrusive ways (Araujo et al., 2017; Boase, 2016; Jones-Jang et al., 2020; Reeves et al., 2019). However, one of the most important concerns of researchers who are considering the use of digital trace data might be the ethical ramifications of such an approach (Boeschoten et al., 2020). At the moment, ethical guidelines for the use of digital trace data are largely missing, and the benefits of measuring something objectively and unobtrusively may not outweigh the infringement on privacy inherent to these approaches (Christen et al., 2017; Stier et al., 2020). To make it possible to use digital trace data in an ethical manner, alternative methods have been devised, including digital data donations (e.g., Boeschoten et al., 2020; Halavais, 2019).
Challenges for Data Collection and Processing
Digital data donations consist of data that are automatically tracked by digital devices, and that users then make available to researchers (e.g., Boeschoten et al., 2020; Halavais, 2019; Ohme et al., 2020). The main advantage of data donations in comparison to automatic tracking data is that participants have some control over the amount and type of data they share with the researchers. At the same time, there are concerns that using a data donation approach in research might be less inclusive and may result in attrition, because data donations require several additional actions and skills from participants (e.g., Elevelt et al., 2019). In the case of smartphone or social media data donations, participants have to follow a number of steps to make their data available to researchers (see Figure 1). All these steps require a basic level of digital literacy from participants.
Figure 1. Multi-step data donation process: from data collection to data processing.
First, participants need to use the requested media (e.g., social media), or media devices (e.g., smartphones), as they usually would. Second, participants should be able to locate the data requested by the researchers on the mobile device. For this, they need to access their mobile device, including the specific widgets, smartphone features, or social media accounts. Third, they need to either directly download the requested data on their personal device (as is possible, for example, for social media accounts, or video streaming accounts such as Netflix), or they need to capture the requested data with screenshots or video recordings (e.g., if data from smartphone features are used). Fourth, once the data has been downloaded and/or recorded, these files need to be shared with the researchers (e.g., uploaded to a server, or securely mailed to researchers). Specifically, for smartphone data based on the Battery or Screen Time feature, participants have to access the respective feature, capture the information with a screenshot or video recording, and finally share the captured data with the researchers.
Considering this multi-step process, the data donation approach faces at least three challenges related to data collection and data processing that need to be considered to fully capitalize on this method. With regard to data collection, following all these steps and sharing data over a longer period of time requires both technical and psychological compliance from the participants. Moreover, participating in a data donation study might lead to reactivity to the method. Finally, after participants have shared their data, researchers need to process the acquired information in efficient and accurate ways.
Challenges for Data Collection: Compliance and Reactivity
Compliance in Multiple Day Data Donation Studies.
Compliance is typically seen as the willingness to participate in a study, and to follow all requirements related to study participation. Willingness to comply is particularly problematic in intensive longitudinal studies such as experience sampling or diary studies, in which participants have to respond over the course of several days and sometimes several times per day (Rintala et al., 2019). In social scientific research, these types of methods are becoming increasingly popular, and combining experience sampling with digital trace data or data donations provides many opportunities for understanding media use and its effects in ecologically valid ways (Choi, 2020; Stier et al., 2020). Typically, compliance decreases over the course of a longitudinal study (e.g., Rintala et al., 2019). The willingness to participate in a longitudinal study including data donations might be even lower, as active steps are required from the participants to make data available, in comparison to just answering a few questions (e.g., Silber et al., 2021; Skatova & Goulding, 2019). Willingness to comply can be compromised not only by the experienced burden of participation, but also by forgetting to complete the requested actions.
In addition to the psychological willingness to comply, in research that involves data donations, compliance could also be affected by the technological skills of participants and technical difficulties encountered at each of the aforementioned steps (e.g., Ohme et al., 2020). Although technology-dependent compliance is partly outside the control of the researcher (e.g., someone’s phone is broken), the researcher is able to scaffold the knowledge of participants with limited technical skills. This can be done by making the data donation approach as simple and parsimonious as possible, limiting the number of actions a participant has to take, and making sure that all steps of the process are well explained to the participants. These instructions, provided in real time by the researchers, can be supplemented with recorded instructions that participants can access during later stages of the research. A technology helpdesk could also be a way to limit the impact of variance in respondents’ technical skills.
Only a few studies have investigated compliance for data donations (Gower & Moreno, 2018; Ohme et al., 2020). These studies indicate that the willingness to participate in mobile data donation studies is rather low and partly depends on individual characteristics of the participants, such as age and conscientiousness (e.g., Elevelt et al., 2019). However, these studies used cross-sectional designs or a maximum of two data collection moments. In the present study, we therefore test compliance rates and changes in compliance over the course of a 7-day study. We expect compliance to decrease over the course of the study.
Reactivity in Multiple Day Data Donation Studies.
Participating in a longitudinal study may result in a change in respondents’ behavior or attitudes, which is typically called reactivity or panel conditioning (Halpern-Manners et al., 2017; Van der Zouwen & Van Tilburg, 2001). Reactivity has been observed in a variety of contexts and might be particularly problematic in studies using smartphone data donations for two reasons. First, by taking part in a study in which smartphone use is tracked, participants might believe that they need to adapt their smartphone behavior in order to demonstrate smartphone use patterns similar to those of the majority of users. Thus, they may adapt their use in order to comply with social norms. Second, by making screenshots or videos of their own smartphone use statistics, participants get very clear insights into their daily app usage and screen time. This knowledge might make them more aware of the amount of time they spend with their smartphones, and may motivate them to change their behavior.
Both sources of reactivity, that is, 1) reactivity due to mere study participation and 2) due to daily exposure to personal media use insights, can lead participants in data donation studies to either change the overall time they spend with their smartphones or to change the use of specific apps during the course of the study. For example, when participants see that Instagram is the app they use most often, and feel this is more often than they had expected, they might decide to use this app less the next day. Whether reactivity plays a role in data donation studies has not yet been investigated. This is surprising because reactivity might pose a major challenge to data donation approaches. We will, therefore, investigate whether participants change their smartphone behavior, in terms of overall screen time and the types of apps used over the course of the study. This will allow us to get insights into the magnitude and pattern of reactivity for longitudinal data donation approaches.
Challenges for Data Processing: Accuracy
Although the collection of data donations can easily be scaled up, the processing of the collected data requires additional steps. Particularly for larger samples, or when more time points are assessed, automatic processing of the collected data is necessary to make the method feasible for larger studies. An automatic approach to processing the data will allow researchers to collect more data and to use similar approaches in a variety of studies. To our knowledge, no automated approach that processes the data of such data donations has been published yet. We therefore present in this study a Python script that automatically traces all data in the screen recording videos (for an impression of a screenshot of these videos, see Figure 2a and 2b).
Figure 2a. Example of a manually annotated image used for model training, showing various detection fields.
Figure 2b. Visual representation of the object detection model output, showing several recognized app fields.

The Current Approach
The data donations used in this study are based on information provided by the iOS Battery Section. The Battery Section provides detailed information about app usage in minutes for all apps used (on screen or in background) during each hour of the day. To obtain this information, participants were asked to record screen videos capturing the Battery Section. For this, they had to start the screen recording function on their phones, then enter the Battery Section, and finally tap on each hour of the day to show which apps had been used during that hour. Creating these videos took approximately 1 minute per day.
Script for Automated Processing of Battery Section Recordings
In order to allow quick, easy and accurate processing of the video recordings, a Python script was developed. This script processes iPhone screen recordings of the Battery Section pages and provides a CSV file with the following information: which apps were used at each hour of the day, time in minutes that each app was open on screen per hour, and time in minutes an app was open in the background per hour. This is accomplished by (1) transforming the input video into a series of frames (as numeric arrays), (2) performing object detection to extract the relevant fields (app name, time indication, etc.) in each frame, (3) sorting and removing irrelevant fields, (4) running optical character recognition (OCR) to transform the detected fields into text data, and (5) processing the text data to re-combine and sort the relevant fields for the CSV output. The script currently processes videos based on the English or Dutch version of the Battery Section.
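The five processing steps can be sketched in miniature. Everything below is illustrative: the toy "frames" are plain dicts standing in for the pixel data and detection output the real script works with, and all function names are hypothetical, not taken from the published script.

```python
import csv
import io

# Toy stand-ins for video frames: each "frame" is a dict of fields that the
# object-detection step would extract in the real script (hour indicator plus
# (app name, minutes on screen, minutes in background) tuples).
frames = [
    {"hour": "21:00", "apps": [("WhatsApp", 12, 0)]},
    {"hour": "21:00", "apps": [("WhatsApp", 12, 0)]},   # duplicated frame
    {"hour": "22:00", "apps": [("Instagram", 7, 3)]},
]

def detect_fields(frame):
    """Step 2: in the real script a TensorFlow model locates hour and app
    fields in the pixel data; here the fields are already explicit."""
    return frame["hour"], frame["apps"]

def sort_and_dedupe(frames):
    """Step 3: group app fields by hour and drop duplicated entries."""
    per_hour = {}
    for frame in frames:
        hour, apps = detect_fields(frame)
        for app in apps:
            per_hour.setdefault(hour, set()).add(app)
    return per_hour

def to_csv_rows(per_hour):
    """Steps 4-5: OCR would turn detected fields into text; here they are
    text already, so we only flatten and sort them into CSV rows."""
    rows = [("hour", "app", "min_on_screen", "min_background")]
    for hour in sorted(per_hour):
        for name, on_screen, background in sorted(per_hour[hour]):
            rows.append((hour, name, on_screen, background))
    return rows

rows = to_csv_rows(sort_and_dedupe(frames))
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

The duplicated frame collapses into a single WhatsApp entry, illustrating why the de-duplication step (3) matters before OCR is run on every field.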
Video to Frames Conversion (1)
The videos are read using the OpenCV library (2015) and loaded into memory frame by frame as NumPy (Oliphant, 2006) arrays. Since consecutive frames are likely to contain identical information, the script compares the information in each frame to the information in the subsequent frame and records the absolute difference between the two. This is accomplished by converting copies of the frames into grayscale, applying Gaussian blur (to minimize the effect of artifacts), and subtracting one frame array from another. Then the baseline in the resulting array of differences is estimated, and the values sufficiently distant from the baseline are indexed. These indices are used to select the relevant frames from the initially read NumPy arrays. Currently, the sufficient distance from the baseline is estimated automatically such that the number of relevant frames returned by the function is at least twice the length of the screen recording in seconds.
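The difference-and-threshold logic can be illustrated without OpenCV. The sketch below uses plain NumPy on pre-grayscaled frames, and the stepwise relaxation of the threshold is a made-up stand-in for the script's automatic estimation, so treat it as a conceptual sketch rather than the published implementation.

```python
import numpy as np

def select_key_frames(frames, min_frames):
    """Keep frames whose mean absolute difference from the previous frame
    is far enough above the baseline. `frames` is a list of grayscale
    arrays. The threshold is relaxed until at least `min_frames` frames
    survive, mirroring the script's rule of keeping at least twice as many
    frames as the recording has seconds. (Illustrative, not the published
    code: the real script also decodes video and applies Gaussian blur.)"""
    diffs = np.array([
        np.abs(frames[i + 1].astype(int) - frames[i].astype(int)).mean()
        for i in range(len(frames) - 1)
    ])
    baseline = np.median(diffs)
    keep = []
    # start with a strict threshold and relax it until enough frames remain
    for k in (3.0, 2.0, 1.0, 0.0):
        keep = [i + 1 for i, d in enumerate(diffs)
                if d > baseline + k * diffs.std()]
        if len(keep) >= min_frames:
            break
    return keep

# three identical frames, then a scene change, then stillness again
frames = [np.zeros((4, 4))] * 3 + [np.full((4, 4), 255.0)] * 2
print(select_key_frames(frames, min_frames=1))   # only frame 3 is novel
```

Only the frame where the content actually changes survives, so the downstream object detection runs on a small fraction of the recording.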
Object Detection (2)
Once a selection of relevant frames is made, the frames are run through an object detection model to detect the following two fields: indications of which hour is being shown, and apps with their logos, names, and durations of activity on screen and in the background. To detect these fields, a TensorFlow (Abadi et al., 2016) model was trained on manually annotated screen captures (screenshots).
Sorting and De-duplication (3)
Then, OCR (see below) is performed on the hour indication fields to (1) detect and remove frames that show app usage not per hour but for the given day overall, and (2) sort the detected app fields into their corresponding hours. The sorting is done by grouping together all the app fields that appear either in the same frame as the hour indication field or—in case the hour indication field is not in the frame—the app fields in all frames that precede a frame with an hour indication field showing a different hour. Thereafter, an OpenCV (2015) implementation of the scale-invariant feature transform (SIFT; Lowe, 2004) algorithm is used to remove duplicated app fields within each hour, to speed up the subsequent steps.
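The published script matches SIFT keypoints to find duplicated app fields; as a self-contained stand-in, the sketch below uses a simple average-hash fingerprint instead, which captures the same idea—keep one copy of each visually identical field—without requiring OpenCV. All names here are illustrative.

```python
import numpy as np

def average_hash(field, size=8):
    """Downscale a grayscale field image to `size`x`size` by block
    averaging and threshold at the mean, yielding a compact fingerprint.
    A deliberately simple stand-in for the SIFT matching the script uses."""
    h, w = field.shape
    ys = np.linspace(0, h, size + 1, dtype=int)
    xs = np.linspace(0, w, size + 1, dtype=int)
    small = np.array([[field[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                       for j in range(size)] for i in range(size)])
    return tuple((small > small.mean()).ravel())

def dedupe_fields(fields):
    """Keep the first occurrence of each visually identical field."""
    seen, unique = set(), []
    for field in fields:
        fp = average_hash(field)
        if fp not in seen:
            seen.add(fp)
            unique.append(field)
    return unique

a = np.zeros((32, 32)); a[8:24, 8:24] = 255   # field with a bright box
b = a.copy()                                   # exact duplicate of a
c = np.zeros((32, 32)); c[0:8, :] = 255        # visually different field
print(len(dedupe_fields([a, b, c])))           # 2 unique fields remain
```

SIFT is more robust than this hash (it tolerates small shifts and scaling between frames), which is why the script uses it; the hash version only shows where de-duplication sits in the pipeline.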
Optical Character Recognition (4)
Another instance of the object detection model is run on the previously detected app fields to extract the image coordinates of the app name and the time for which it was used in the given hour. As before, a TensorFlow (Abadi et al., 2016) model was trained on manually annotated app fields for this purpose.
Example of an app name field before (left) and after (right) pre-processing.
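The exact pre-processing applied to app name fields before OCR is not spelled out in this excerpt. A common recipe, shown below purely as an assumption, is to binarize the cropped field to clean black-on-white text before handing it to an OCR engine such as Tesseract; the `pytesseract` call is left as a comment since the engine used by the script is not named here.

```python
import numpy as np

def preprocess_for_ocr(field):
    """Binarize a grayscale app-name field so text is black on white,
    a typical pre-processing step before OCR. Illustrative only; the
    published script's exact pre-processing may differ."""
    threshold = field.mean()
    binary = np.where(field > threshold, 255, 0).astype(np.uint8)
    # dark-mode recordings have light text on a dark background;
    # invert if most pixels ended up black
    if (binary == 0).mean() > 0.5:
        binary = 255 - binary
    return binary

# toy field: dark glyph pixels (20) on a light background (230)
text_img = np.full((10, 40), 230.0)
text_img[3:7, 5:35] = 20.0
clean = preprocess_for_ocr(text_img)
# the cleaned field would then be passed to an OCR engine, e.g.:
# text = pytesseract.image_to_string(clean)   # assumes pytesseract installed
```

Binarization removes the translucent backgrounds and anti-aliasing of the iOS interface, which otherwise degrade character recognition noticeably.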
Data Sorting (5)
Finally, the recognized text fields are re-combined, sorted per hour, and written to the CSV output.
Example of the Battery Section Video Output.
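One sub-task the final sorting step must include is converting the OCR'd duration strings into integer minutes for the CSV columns. The sketch below assumes duration strings such as "12 min" or "1 hr 5 min", with "uur" as the Dutch hour marker; the actual strings iOS displays, and the script's parser, may differ.

```python
import re

def duration_to_minutes(text):
    """Convert an OCR'd duration string such as '12 min' or '1 hr 5 min'
    into integer minutes. The exact English and Dutch strings iOS displays
    are assumptions here, not taken from the paper."""
    hours = re.search(r"(\d+)\s*(?:hr|uur)", text)
    minutes = re.search(r"(\d+)\s*min", text)
    total = 0
    if hours:
        total += int(hours.group(1)) * 60
    if minutes:
        total += int(minutes.group(1))
    return total

print(duration_to_minutes("1 hr 5 min"))   # 65
```

Normalizing durations to minutes at this stage keeps the CSV columns numeric, so downstream analyses (e.g., the mixed models reported later) can consume the output directly.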
Method
Sample and Procedure
The data donation approach described in this paper was part of a larger study on smartphone use and sleep. Ethical approval was granted by the ethics board of the Communication Science Department of the University of Amsterdam. The data of 93 participants is presented in this paper. All participants were university students (66% female).
Participants downloaded an app (MyPanel) on their phones, which was used for daily screen video uploads and additional daily surveys (not reported here). They also filled in an online survey assessing demographics, personality traits, and their general smartphone use and sleep patterns. After the introduction meeting, they were asked to upload a screen video of their Battery Section every morning (starting the next day) for the next 7 days. A link to an instruction video with a short description of how to take the screen videos was accessible to all participants via the app, so that they could consult it in case they did not remember how to record the videos. After the 7-day period, participants returned to the lab for an exit meeting, during which they completed a final questionnaire with questions about the perceived difficulty of sharing the screen recordings and their perceived reactivity. Participants could choose to be rewarded with 10€ or research credits for participation. After all participants had taken part in the study, the Battery Section videos were processed with the Python script described above.
Results
Descriptives
In total, participants used 773 unique apps during the course of the study. Overall, app use showed a long-tailed distribution, with a few apps being used very frequently and many apps being used only occasionally. Figure 4 shows the distribution of the top 50 apps. As can be seen, the most used app was the “home and lock screen,” which is used when unlocking the screen, followed by WhatsApp, Instagram, and Snapchat. Figure 4 also includes an overview of the amount of time apps were used per hour (combined for all seven days). It shows that smartphone use is highest in the evening hours, and particularly high between 9:00 p.m. and 11:00 p.m.
Figure 4. Frequency of the 50 most used apps (A), frequency of apps per hour over all participants (B), and frequency of apps used per hour per participant (C).
Script Accuracy
The accuracy of the automated video-processing script was assessed by comparing automated script output of 19 videos (1242 app entries) to the manual coding of these same videos (1258 app entries). Both datasets contained the following four columns: app name, time of use on screen, time of use in background, and the hour in which it was used. The accuracy was calculated as the percentage of entries that matched between the manually coded and the automatically processed datasets. Separate scores were calculated for the on screen time, background time, and total time (on screen and background use combined) matches. Forty-two app entries that had neither an indication of time on screen nor of time in background (e.g., some instances of Siri and Torch apps) in the automated output were removed from the manually coded dataset prior to the analysis as such apps were skipped in the automatic coding by design. Similarly, seven apps whose names were not in a Latin script were removed from the automatically generated dataset as these apps were not coded manually. To account for possible minor differences in the app name spelling between the two datasets, fuzzy string matching was used.
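Fuzzy app-name matching of the kind described here can be done with Python's standard library. The matcher and the 0.85 cutoff below are assumptions chosen for illustration, not the paper's exact settings.

```python
from difflib import SequenceMatcher

def fuzzy_match(a, b, threshold=0.85):
    """Treat two app names as the same if their similarity ratio exceeds
    a threshold, absorbing minor OCR spelling errors. The matcher and
    cutoff are assumptions, not the paper's exact settings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def name_accuracy(manual, automated):
    """Share of manually coded app names with a fuzzy match in the
    automated script output."""
    matched = sum(any(fuzzy_match(m, a) for a in automated) for m in manual)
    return matched / len(manual)

manual = ["WhatsApp", "Instagram", "Snapchat"]
automated = ["whatsapp", "lnstagram", "Maps"]   # OCR confused 'I' with 'l'
print(round(name_accuracy(manual, automated), 2))   # → 0.67
```

Here "lnstagram" still counts as a match because its similarity to "Instagram" (0.89) exceeds the cutoff, while "Maps" does not match "Snapchat"; a strict equality check would have penalized the OCR error twice.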
The on screen use accuracy (both app name and the on screen time matching between the two datasets) was 86.33%. The background use accuracy was 90.03%, and the total use accuracy (on screen and background use times combined) was 89.60%. In total, 1206 app names (95.86%) present in the manually coded data were correctly identified during the automated processing. Out of these apps, 100 had incorrectly identified screen time, 112 incorrectly identified background time, and 52 had incorrectly identified total time. Finally, 25 apps that were not present in the manual dataset were erroneously included in the automated one and 57 apps were overlooked by the video-processing script; 23 of them due to an entire hour having been missed during the automated processing.
Compliance
In total, 544 videos were uploaded, which amounts to an average of 5.85 videos per participant.
Number of participants that shared videos on a minimum of 1 to a maximum of 7 days (A), and total submitted videos per day (B).
The self-report data collected at the end of the study showed that the majority of participants agreed or strongly agreed with the statement that recording the screen videos was easy (56%). However, some somewhat disagreed (19%) or strongly disagreed (6%) with this statement and thus perceived recording the videos as rather difficult, indicating that compliance might have been partly inhibited by technical difficulties some participants encountered.
Subjective and Objective Reactivity
Concerning self-reported perceived reactivity, the large majority of participants stated that they believed that participating in this study did not change their smartphone use behavior during the course of the study (85%). However, 10% indicated they used their phones less, and 3% stated they used their phones more because of study participation.
Objective reactivity was assessed by examining Battery Section data over time. More specifically, we conducted a linear mixed model using the lme4 package in R (Bates et al., 2015), with time as a predictor, the total minutes spent on apps per day as the outcome measure, and a random slope for participants by time. The random slope takes into account individual differences in app time across days and the unbalanced data (i.e., some participants submitted more data than others). Examining the app use data, it seems that time spent on the phone actually somewhat increased over the 7-day course of the study.
Average time spent on apps per study day (A), the top five most used apps per submission day (B), and the actual fluctuation of app usage per participant (C).
Discussion
The present study introduced a novel approach to collect and process data based on the iOS Battery Section. To our knowledge this is the first study presenting an automated script to process video data from the iOS Battery Section, and the first study reporting compliance and reactivity for a longitudinal data donation study. We hope that the detailed account and the availability of the Python script 5 will allow other researchers to easily implement this approach in their study, and will enable them to collect accurate smartphone use data.
Concerning compliance, the findings of this study are in line with previous studies showing that compliance typically declines over the course of a longitudinal study (e.g., Rintala et al., 2019). Nevertheless, participants uploaded videos on average on more than five out of the seven days, and more than two-thirds of the sample uploaded videos on six or seven days. Given that we instructed participants to upload these daily videos but did not remind them during the study period, the study shows rather high compliance rates and underlines the feasibility of our approach. Several participants noted during the exit meeting that they simply forgot to upload videos. We thus believe that compliance can easily be heightened through reminders, which are especially important to consider when studies are longer than five days. The study also suffered from diminished compliance due to technological failure. During the exit interviews, some participants mentioned that they had encountered problems while uploading the videos to the app used in this study. These problems can easily be circumvented in future studies by making other uploading options available to participants. We thus recommend creating fallback options, for example, allowing participants to upload their files through a research website rather than the research app on their phones.
It is important to note that participants in the current study were all highly educated university students who might have above-average technical skills. Using this approach among other age groups and people with lower media and technology literacy might require additional training sessions to improve compliance. Future studies should test this approach and its compliance rates among more diverse samples. However, we believe that at least for the age groups that use smartphones the most in their daily lives (Pew Research Center, 2019), the current approach is feasible and produces high compliance rates.
A second promising result for future studies is that reactivity seems to play a minor role when using data donations. We expected that awareness and social desirability might lead to a decline in smartphone use; however, this did not occur. Instead, we found a slight increase in smartphone use over the course of the study. One potential explanation is that participants decreased their smartphone use at the beginning of the study and then returned to normal levels during the course of the study. Importantly, the types of apps that were used in the sample did not change over the course of the study. Also, only very few participants indicated that they changed their smartphone behavior as a result of taking part in this study. Although this is definitely a good sign, other types of reactivity might still have occurred that are not reflected in the data. For example, participants might have chosen not to use specific apps during the course of the study that they normally would use, or they might have used them on other devices. Future studies are needed to test reactivity in more detail.
The study also showed that the developed Python script worked well and provided accurate app usage data. However, the script can still be improved. The key frame detection algorithm should be optimized in future versions of the script to improve both the app recognition accuracy and the video-processing time. The current approach extracts some superfluous frames (thus unnecessarily increasing the processing time) and can sometimes miss relevant ones (thus reducing the overall accuracy). Better performance can be achieved by using more sophisticated peak detection algorithms and baseline estimation techniques. For instance, an automatic scene change detection algorithm could be introduced and separate baselines estimated for each scene. Another issue that should be addressed in future revisions is the script’s incorrect categorization of the apps that were open exclusively in the background in a given hour. Presently such apps are erroneously categorized as being open on screen. This issue arises from the script relying only on the app time indications and their relative positions and can be solved by also detecting the words ‘on screen’ and ‘in background’ and employing that information in the categorization.
Implementation Recommendations for Future Research
To enhance the successful use of data donation in combination with the presented Python script, we recommend researchers adhere to the following guidelines when designing their study.
First, it is helpful to organize intake meetings with (small groups of) participants to explain the different actions that are required during the study and include a live demonstration of these actions. This intake meeting can be organized both offline and online, where instructions can be provided in a web conference environment with screen sharing features. Second, it is useful to provide instructions for creating data donations as reminders to participants, either via video instructions or other forms of easily comprehensible and accessible information. In the present study, we provided instruction videos implemented in the research app, so that participants could easily access them to refresh their memories. This ensures that participants have access to clear instructions about the most important part of the data donation procedure. When finances allow it, we would also recommend implementing an online helpdesk, accessible through the app, where participants can report any technical difficulties they experience.
Third, researchers should field the study among participants who use the same language, or carefully instruct participants to change the language settings of their phones to match the language the script was developed for (in this case English and Dutch), although additional languages can easily be added to the script. These instructions can be provided during the online or offline intake meeting. Finally, it will be important to develop a method that anonymizes video content before it is stored long term in researchers’ cloud storage. Until that is possible, to ensure the privacy of participants, researchers using this method should instruct respondents to change the name on their phones for the duration of the study. If this step is not followed, the uploaded videos are potentially identifiable. In addition, participants should be instructed to re-record their screen recording when a pop-up notification containing personal information appears, for example from WhatsApp.
Conclusion
The present study introduced a new, transparent data collection and processing method to gain accurate app usage data based on the iOS Battery Section. Although the data donation approach used in this study required additional steps from the participants, the compliance rate was high. Moreover, this study provided first evidence that digital data donations do not change the studied behavior of the participants. This highlights the usability of this and similar data donation approaches for research. Moreover, the developed Python script makes the processing of the data donation videos easy and fast, and will thus enable future researchers to use this method without the additional work of processing the data manually. We therefore hope that the presented approach will encourage more researchers to use this or similar approaches to gain accurate smartphone use data in an ethical way.
