Abstract
Introduction
For much of the last 50 years, medical research primarily involved testing potentially harmful interventions on humans. However, when research is limited to linking and then analysing pre-existing data, different risks are created. This raises the ethical question of whether consent to link pre-existing data is necessary, particularly since requiring consent may hinder research, distort results and create novel privacy risks. In this paper, I focus on the secondary use of pre-existing de-identified public sector health data in linkage research, where linkage is defined as the process of matching records from multiple datasets that belong to the same individual (Winglee et al., 2005). I will argue that consent from data subjects is not required before linking pre-existing data, when considering the potential benefits and risks involved. I will also argue that citizens have a conditional moral responsibility to make their data available for such research because maximal benefit from this type of research is realised only when datasets contain information about sufficiently large numbers of data subjects. We each have an individual and collective responsibility to contribute to the ‘herd knowledge’ that these datasets yield. This paper adds to the literature by exploring novel arguments for this position, based on the rescue principle, distributive justice and vaccination ethics.
I will make several assumptions to limit the scope of my arguments. First, I will only consider linkage research involving administrative data that has been stripped of identifying information. I will not consider research involving access to descriptive content data, such as clinical records. Second, I will only consider data linkage research projects that are of sufficient scientific merit and social benefit to have received Human Research Ethics Committee (Institutional Review Board) approval. Third, while attempts should always be made to obtain some form of consent from data subjects prior to data collection, this paper only addresses the question of what to do with potentially useful data that has already been collected without consent. As such, I will not consider the relative merits of different types of consent models, such as broad consent, opt-out consent, dynamic consent or meta-consent (Teare et al., 2021). Finally, I will only consider public sector health data collected by democratic governments for use in publicly funded research. Important concerns about the nefarious use or sale of health information collected by private corporations such as Google and Facebook, or health insurers, are beyond the scope of this paper and described elsewhere (Véliz, 2020).
Why consent to linkage research is unnecessary
Sensitive information about individuals is routinely generated, collected and compiled into large datasets by governments and healthcare providers without consent. Examples include the notifications of births, deaths, cancers, infectious diseases and hospital admissions. Digitisation has accelerated the capacity to extract, manipulate, aggregate, share and analyse this information. This creates innovative research opportunities that are supported in many jurisdictions by open data policies (Andreu-Perez et al., 2015). For example, the Australian Government has promised to transform public sector data sharing from ‘aversion and avoidance’ to ‘transparency and confidence’ by allowing researchers to access de-identified data for research that has clear and direct public benefits and minimises privacy risks (Anton, 2020).
Benefits and risks of data linkage research
The compilation of datasets pertaining to entire populations over many years facilitates the efficient investigation of rare diseases, rare outcomes and under-researched populations. Findings drawn from these datasets can be statistically powerful (Corrada et al., 2014). Linkage between datasets can uncover new information, without exposing research participants to potentially harmful experiments or interventions. In some circumstances, this is the only morally acceptable way of evaluating ‘real world’ outcomes in groups, such as pregnant women, for whom interventional research may be too risky (Nguyen and Barshes, 2010). However, data linkage is not itself without risks.
Data linkage involves the storage, transfer and analysis of sensitive information. Privacy breaches can occur. Researchers can match de-identified data with publicly available information, allowing sensitive inferences to be drawn from non-sensitive data (Townsend, 2021). More importantly, data linkage research can adversely impact minority groups, including Indigenous communities. This can occur when they are excluded from data collections or when research questions, methodologies and analysis do not appropriately reflect Indigenous priorities, values, identity and diversity. ‘Indigenous data sovereignty’ allows Indigenous people to be the primary beneficiaries of their data, knowledge and cultural heritage by entrusting Indigenous communities with control over what and how data is collected, analysed, interpreted and disseminated (Kukutai and Taylor, 2016).
To reduce data linkage risks, Human Research Ethics Committees usually require researchers to separate identifiable data (e.g. name, address and date of birth) from non-identifiable data and use a non-identifiable ‘linkage key’ to compare data from the same individual across multiple datasets (Emery and Boyle, 2017). Also, in Australia, data linkage often only occurs within accredited data linkage integrating authorities (Australian Government, 2020) where researchers must consider the privacy impacts of funded research, securely store data on encrypted drives and adopt stringent data management plans (Christen et al., 2020). Together with growing understanding of Indigenous data sovereignty, these measures have proven largely effective in mitigating the risks, with few complaints or breaches registered (O’Keefe and Connolly, 2010).
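The separation of identifying data from content data, and the use of a linkage key, can be sketched in code. This is a minimal illustration only, not the method of any particular linkage authority: the field names, the secret value and the choice of a keyed hash (HMAC-SHA-256) are all assumptions made for the example, and real linkage units often rely on probabilistic rather than exact matching.

```python
import hashlib
import hmac

# Hypothetical secret held only by the linkage unit, never by researchers.
SECRET = b"example-secret-held-by-linkage-unit"

# Assumed identifying fields; real datasets will differ.
IDENTIFIERS = ("name", "address", "dob")


def linkage_key(record):
    """Derive a non-identifying key from the identifying fields (keyed hash)."""
    normalised = "|".join(str(record[f]).strip().lower() for f in IDENTIFIERS)
    return hmac.new(SECRET, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(record):
    """Drop the identifying fields and attach the linkage key instead."""
    content = {k: v for k, v in record.items() if k not in IDENTIFIERS}
    content["key"] = linkage_key(record)
    return content


def link(dataset_a, dataset_b):
    """Join two de-identified datasets on the linkage key (exact match only)."""
    by_key = {row["key"]: row for row in dataset_b}
    return [{**row, **by_key[row["key"]]} for row in dataset_a if row["key"] in by_key]


# Two invented source datasets that share one individual.
hospital = [
    {"name": "Ann Lee", "address": "1 High St", "dob": "1980-02-03", "admissions": 2},
]
registry = [
    {"name": "ann lee ", "address": "1 High St", "dob": "1980-02-03", "cancer": "none"},
    {"name": "Bo Chan", "address": "9 Low Rd", "dob": "1975-07-21", "cancer": "melanoma"},
]

linked = link([de_identify(r) for r in hospital], [de_identify(r) for r in registry])
```

In this sketch the researcher receives only `linked`, which contains the content fields and the opaque key. Because the key is derived with a secret the researcher does not hold, it cannot be recomputed from publicly available names and addresses, although this does not remove the risk of re-identification from the content fields themselves.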
Benefits and risks of abrogating consent
Central to research ethics is the requirement that human research participation should follow a free and voluntary choice made by potential participants based on sufficient understanding of risks posed by participation (World Medical Association, 2013). The compulsory acquisition of data for future use is potentially coercive because it removes the choice of citizens not to yield their data (Wertheimer, 1996). This moral cost is worth carefully considering, given the salience afforded to personal autonomy and privacy in liberal democracies (Herring and Wall, 2017). Uncritically abrogating informed consent may be ‘morally hazardous’ (Boyd, 2007) because privacy rights protect people from unsolicited surveillance of, and intrusion upon, their private lives. Moreover, enforcing research participation could undermine public trust if research governance inadequately protects subjects from risks they want to avoid (Carter et al., 2015). As a result, some citizens might avoid accessing health services or withhold information from health providers, resulting in suboptimal health outcomes.
Nevertheless, requiring consent for linkage research may be unfeasible and may even create new risks. First, it would be almost impossible to obtain consent from the millions of data subjects whose data is contained within some large datasets. The time and cost expended could curtail important and time-critical research if entire research budgets were spent obtaining consent rather than analysing and interpreting data. It might also be impractical to obtain consent when data from smaller datasets has been collected over a prolonged period because data subjects may be uncontactable if their name or contact details have changed since enrolment, or if they have died.
Second, requiring consent can reduce inclusion of sufficient numbers of research subjects through low response rates and attrition, leading to sub-optimal or incomplete linkage (Bohensky et al., 2010). In the case of research involving rare diseases or rare outcomes, it can also skew results, causing ‘consent bias’. This occurs when those who consent to research participation systematically differ from those who do not in ways that make those studied unrepresentative of the population (Mazzali and Duca, 2015). While there is debate about its frequency and impact (Ploug, 2020), it likely reduces the reliability and validity of research findings to some extent.
Third, the information used in data linkage research already exists, which means that declining consent to its use does not guarantee secure collection or storage. Most importantly, reidentifying anonymous data subjects for the purpose of obtaining consent aggravates the risk of privacy breaches to de-identified individuals because it exposes their identity to researchers.
Balancing private interests and public benefits
Clearly, linking data without consent entails balancing autonomy infringements with potential healthcare benefits (Gostin and Wiley, 2016). Many moral frameworks converge on the premise that the pursuit of anticipated social benefits may justify coercive policies that curtail individual freedoms (Department of Health, Education, and Welfare; National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 2014). Even small benefits may allow for interventions that are relatively costless (Childress et al., 2002), provided they are effective, necessary and proportionate (Bradfield and Giubilini, 2021). In this case, the re-use of data without consent is both
Therefore, while consent buttresses individual autonomy and maintains trust in public health research, it ought not be required for linkage of data already collected because: (1) the risk of using de-identified information is minimal; (2) the potential benefits to others are significant; and (3) requiring consent can create novel privacy risks and frustrate the realisation of the full potential of large datasets. Indeed, not only is consent not required, but there is a moral obligation on citizens to contribute their data to research.
An obligation to contribute data to research
The established view that research participation is voluntary and supererogatory is increasingly being challenged by the view that citizens have a moral duty to participate in some research (Stjernschantz Forsberg et al., 2014). In this section, I will argue why citizens have a moral obligation to contribute their data to linkage research.
Duty of easy rescue
According to the rescue principle, when the actions required by an individual to avoid possible harm to others are sufficiently easy and costless, an individual has a moral obligation to act. Peter Singer conceives this duty through his much-cited shallow pond example:
The moral duty to rescue is sculpted by the urgency of the situation, the consequences of inaction and the exertion required by the potential rescuer to prevent those consequences (Bradfield, 2021). Data contribution entails a relatively small cost to individuals. Although the individual impact of a privacy breach may be significant, the likelihood is small. Likewise, when Indigenous and other minority groups affected by research projects are involved in the planning, conduct, analysis, interpretation and publication of linkage research, group harms can also be mitigated. On the other hand, data contribution also entails a potentially large benefit to others in terms of knowledge gained that may save future lives or prevent serious suffering. Some have therefore argued that a ‘duty of easy rescue’ grounds an individual’s duty to contribute data to research (Porsdam Mann et al., 2016).
However, there are two criticisms of applying this principle to the duty to contribute data. First, unlike some potentially life-saving interventional research, data linkage research is not ‘an instrument for rescuing sudden victims of accident’ (Ploug, 2020). Instead, its aim is to prevent harm to potential future patients. Second, there is no guarantee that data linkage research will uncover useful information that will prevent future harm. Even if the information is useful, there is often a ‘lengthy, complex and unpredictable path’ (Ploug, 2020) from scientific discovery to improvements in healthcare.
These criticisms weaken, but do not invalidate, the argument that a moral obligation to contribute data could be grounded in a duty of easy rescue. Some research may be urgently required to address issues of pressing global significance, such as the need to understand, trace and prevent COVID-19 transmission (Bhattacharya et al., 2021).
Distributive justice
A stronger argument in favour of a duty to contribute data to linkage research is grounded in the principle of fairness. In his seminal work on the ‘duty to research’, John Harris argues that those who benefit from medical research without contributing to it are free riding because they benefit from others’ efforts without reciprocating (Harris, 2005). Being a free rider is unfair and people have a moral obligation not to act unfairly. Therefore, fairness requires that we contribute data to research from which we will benefit.
Brassington disagrees with Harris’ fairness argument. He asserts that there is no free riding problem if people who do not participate in research have paid for medical benefits they receive through taxes, medical insurance premiums or other means (Brassington, 2007). However, while Brassington’s response addresses the question of free riding in relation to the end benefits of research, it does not solve the free riding problem with respect to the research discovery process, which may lead to those medical benefits. Applying these opposing views to vaccination, if individuals fail to vaccinate themselves, but society at large is sufficiently vaccinated, then unvaccinated individuals benefit from herd immunity (Kim et al., 2011). On the one hand, Harris might say that this is a form of free riding because unvaccinated individuals benefit from the indirect protection afforded to them by those who are vaccinated. On the other hand, Brassington might say that unvaccinated individuals are not free riding because their taxes paid for the vaccines given to others. However, neither position is entirely correct because achieving herd immunity requires, amongst other things, payment of taxes
Paying taxes without also being vaccinated, or
Another reason why data contribution might be fair is that the amount of health data generated by individuals should be directly proportional to the amount of healthcare they receive as patients. Patients who receive more clinical care generate comparatively more health information that is available for research use. These people theoretically stand to benefit the most from the data generated and collected from others that can improve treatment of medical conditions they have. However, this argument assumes that access to health services is equitable, which we know is often not the case. Those who generate the most data may not always receive the most healthcare and this is problematic because information they have given up may not be fairly exchanged for care. Therefore, this argument only holds if everyone receives the quality and quantity of care that is commensurate with how sick they are. Moreover, we must also protect vulnerable groups from potential exploitation. For example, minority groups and Indigenous communities have not always benefited from public health research (Walter et al., 2021). Many Indigenous and minority ethnic communities have suffered significant cultural harm and discrimination from the inappropriate use of public health data, poor quality data and erroneously implied causality. Nevertheless, fairness can still be achieved if these communities are afforded control over how linked data is analysed and interpreted.
While there are limitations with the fairness argument, routinely collected health data is a public resource and, to the extent possible, should be used in ways that maximise public benefit. The curation and generation of these datasets require public expenditure and the information contained therein provides a platform for answering important questions that can drive public healthcare improvements.
Contributing to ‘herd knowledge’
I propose a third and novel basis for justifying a moral obligation to contribute data to research that draws upon arguments used to justify a moral obligation to vaccinate people in pursuit of ‘herd immunity’. Like individual vaccination decisions, individual decisions about data contribution affect both the individual and society. ‘Herd immunity’ describes the indirect protection from infectious diseases afforded to unvaccinated people when a sufficient proportion of the population is vaccinated (Randolph and Barreiro, 2020). By analogy, only when enough people contribute their data to large data linkage studies can the results emanating from large datasets be statistically significant, meaningful, reliable and applicable, even to those who do not contribute their data to the research from which they are benefitting.
I suggest the term ‘herd knowledge’ to describe the benefit that flows to society when potentially powerful information is generated from research using sufficiently large datasets. Like herd immunity, the pursuit of herd knowledge may ground an individual moral obligation to contribute data because it requires the cooperation of a sufficiently large number of people for potential collective benefits to be realised. If, like vaccination, there is a collective moral obligation to realise herd knowledge, then each of us has the individual moral responsibility to contribute to data linkage research, all things considered.
When a sufficient portion of the population is vaccinated, vaccinating more people usually contributes little more to herd immunity and individual vaccination risks outweigh the collective benefits of herd immunity (Giubilini, 2019). Even though viruses are dynamic and the threshold for herd immunity can shift, the incremental benefits in vaccinating above the threshold for herd immunity may be limited. Likewise, while collecting and analysing ever more data can produce more powerful, representative and granular results, this will likely produce diminishing returns from subsequent additions to a dataset. Therefore, just as herd immunity generally occurs only up to a threshold level of vaccination coverage, there is also likely to be a ceiling effect when it comes to the benefits of contribution of data to linkage research. Above that ceiling, the focus is likely to shift from maximising recruitment numbers to who is included in, and excluded from, the dataset.
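The diminishing returns described above can be illustrated with a simple statistical sketch. It assumes, purely for illustration, that a study estimates a population mean from n independent records with standard deviation σ; the symbols n, σ and SE (standard error) are introduced here and are not part of the argument itself.

```latex
\[
\mathrm{SE}(n) = \frac{\sigma}{\sqrt{n}},
\qquad
\frac{\mathrm{SE}(4n)}{\mathrm{SE}(n)} = \frac{1}{2},
\qquad
\mathrm{SE}(n) - \mathrm{SE}(n+1) \approx \frac{\sigma}{2\,n^{3/2}} .
\]
```

Under these assumptions, quadrupling the number of records only halves the standard error, and the gain in precision from each additional record shrinks rapidly as n grows. The formula says nothing about representativeness, which is consistent with the point that, in a large dataset, who is included can matter more than how many.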
However, there are some important disanalogies between the putative moral obligations to pursue herd immunity and herd knowledge. First, while the failure to vaccinate can create a direct harm to others by facilitating viral transmission, no such direct harm exists when we fail to contribute our data. Research using a small dataset may yield valuable results, but research using a larger dataset may yield more valuable results. The failure to fully realise the benefits of existing information by not consenting to its linkage does not create tangible harm. However, a person could still be individually blameworthy for failing to contribute data to potentially beneficial research because it is a failure to ‘do her part in a collective action’ (Giubilini, 2019) that could derive knowledge and information about the treatment of medical conditions.
Second, the benefits of vaccination are primarily individual and secondarily collective, whereas in the case of data contribution, the potential benefits are primarily collective, but can also be individual. Although those who contribute more data may have more to gain because they utilise more health services (as outlined above), this individual benefit is unlikely to be sufficiently direct or contemporaneous to act as any kind of strong incentive for individuals to contribute data. It is precisely this absence of personal incentive, coupled with the potential social benefits of data research, that justifies the compulsory acquisition by the state of citizens’ data for use in data linkage research without the individual’s consent.
Third, the achievement of herd immunity has been used to justify mandatory vaccination, but not the forced vaccination of individuals against their will (Bradfield and Giubilini, 2021). By contrast, I argue that the achievement of herd knowledge can be used to justify the analysis of collected information without any consent. The difference lies in the nature of the autonomy infringements at stake. While enforced vaccination invades bodily integrity and can give rise to physical harm, the infringement suffered by individuals when collected data is linked and analysed without consent is limited to one of informational control.
Conclusion
Recent empirical studies (Kalkman et al., 2019) show widespread community support for the use of de-identified routinely collected health data in research. Most people seem to recognise that health data research can lead to new treatments and better patient care, and they support the use of health data in linkage research without consent, provided data is de-identified (Xafis, 2015). In this paper, I have argued that data linkage research can transform healthcare at relatively low cost to individuals. Therefore, when balancing the benefits and risks of data linkage research, it is morally permissible to use and link people’s data without their consent. Moreover, I have shown that the general population also has a moral obligation to contribute their data to such research endeavours based on arguments of beneficence, fairness and the attainment of ‘herd knowledge’. However, even if linkage of large datasets can uncover important information at low cost to individuals, robust systems of research governance remain necessary to ensure privacy impacts are minimised and Indigenous data sovereignty is protected.
