Abstract
Introduction and context
This article examines and synthesises current discourses and practices on the governance of data. It scrutinises different approaches for accessing, controlling, sharing and using data in today’s platform economy and derives four emerging models of data governance. The current platform economy is mainly characterised by the asymmetry of power of a few technology corporations and telecommunication companies that have established de facto quasi-monopolies over data. The negative societal implications of this system, including biases in algorithmic decision-making, nudging and manipulation, and privacy violations, are increasingly highlighted by research (e.g., Beer, 2017; Kitchin, 2017; Taylor, 2017), while scandals such as Cambridge Analytica raise awareness among the public and policy makers, at least in Europe, that the distortions of this model need to be addressed. The General Data Protection Regulation (GDPR) is an important step in this direction, albeit with some limitations (Delacroix and Lawrence, 2019), and further new measures are being prepared in the European Union, including a Digital Services Act and a Data Act (European Commission (EC), 2020a). Given these forthcoming legislative changes, we deliberately eschew the legislative dimension in this article and focus instead on the social practices implemented and theorised for governing (big) data. In doing so we respond to scholars and policy makers advocating for a reorientation of the scientific gaze, from the critique of the current data landscape to the possibilities of agency from the ‘bottom up’ (Beraldo and Milan, 2019; Couldry and Powell, 2014; Kennedy et al., 2015), the ‘good data practices’ (Mann et al., 2019) and the ‘alternative’ data governance models (Morozov and Bria, 2018; Carballa Smichowski, 2019). We contribute to this line of inquiry adopting a
The article addresses the following questions: (1) What configurations of roles and relationships between stakeholders can we identify in the emerging models of data governance? To what extent are other actors beyond corporate data platforms able to participate? (2) What kind of value is pursued and how is it redistributed across actors and society? What mechanisms and arrangements are set in place to generate value from the data?
The dominant model of data governance in the current ‘platform society’ is the one established by a few corporate big tech platforms (Srnicek, 2017; Van Dijck et al., 2018; Zuboff, 2015), but other actors beyond ‘big tech’ are progressively becoming involved in controlling personal data and producing value from it through different data governance models. These alternative models are the focus of this article. We are cognizant that platforms too have been addressing concerns related to power asymmetries, for instance through technical changes and transparency efforts (Gorwa and Garton Ash, 2020). However, these attempts often function as ‘technologies of legitimation’ (Harrison and Mort, 1998: 60) as they do not enable a
The policy relevance of this article stems from the current geopolitical competition around Artificial Intelligence (AI) seen as central to the development of our increasingly digital societies (Craglia et al., 2018; EC, 2018a). China, the USA, and many other countries are investing heavily in AI, and Europe is responding with its own coordinated plan (EC, 2018b) and a strategy for data (EC, 2020b) that stresses the value of extracting greater benefits from, and exercising greater control over, European data. In this phase of the policy and academic debate, we see the emergence of many terms and concepts like data trusts, data sovereignty, and so on with unclear or contradictory definitions and usage. The analysis of data governance models of this article clarifies some of these concepts and is therefore relevant to both research and policy.
The article is organised as follows: after this introduction, the next section defines our conceptualisation of data governance, while the subsequent section details the research strategy and the dimensions that informed the analysis. We then illustrate the data governance models that resulted from our research. In the discussion, we critically examine these models in light of the research questions. After specifying the limitations of the study, we conclude by highlighting the contribution of the article to the current policy debate on data governance.
Data governance: A social science-informed definition
The term governance has been extensively used in the last two decades but its meaning is still ambiguous (Colebatch, 2014; Rhodes, 1996). Our understanding is informed by existing debates in the political science and risk scholarship (Colebatch, 2014; Kooiman, 2003; Rhodes, 1996) where, for example, governance has been framed as ‘the multitude of actors and processes that lead to collective binding decisions’ (Van Asselt and Renn, 2011: 431). Governance broadly refers to the web of actors involved, with different roles, in the process of governing a system. The term stresses a discontinuity from so-called ‘command-and-control’ by the State, and acknowledges that a broader set of actors and institutions are (also) involved in managing societies (private sector, civil society and other non-government entities; Kooiman, 2003). Governing is the result of a process, which does not only occur through rule making and rule enforcing but develops also from (social) interactions, cooperation and negotiations between stakeholders at the horizontal level (Colebatch, 2014).
The notion of governance has a descriptive meaning but also has a normative scope capturing a way of ‘rule-making’ (including strictly laws, but also regulations, standards, etc.) and the relative allocation of responsibilities and liabilities. Governance embraces a broader transition from more centralised to decentralised forms of rule-making. Van Asselt and Renn (2011: 435) underline the normative understanding of governance describing it as a ‘model or framework for organizing and managing society’. This normative understanding promotes wider participation and increased accountability in formal decision-making processes. Participation of a broader set of stakeholders itself is indeed acknowledged as a
As Wolf (2002) puts it, the governance phenomenon takes place within
Based on this understanding of governance, we examine in this article ways in which personal data collected through datafication processes is and could be governed. There is an extensive literature on data governance in the fields of privacy regimes (Bennett and Raab, 2018) and of information systems, from which we draw for the definition of the analytical dimensions (Abraham et al., 2019; Winter and Davidson, 2019; Khatri and Brown, 2010). This contribution, however, adopts a social science-informed perspective of data governance that complements other framings, such as those of platform governance or privacy and data protection law. Our perspective on data governance draws in particular from science and technology studies (STS) and critical data studies (CDS), which informed our work through concepts of data infrastructure (Kitchin and Laurialt, 2014) and data politics (Bigo et al., 2019; Ruppert et al., 2017).
Following the conceptualisation of infrastructure in STS as a heterogeneous, relational, and complex socio-technical ‘assemblage’ (Slota and Bowker, 2017), a data infrastructure is seen as an evolving ecosystem with a plurality of actors having multiple interests, agendas, goals and strategies, and interacting with an array of tools, mechanisms, systems, interfaces and devices for governing data (Kitchin and Laurialt, 2014). A data infrastructure is implemented not only to support certain practices, but also to cultivate a specific imaginary, that is, a particular vision of data and its possibilities (Beer, 2017; Gray et al., 2018).
The notion of data politics emphasises the ‘performative power of data’. It understands data not only for its representational capacities, but also as a force ‘generative of new forms of power relations’ (Ruppert et al., 2017: 2), as how data is collected and processed generates power imbalances and information asymmetries in bringing into being the subjects and objects that such data concerns. Data politics conceives data as ‘an object of investment’ that is ‘produced by the competitive struggles of (actors) who claim stakes in its meaning and functioning’ (Ruppert et al., 2017: 5). It underlines the role of data subjects and asks questions about their position
Drawing on CDS and STS we use data infrastructure as an analytical lens to conceptualise the identified data governance models as a situated, contingent and relational instantiation of the stakeholder roles, their interrelationships, their articulations of value, and the organisations of governance principles, instruments and mechanisms in each model. Likewise, by acknowledging the asymmetries of the current data landscape and the public debate on how to challenge them, we direct our attention to the issue of power within the politics of data. Informed by these scholarships and concepts, we understand data governance as
Based on this conceptualisation, the article examines four emerging data governance models. These models could be understood as inventive practices that problematise current arrangements and reassemble them in accordance with the interests of the actors involved. Through the analysis we can scrutinise the ‘desirable futures’ these models promote (Jasanoff, 2015) and see whether they address the asymmetries of power of the current data landscape.
Research strategy
To identify emerging data governance models, we delved into grey and academic literature, as well as news articles and websites of recent projects and initiatives. The collection of resources started in preparation for a workshop held in October 2018 on data governance with 17 invited experts from academia, public sector, policymaking, research and consultancy firms (Micheli et al., 2018). The workshop highlighted a widespread lack of knowledge and practical understanding of alternative models to the ‘data extraction’ approach of big online platforms (Zuboff, 2015), the need to find ways to use data collected by private companies for the public interest, and the urgency to consider data subjects as key stakeholders for the governance of data (Micheli et al., 2018). The outcomes from the workshop were therefore critical in informing our study.
As a preliminary research strategy, we retrieved articles using Google Scholar and the Web of Science Core Collection, but we then decided to proceed without being constrained by keywords since the results were not pertinent to our research objectives and our conceptualisation of data governance. Consequently, we adopted a flexible search strategy and used a snowballing approach, progressively including new sources according to their relevance to the theme of interest. The initial sources considered for this research were identified for the preparation of the workshop and from the inputs provided by the workshop participants (Micheli et al., 2018). Such resources addressed the power imbalances of the current datafied society and advocated a democratic and equitable digital transformation, with particular emphasis on socially beneficial uses of data held by the public sector and citizens’ empowerment through data (e.g. Couldry and Powell, 2014; Andrejevic, 2014; Kennedy and Moss, 2015; Symons and Bass, 2017; Morozov and Bria, 2018; Villani, 2018; Winter and Davidson, 2019; Shkabatur, 2019; Ilves and Osimo, 2019; Carballa Smichowski, 2019). A subsequent step was to review related work, which addressed similar issues or was directly linked to the sources examined. Simultaneously, we kept track of new publications and on-going projects or initiatives. The review strategy proceeded iteratively, until the typology of the models was consolidated.
We considered the above strategy appropriate for two reasons. First, the understanding of data governance developed for this contribution diverges from interpretations prevailing in scholarly literatures (coming especially from management, information systems, and law), and thus needs a broader research procedure. Second, the object of the study is a rapidly evolving field for which an established shared vocabulary is lacking. The various labels that are being proposed in current data policy discourses tend to be used equivocally to refer to different concepts (technical solutions, legal frameworks, economic partnerships), with their meaning shifting according to the context. Therefore, we did not bind the search for resources to predefined labels. Although the method is fairly speculative compared to systematic literature reviews, it was nevertheless suitable for the purpose of synthesising, and critically inquiring into, a moving target: the emerging models for the governance of (personal big) data.
The review covered documents, publications, news and websites in English that addressed emerging practices for the governance of data with a focus on the European context. On the whole, it included 72 academic articles, 16 book chapters, 63 reports and policy documents, and 22 websites of projects/initiatives. The resources were collected in the time span from October 2018 to July 2019, with nine documents added during the review process. Most of these are recent, as 74% were published from 2017 onwards. Scholarship on data from a ‘Global South’ perspective (Arora, 2019; Milan and Treré, 2019) suggests that different geographical, political, social, organisational, and jurisdictional contexts also affect roles and power in the (data) governance discourse. This contribution, however, primarily takes a European standpoint in investigating data governance models. In doing so, we acknowledge that we may have missed insights deriving from other geographical arenas and that our models need to be read as relative to the European context.
Analytical dimensions
To guide our analysis and description of the emerging models of data governance we used the following analytical dimensions, drawing in particular from Abraham et al. (2019) and Winter and Davidson (2019) (see Table 1). These dimensions relate to STS and CDS by bringing to the fore the interests and goals of the main stakeholders involved in data governance, thereby helping us to make visible the power relations and the different forms of agency in each model.
Table 1. Analytical dimensions.
Table 2. Summary of data governance models.
Stakeholders
Stakeholders are all actors, such as individuals, organisations and groups, who are affected by, or have an effect on, the way data is governed and the value that is created from it. Stakeholders differ widely in terms of possibilities to access, control and process data, as well as knowledge about how data is collected and treated. They also hold different values and interests about data, and norms regarding its use (Winter and Davidson, 2019). They include ‘data subjects, data controllers (and processors), and third-party data users’ (Ho and Chuangt, 2019: 203). Stakeholders encompass private sector, public sector, academia, scientific and civic organisations, activists, social entrepreneurs and citizens (Calzada, 2017).
Governance goals
Governance goals are value-based objectives that different stakeholders have established for governing data (Winter and Davidson, 2019). These goals are the meanings data represents for the interested actors. While some goals may be broadly acknowledged or even shared within and across use contexts, others might be opaque or disputed between different actors or contexts. A straightforward goal for companies is to maximise financial returns through data sharing and aggregation (Srnicek, 2017). Policy documents, instead, might cite public interest as one of the key goals to pursue with data sharing agreements. Another goal for data governance could be increasing data subjects’ control of their data or giving voice to disadvantaged groups (Beraldo and Milan, 2019; Winter and Davidson, 2019).
Value from the data
This dimension refers to the kind of value that is created from data through aggregation, analytics, and business intelligence, by the various stakeholders who may reap different benefits from these processes (Winter and Davidson, 2019). The value that stakeholders gain varies, from economic revenues to public good and citizens’ self-determination. This dimension assesses to what extent a model foresees that data is used in ‘socially progressive ways’ (Kitchin and Laurialt, 2014). Is any form of public value created or only private value for companies and/or the individual users? Through this dimension we enquire whether the value and knowledge produced through data aggregation and analysis is redistributed between actors and across society (Mulgan and Straub, 2019).
Governance mechanisms
With governance mechanisms we refer to the strategies and instruments adopted by different agents to achieve their goals and direct change in a socio-technical system (Borrás and Edler, 2014). This dimension comprises the elements of a data assemblage (Kitchin, 2014) that frame how data is controlled, what value is created, and who benefits from it. It includes systems of thought, policies, regulations, committees, contracts, terms of service, standards, algorithms, interfaces and other socio-technical systems that form the governance mechanisms of today’s data infrastructures, and that some actors can exploit better than others (Kitchin, 2014; Abraham et al., 2019; Winter and Davidson, 2019). Informed by STS, the notion of governance mechanisms also includes the broader ethical, political and economic principles embedded into the data infrastructures and represents the complex socio-technical ‘assemblage’ in which data governance takes place (Bowker et al., 2010; Slota and Bowker, 2017).
Reciprocity
Reciprocity refers to the power relations between stakeholders in accessing, controlling and using data. It highlights the difference between unilateral approaches, such as those in which big tech corporations hold most of the decision-making power, and mutual data governance models in which more stakeholders take part in the governance of data. This dimension links with the notion of data politics that understands data as ‘generative of new forms of power relations’ (Ruppert et al., 2017: 2).
Emerging data governance models
This section describes the data governance models identified following the five dimensions described above. These models should be understood as ideal types in the Weberian sense. They are analytical constructs that emphasise certain traits in order to synthesise phenomena that differ in the degree to which they display those traits (Kvist, 2007). They are not intended as an exhaustive description of the state of the art, but as a contribution to synthesising emerging data governance models. The analysis includes models that differ, to varying degrees, from the current dominant one. Therefore, we do
Data sharing pools
Different actors join a data sharing pool (DSP) to ‘analyse each other’s data, and help fill knowledge gaps while minimizing duplicative efforts’ (Shkabatur, 2019: 30). By creating these partnerships, they ease the economic need for exclusive rights and obtain limited co-ownership stakes in the resulting data pool. Data is treated and exchanged as a market commodity with the aim of producing data-driven innovation, new services, and economic benefits for all the parties involved (Carballa Smichowski, 2019; Kawalek and Bayat, 2017). DSPs are described as horizontal joint initiatives among data holders to aggregate data from different sources to create
Governance mechanisms for DSPs include technical architectures, such as data sharing platforms and Application Programming Interfaces (APIs), which facilitate a centralised data exchange within business ecosystems. However, a key mechanism is the contract, a legal and policy framework that defines the modalities for data sharing, how data can be handled, and for which purposes. These contracts could be ‘repeatable frameworks of terms and mechanisms to facilitate the sharing of data’ between entities, which are especially useful for organisations that do not have the know-how and legal support to leverage data (Hall and Pesenti, 2017; Hardingens and Wells, 2018). Although these frameworks have been referred to as data trusts, there is no full consensus on whether they could be assimilated to actual legal trust structures or are rather a ‘marketing tool’ facilitating the responsible sharing of data (Delacroix and Lawrence, 2019: 242).
An example of a DSP is the Connected Citizens Program, a collaboration between Waze, a community-based traffic and transport app, E.S.R.I., a global commercial software company, and municipal governments (Shkabatur, 2019). As part of the pool, municipal governments share real-time construction and road closure data through the E.S.R.I. platform, and in exchange Waze shares its community-collected real-time traffic data. The assumption of this kind of contract is that all parties benefit since the DSP enables them to easily obtain data that would otherwise be inaccessible. There is reciprocity between partner organisations, but only data holders are involved, as data subjects tend to be excluded from the relation and are at best depicted as passively benefiting from it. Although use cases of DSPs do exist, examples in practice are still few (Mattioli, 2017). A practical limitation consists in the transaction costs, such as data preparation, ensuring privacy and interoperability challenges, which put small businesses and under-funded entities at a disadvantage (GovLab, 2018). A further limitation is that often there is one dominant partner (Carballa Smichowski, 2019). Therefore, although potentially involving many actors beyond big tech platforms, the relations are not necessarily as horizontal (and sustainable) as claimed.
Data cooperatives
Like DSPs, data cooperatives (DCs) distribute data access and rights among actors but, unlike DSPs, they involve data subjects more closely and are guided by different goals. DCs enable a de-centralised data governance approach in which data subjects ‘voluntarily pool their data together, to create a common pool for mutual benefits’ (Ho and Chuangt, 2019: 204). Participants of DCs share data while retaining control over it, having a say on how it is managed and put to value, and not submitting to the extractive logic of digital capitalism (Borkin, 2019; Ho and Chuangt, 2019). Therefore, data subjects are key stakeholders within DCs. By establishing a relationship of trust with the cooperative that manages data on their behalf, they preserve democratic control over their data and might demand an equitable share in the benefits produced (Borkin, 2019; Delacroix and Lawrence, 2019). This model is characterised by high reciprocity since ‘all parties are stakeholders and are equally affected and bound by the governing rules they discuss, negotiate and then agree upon’ (Ho and Chuangt, 2019: 203).
The underlying principles of DCs stem from the co-operative movement, established in the UK and France in the 19th century, and from the more recent platform cooperativism (Scholz, 2016). The cooperative movement promotes fairer conditions of value production, in a non-monopolistic and transparent setting, alternative to the dominant capitalist model (Pazaitis et al., 2017). Analogously, DCs address the power imbalances of the current data economy and are an explicit attempt to rebalance the relationship between data subjects, data platforms and third-party data users. Enabling mechanisms for DCs are ‘bottom-up data trusts’ (Delacroix and Lawrence, 2019): agreements and contracts that provide the means for citizens to be informed, express their preferences and concretely decide how to share their data and for which purpose.
DCs need to generate sufficient income for their maintenance and development, but are not based on profit-maximising objectives. They often aim to create public value across society, including promoting social change and addressing societal issues, for instance by fostering equality, digital rights, environmental causes or medical research (Carballa Smichowski, 2019; Sandoval, 2020). Many DCs are ‘commons-based’ and open, blurring the distinction between the notion of data commons and DCs (‘open cooperativism’) as data is shared with an open license and made public (Carballa Smichowski, 2019; Ho and Chuangt, 2019; Pazaitis et al., 2017; Sandoval, 2020).
Examples of DCs operating with health data are MIDATA.coop and Salus Coop, which let citizens donate their personal health information for scientific research. Although there is a growing interest in DCs for ethical approaches to data sharing and use (e.g. Ilves and Osimo, 2019), at the moment there are only a few small examples, since this model struggles to compete and scale up against big tech companies, which are advantaged by their monopolistic position, their critical mass of users, and greater financial resources (Sandoval, 2020).
Public data trusts
Public data trusts (PDTs) refer to a model of data governance in which a public actor accesses, aggregates and uses data about its citizens, including data held by commercial entities, with which it establishes a relationship of trust (Delacroix and Lawrence, 2019; Hall and Pesenti, 2017; Mulgan and Straub, 2019). Several stakeholders might be involved in this model, including city administrators, managers of public institutions, platform companies, trusted data intermediaries, research institutions, start-ups, and SMEs. Public administrations may also invite third parties to access their data sources and develop data-driven services and/or to offer guidance on data sharing (Hall and Pesenti, 2017; Morozov and Bria, 2018). A key goal of PDTs is to integrate data from multiple sources to inform policy-making, promote innovation and address societal challenges, while adopting a responsible approach to the use of personal data (Bass et al., 2018; Collinge, 2016; Kawalek and Bayat, 2017; Morozov and Bria, 2018; Van Zoonen, 2016).
In PDTs, public actors assume the role of trustees that guarantee citizens’ data is handled ethically, privately and securely. Thus they imply the establishment of a relationship of trust between citizens and public bodies: citizens must be reassured that public actors are capable of keeping their personal information safe and secure and that they will use data to improve their lives (Collinge, 2018). To earn such a level of trust from citizens, public bodies might engage in citizens’ consultations and living labs, or require the intervention of external independent organisations that act as trusted intermediaries (Collinge, 2018; EC, 2020c; Mulgan and Straub, 2019). These trusted intermediaries are new institutions that are allegedly held to account for securely managing data, preserving citizens’ privacy, and maximising the public value of data (Mulgan and Straub, 2019). These entities will be independent and unrelated to for-profit firms and big tech corporations, and guarantee that data is managed without abuses through strong accountability and standards. Therefore, even if citizens are mostly seen as recipients who benefit from services and policies developed through PDTs, they might be explicitly involved in this model through ‘trust building’ governance mechanisms such as living labs, public consultations and civil society initiatives.
Examples of PDTs are pilot projects by the Open Data Institute, a non-profit private company, in conjunction with the Mayor of London and the Royal Borough of Greenwich. These projects use real-time data to improve public service delivery, such as in council-owned social housing and public parking (Open Data Institute, 2019). The city of Barcelona has also been experimenting with PDTs, including ‘clauses within procurement contracts specifying that a service provider must make any data that may be of public value available to the city council’ (Bass et al., 2018: 28).
An underlying assumption of PDTs is that all data with a public interest component (even if collected by commercial entities) is part of a nation’s infrastructure (National Infrastructure Commission, 2017), and therefore the information it affords should be ‘socialised’ to produce value for citizens and society as a whole (Cardullo, 2019; De Lange, 2019; Morozov and Bria, 2018). Currently, the involvement of private companies in such forms of data sharing takes place only on a voluntary basis, while government-owned and utility companies (such as energy and transport) have more motives to collaborate with public bodies (Bass et al., 2018; Open Data Institute, 2019).
Whilst at present PDTs are largely limited to small pilot projects, a key enabler would be a legal framework mandating private companies to grant access to data of public interest to public actors under conditions specified in the law (Shkabatur, 2019). This was considered by the EC (2020c), which then appointed a High-Level Expert Group on Business-to-Government data sharing. The issue has also been discussed at national level in Europe. For instance, French Member of Parliament Belot proposed creating the legal concept of ‘territorial interest data’ to give local governments the power to demand access to data (Carballa Smichowski, 2019).
Personal data sovereignty
The personal data sovereignty (PDS) model is characterised by data subjects having greater control over their data, in terms of both privacy management and data portability, compared to the current dominant model. The label comes from the broader principle of technological sovereignty, which concerns subjects, public administrations, or governments regaining control of technology, digital content and infrastructures, thus reducing the influence of IT commercial enterprises and of foreign States in which these companies reside (Couture and Toupin, 2018; Villani, 2018).
This model promotes a different and fairer data economy, echoing critical accounts of the dominant model of surveillance capitalism (Lehtiniemi, 2017). Data subjects are envisioned as key stakeholders together with digital service providers – which deliver the means for subjects to control, use and share their data – and re-users with whom data subjects decide to share their data (Ilves and Osimo, 2019). This governance model pursues two goals: it increases individuals’ self-determination, granting more opportunities to access, share and use personal data, and engendering a more balanced relationship between users and digital platforms; and it is expected to foster a socially beneficial usage of data through the development of new data-driven services centred on user needs (Ilves and Osimo, 2019; Lehtiniemi, 2017).
Among the main mechanisms enabling PDS are personal data spaces, like Digi.me, Citizen-me or Meeco, which consist of ‘intermediary services’ allowing users to store their personal data, collect data scattered across different platforms, and control their sharing with third parties (Lehtiniemi, 2017). These services, which appeared in the early 2000s, have been strengthened by Art. 20 of the GDPR (data portability). They are expected to remove obstacles for individuals wanting to exchange their data for research or other purposes, acting as trusted intermediaries and improving citizens’ ability to make choices about their data (Delacroix and Lawrence, 2019).
PDS has been especially encouraged within the context of MyData, an international movement and a community of activists, non-profit organisations, think-tanks as well as commercial actors, start-ups and SMEs. An analysis of this movement (Lehtiniemi and Haapoja, 2020) highlights its inherent tensions between activists’ interests for social change and the economic interests of commercial firms. The same tension stands at the core of the PDS model and its positioning towards value generation. PDSs are expected to produce value in the form of data subjects’ self-determination, knowledge, and public interest, but at the same time foster economic growth through an ecosystem of new commercial services supporting them.
A limit of this model lies in its dependence on personal data spaces, as these are currently adopted by only a niche of users and often fail to scale beyond pilots (Ilves and Osimo, 2019). Moreover, as business entities, they may have an interest in ‘nudging’ users, and a few personal data spaces might gain more power in the market (Lehtiniemi, 2017). Furthermore, citizens have limited awareness of platforms’ use of personal data for profit and of the need for alternative models of value production, and the majority would have neither the capability nor the time to take advantage of the opportunities offered by these intermediary services (e.g. Andrejevic, 2014). Envisioning citizens as ‘market agents’ (Lehtiniemi and Haapoja, 2020) free to choose from an ecosystem of personal data spaces might not fully address the asymmetries of power of the current data landscape.
Discussion
In this article we contribute to the literature and the policy debate on data governance using a socio-technical perspective to describe four emerging models of data governance: DSPs, DCs, PDTs and PDS. The models are abstract conceptualisations (Kvist, 2007) that do not necessarily represent discrete implementations of data governance. A single initiative could embrace more than one of these conceptual models simultaneously, or be inspired by one but not embrace it fully, as reality is messier than abstract constructs. Nonetheless, they provide a foundation for discussion on alternative approaches or ‘desirable futures’ for accessing and sharing data in the age of datafication (Jasanoff, 2015). They could be read as inventive practices that problematise current arrangements and reassemble them in accordance with the interests of the actors involved. Table 2 presents a summary of the main features of the models on the five analytical dimensions that guided our analysis.
All models highlight a concern for redressing the structural power imbalances between corporate platforms and other actors, such as data subjects, public bodies, third parties, civil society and researchers. There are nonetheless substantial differences regarding which stakeholders exert influence over data, and what value is pursued through data use. Drawing from the notions of data infrastructure and data politics, we highlighted the plurality of actors that affects or is affected by the way data is made accessible and used in each model. The actors’ roles and their power to control data are situated and contingent: they relate to the broader ethical, legal, political and economic principles that are embedded in the data infrastructure and the various governance mechanisms that enable each model. The governance goals of more powerful actors both support and are supported by the ‘imaginaries’ that prevail in each model, which in turn influence value generation and redistribution.
In DSPs, a classic piece of Big Data rhetoric is embraced: data creates more value if aggregated. In that spirit, two or more data holders (both private and public) join forces and establish data sharing agreements. They analyse each other’s data, filling knowledge gaps and fostering data-driven innovation. On the surface, this model promotes reciprocity between potentially many data holders, as it is based on horizontal relationships. Yet, it also fosters power asymmetries. Data holders with more resources or more valuable datasets have greater power to set the terms on how data is accessed and used. Furthermore, data subjects (and citizens in general) do not have a voice in this model; they are not included in the relation and are at best depicted as recipients of the innovations developed through it.
PDS, on the contrary, place data subjects at the centre of an ecosystem of new services that provide them the means to access, control, share and analyse their data. Based on the principle of sovereignty, this model emphasises individual control over data and self-determination, and it is in stark contrast to surveillance capitalism (Lehtiniemi, 2017). A movement of data activists is promoting this progressive goal; at the same time, commercial actors are interested in it as a means to support an ecosystem of new services. In PDS, different kinds of actors, with different interests, converge for the promotion of a ‘fairer data economy’ (Ilves and Osimo, 2019). This tension leads to some inconsistencies. A fully private ecosystem of intermediary services, even if allowing users to control their data, would leave the task of addressing the power asymmetries of the current data landscape to the market (Lehtiniemi and Haapoja, 2020). From studies on citizens’ perspectives on their digital data, it also appears that the majority would not have the skills and interest to take advantage of the opportunities offered by these intermediary services (e.g. Andrejevic, 2014).
A wider range of actors is involved in PDTs, with public bodies taking the lead. The underlying principle of PDTs is that all data with a public interest component is part of a nation’s infrastructure, and therefore the information it affords should be ‘socialised’ to produce value for citizens and society as a whole. Actors from the public sector, non-profit, business and academia take part in PDTs. Data subjects, however, are not just recipients of the services developed through data. To be effective, PDTs imply the establishment of a relationship of trust with citizens, who must be reassured that their personal information is protected and that it will be used for the public interest. To achieve this objective, public bodies need to build trust by listening to requests from civic society. As recent failed initiatives (Sidewalk Toronto), as well as the debate on COVID-19 contact tracing apps (Ada Lovelace Institute, 2020), show, this is something that public bodies are (still) learning to do. Another challenge for PDTs is to establish data sharing agreements with private companies, which might have data of public interest but not be willing to share it unless regulations mandate it.
DCs are a grassroots-driven decentralised data governance model in which citizens voluntarily pool their data together, establishing a relationship of trust with the cooperative that manages data on their behalf. Data subjects preserve democratic control over data and have an equitable share in the benefits produced, which are often aimed at the public interest (such as medical research). Drawing from the principles of the cooperative movement and platform cooperatives, this model is in stark opposition to platform capitalism and aims to be a fairer, transparent and non-monopolistic alternative. Big tech platforms might be completely excluded from data governance in a DC – even if informally present as the ‘antagonistic actor’ – or be included only as data providers from which users take their data. Although DCs raise a lot of interest in the policy debate as an ethical approach to data sharing, they struggle to scale up and reach a critical mass of users (Sandoval, 2020).
With respect to the kind of value pursued, we see that DSPs mainly focus on producing economic value, while other forms of value gradually ‘chime in’ in the remaining models, such as social change, public interest, fairness, and data subjects’ self-determination. It should also be noted that, for the most part, these models can be found in niche initiatives or pilot projects, and there is still limited research concerning the value they generate and their sustainability over time (Borkin, 2019; Verhulst, 2019). Value production and redistribution, thus, can be assessed more at the level of the imaginary than from evaluations of tangible outcomes. Nevertheless, adopting a normative perspective informed by CDS, we could ask to what extent these data governance models foster a redistribution of value.
In DSPs, data is a ‘market commodity’ and economic value is redistributed horizontally among data holders who join the partnership. PDS put forward important innovations for data subjects to exert their digital rights, but do not question the datafication and commodification mechanisms of the platform society (Van Dijck et al., 2018). They are oriented towards the creation of value for the individual (self-determination) and new commercial actors (data services), with public interest as a by-product of these. The remaining models expressly pursue the public interest: DCs allow data subjects to collect and aggregate their data for the public interest, while PDTs act on behalf of citizens, aggregating and analysing different data sources to inform policy-making and address societal challenges. If in DCs a cooperative has to be trusted, in PDTs it is a public body. Yet, in the latter, it might also be that a trusted external independent organisation acts as a data intermediary between citizens and a public body; this demonstrates how the abstract models can easily overlap in practice.
PDTs represent a form of public-driven governance that could significantly redistribute the value of data and increase fairness, but that requires the support of a new legal framework mandating access to data for public interest. Similarly, DCs are a fairer alternative to surveillance capitalism, but struggle to find financial sustainability and to reach a critical mass of users. Therefore we did not find a single model to be ‘recommended’ or ‘promoted’ for a fairer data landscape. Instead, a combination of all these models should be envisioned for a ‘desirable future’ (Jasanoff, 2015). In particular, to oppose the privatisation of internet governance (DeNardis, 2019), and the resulting dominant model of data governance steered by big tech platforms, it is advisable to look at the inventive data practices of civic society and public bodies, as it is among these actors that we have found more interest in the redistribution of value generated through data.
An important dimension to discuss is the extent to which these models democratise data governance. To answer this question we turn our attention to the three models that involve data subjects. In all cases, data subjects can choose a trusted intermediary for their data, be it a commercial service from an ecosystem of personal data spaces (PDS), a cooperative that allows them to keep democratic control over data and share responsibilities (DCs), or a public body that is entrusted by citizens to use (their) data ethically and for the public interest (PDTs). Involving subjects in the governance of data is a key strategy to address, and avoid, many of the possible negative consequences of data governance, such as dataveillance, function creep, technocratic governance, etc. (Kitchin, 2014). The more powerful data subjects are in a data governance model, the greater the accountability required of data holders, which in turn limits risks and data misuses. At the opposite end, DSPs are only accessible to data holders. How does that model guarantee that the needs and interests of data subjects (citizens at large and marginalised groups) are accounted for? To address this, and for good data governance, it may be advisable to combine DSPs with the other models, which offer more guarantees, at least in principle, in terms of accountability.
The findings of this article highlight that the same ‘buzzword’ can be associated with different rationalities of data governance, since the notion of data intermediaries and data trusts is included somehow in all models. This underlines how important it is to think critically about data infrastructures as socio-technical products, moving beyond mere instrumental and technical aspects. Data trusts might be powerful means to reduce the power imbalances of the current data economy if adopted within DCs, while they may foster very different aims in DSPs. Indeed, in the former case these would be ‘bottom-up data trusts’ that act on behalf of citizens’ interests and preferences (Delacroix and Lawrence, 2019), while in the latter they would be repeatable frameworks of terms and mechanisms to facilitate the sharing of data (Hall and Pesenti, 2017). Conversely, data trusts could also be a service offered by the public sector in a top-down manner to earn trust and foster the public interest, as in PDTs.
A final consideration concerns the intertwined relationship between the data practices we have examined and the regulatory frameworks in which they exist. These data governance models can only develop further if they are sustained by appropriate legal frameworks, such as the GDPR for personal data or a new legal act to mandate access to commercial data of public interest. With the recent developments in data policy (EC, 2020a), the EC is strengthening its role as a transnational regulator of technology with repercussions on a global scale. In doing so, it will be crucial to engage with the wider set of stakeholders identified in our research, including local administrations and many actors from civil society who have an important role in shaping the emerging forms of data governance that address the asymmetries of the current data landscape.
Limitations
The lack of established terms in the field, as well as the ever-changing nature of the theme, has complicated the selection of documents. The procedure adopted for retrieving resources, detailed in the methodology section, is not entirely systematic, hence the findings have to be contextualised in that approach, including the time span in which the research has been conducted and the European focus. Furthermore, the described models do not claim to represent the full spectrum of emerging models currently developed for the governance of data. Being ideal types, they have a heuristic value as tools to be adopted for further studies.
Another limitation relates to our multi-dimensional framework that derives from our research questions. In focusing exclusively on the dimensions of stakeholders, governance goals, value from the data, governance mechanisms, and reciprocity, we acknowledge that many more lenses could have been adopted, such as, for example, the perspectives of trust and distrust, privacy and data protection, authority and authorities, law and regulations. We selected the dimensions we thought were the most appropriate to examine emerging data practices from a social science perspective, yet we acknowledge other important elements might have been included.
A final limitation is that we do not include primary data collected to investigate what lays behind the
Conclusions
Notwithstanding its limitations, this study shows that many actors – from the public sector, academia, businesses, civic society, as well as activists and social entrepreneurs – are seeking alternatives to the dominant data governance model. We discussed four of the models that emerge from the practices of these actors. The social practices for data access, sharing, control and use, and the derived models, come at a crucial time, as the discussion on data governance and data sovereignty is gaining new momentum in Europe with the emergence of AI as a strategic area of policy for the future of society. Therefore, we believe it is (and will be) important to examine data governance arrangements in a rapid and timely manner (Mann et al., 2019), and understand how to incentivise uses of data at the service of the public good. To do so, it might be helpful to also adopt a social science perspective on data governance that allows seeing ‘through the infrastructure’ to ask: what principles guide data sharing and use? What is done with data and who can access and participate in its governance? What value is produced and how is it redistributed? Ultimately, this article wishes to encourage more ‘normative conversations’ on socio-technical imaginaries and ‘desirable futures’: the kind of society we want to live in and how we can shape the digital transformation accordingly (Jasanoff, 2015; Kitchin, 2014).
Supplemental Material
sj-pdf-1-bds-10.1177_2053951720948087 - Supplemental material for Emerging models of data governance in the age of datafication
Supplemental material, sj-pdf-1-bds-10.1177_2053951720948087 for Emerging models of data governance in the age of datafication by Marina Micheli, Marisa Ponti, Max Craglia and Anna Berti Suman in Big Data & Society
