Abstract
Introduction
The concept of open government data (OGD) has drawn much attention. It refers to government data that can be (re)used by anyone without barriers such as fees, legal objections, or technical difficulties.1–4 Many governments around the globe have taken action to open up government data.5,6 Actions comprise for instance programs to stimulate the opening of datasets by government agencies, the development of web portals making government datasets findable and accessible, and the organization of competitive events to encourage creative use of the data by citizens, entrepreneurs, and academics. These initiatives to make government data open are on the one hand motivated by aspirations to provide more transparency on government functioning and to enable citizen participation in policy making. On the other hand, governments are opening datasets from an economic point of view as the availability of open data fuels innovations. It permits companies and citizens to add value to these data which in turn can lead to the development of (information-based) products and services.3,5–7 Within academic literature, the concept of open (government) data has been widely described, ranging from benefits and risks concerning the opening up of data to the mapping of stakeholders and interests.8–10 At the same time, frameworks regarding governance or ecosystems have been elaborated to guide open data policies.11–13 In most research, the topic of OGD is approached broadly, without differentiating between types of government data. Some types of data (e.g. geographical data) are already widely available on OGD portals, while other types are still difficult to find, to access and to (re)use. Not every type of data can be “opened up” easily. As McGrail et al.14 put it: “There is a big difference between making available a century of weather pattern data and last year's detailed records of acute inpatient hospital use.” This article focuses on health data which are managed by governmental actors. Several open data manifestos mention health as a promising subject for open data policies. Although proponents believe that the (re)use of health data leads to societal and economic advantages, sensitivities related to health may hinder free and public access. To understand the particularities of this type of data, the article explores how existing open data ecosystems can be applied to health data. The metaphor of ecosystems will be used to analyze the socio-technical environment in which open health data are created and in which value is derived from these data. An ecosystem contains the interactions between people such as data producers and data users, infrastructure or processes related to data management, and formal or informal institutions (e.g. motivation, policy, legislation).4,15 Such interactions influence whether datasets are published and whether the expected benefits are reaped.
Several generic OGD ecosystems have already been developed. However, it is recognized that according to the policy domain
To map and test an open health data ecosystem, a systematic plan is applied. As mentioned in the introduction, a wide range of general OGD theoretical frameworks have already been elaborated. Keeping these general frameworks in mind, this paper seeks to distinguish itself from the generic approach to government data by focusing on one particular type of data, namely health data. Before the actual mapping of an open government health data ecosystem can start, the meaning of health data requires clarification. Therefore, this research plan begins with a definition of health data to delineate the type of data involved and to situate its meaning within the concept of OGD. Following this theoretical exercise of defining, the second step of the plan consists of an observation of some existing published open government health datasets and publication tools. This observation contributes to awareness of current practices that can be integrated in the mapping of the ecosystem. As OGD is often considered a positive trend towards transparency and value creation, the third step explores the expected benefits related to opening up health data. The authors assume that comprehension of the expected benefits will help to understand some of the interests at stake related to health data. Although there may be multiple benefits, the particularities of health data may have an impact on whether or not certain datasets are published. Therefore, a fourth step consists of the decomposition of the characteristics of health data. By doing so, it examines what might distinguish this type of data from other government data. While the first four steps focus on gathering knowledge about health data, the fifth step provides an overview of the elements of an open government data ecosystem. This fifth step will enable the mapping of the accumulated knowledge about health data to its own customized ecosystem.
In sum, some preliminary questions are addressed to facilitate the creation of the open health data ecosystem:
RQ1: How to define health data in the context of OGD?
RQ2a: Which kind of government datasets are involved?
RQ2b: Which kind of publication tools are used?
RQ3: What are the expected benefits of opening up health data?
RQ4: How to characterize health data?
RQ5: Which are the key aspects of an open data ecosystem?
These questions will be answered via:
analysis of academic and grey literature in the field of e-government, data governance, health law & health ethics, medical informatics, etc.
observations of government practices by means of document analysis consisting of government information material
Regarding the literature analysis, the authors started with some key literature on the general topic of OGD that is frequently cited in academic journals or books linked to e-government, information management, and/or public policy. Several of these articles have bundled insights on the concept, expectations, constraints, practices, and participants of OGD. Next, a search of the term “open health data” in the LIMO database of the authors’ research institutions was executed. Considering the limited search results, the selection was supplemented with:
Recent reports from supranational organizations (e.g. Organisation for Economic Co-operation and Development, World Health Organization, G8, European Union) or private think tanks that describe evolutions towards more innovative data use. Such evolutions include inter alia the presence and exploitation of (more) open data and big data. References in these reports also led to the inclusion of some relevant papers on current promises and challenges concerning health data management, as the health(care) sector is considered to profit from data driven innovation.
Literature related to societal debate on the potential impact of privacy or other (restrictive) regulations on the possibility of performing health(care) research. Such publications were chosen as they included arguments for data protection related to the personal sphere as well as arguments for data reuse to gain relevant knowledge for health(care) improvements. Both types of arguments were valuable to better understand the peculiarities of health data and aspects with regard to making data open or keeping it closed.
Supported by the answers to the preliminary questions, the open health data ecosystem will be mapped and afterward tested via a case study concerning the introduction of an open health data policy in Belgium. In other words, the preliminary questions help to answer the overarching research question: “How to map an open health data ecosystem.” A summary of the different research steps and the methods included is presented in Figure 1.

Figure 1. Study design.
Outline of the article
Following the introduction, the second part of the article starts with defining health data. After analyzing existing definitions from academic and grey literature, a pragmatic working definition will be proposed. The third part provides a view on the state of the art regarding open health data. It explores which kinds of health-related datasets are already opened up within existing government practices and via which tools. Fourthly, the article offers an outline of the expected benefits. How can the actors involved profit from access to these data? In the fifth part, the special nature of health data is investigated in order to express its specific characteristics and understand potential differences with other government data. Sixthly, the concept of open data ecosystems is introduced by providing information on the use of the ecosystem metaphor concerning open data policies and its key components. After the theoretical analyses aimed at grasping the foundations of health data & open data ecosystems in the former parts, the seventh part of the article is dedicated to the design of an open health data ecosystem. Components of a general OGD ecosystem are applied to the specific context of health data. Next, the designed open health data ecosystem will be tried out via a use case concerning the introduction of a policy intended to open up health datasets. Finally, the article ends with conclusions and ideas for future research.
Health data: what?
Since this article focuses on data related to health, the first step of our research consists of giving meaning to the term health data. While scrutinizing existing definitions of health data, one can notice a term consisting of several components (see Figure 2). A first component regards the level, as health data can either refer to the health of an individual person or to the health of a population. 16 On an individual level, health data are extremely personal as they are related to one single person. 17 By contrast, data on population level are aggregated data providing bundled information on a (inter)national, regional or local group of people. A second definition component includes the type of health, normally not differentiating between physical and mental health.17–19 This component stipulates the broad and modern interpretation concerning the concept of health. As a consequence, health data range from information on minor body ailments to information on emotional well-being. 20 Thirdly, definitions of health data incorporate a component related to a health biography or a health status. This means that health data comprise topics providing a historic and/or current view on the health of an individual or a population.17–19 General topics include for instance the incidence of diseases, reproduction, aspects regarding quality of life, and causes of death. 16 In other words, health data encompass details concerning the presence or absence of health-related problems for a person or population. Finally, a fourth definition component clarifies that information regarding the health biography or health status of an individual or a population can be either directly or indirectly derived. This component touches upon the sources of health data. Information on the health of an individual or a population might for instance be directly obtained via health questionnaires filled in by citizens or by examining the content of patient records maintained by health care providers. 
These patient records contain detailed information on patient demographics, clinical risk factors, diagnoses, immunizations, medications, medical devices, medical test results, and care plans. 21 However, the information can also be indirectly inferred when analyzing for instance financial data related to public or private health insurance and administrative data from the health care system such as hospital check-ins.17–19,22 Medical cost payments and registrations regarding the use of health care services can unintentionally reveal whether a person or a population suffers from a certain health problem.

Figure 2. Definition components of health data.
In the context of open data, the topic health can also refer to health care data. Health data & health care data are closely related to each other. Health care data, a term the authors did not find frequently within their literature analysis, are defined as “that information used to provide, manage, pay and/or report on the services used across the entire health care system.” 23 Health data are required to manage health care services and consequently could be considered as a part of health care data. A government administration will rely for instance on information concerning the health status of individuals and populations to provide a proper level of high-quality health-related services. Nevertheless, health care data might be interpreted more broadly than the health status of a patient or a population, as these data could also entail information on the availability of health care providers in a region and/or the adherence to quality standards, policy guidelines, and regulations within the health care system. As the theme “health” is often not defined when referred to as a promising topic for open data policies, a wide meaning is applied in this primary exploration regarding an open health data ecosystem.
Hence, this article chooses a pragmatic approach to deal with the topic health in the context of open data. It aims to explore the opening up of all types of datasets linked to the topics health or health care which are managed by government agencies. Therefore, within this article, health data refers to any government data concerning information on the health of a population and/or the management of a health care system. Open data are by default anonymized data, which means that the data can only be at the population level. However, it is important to remember that in the case of health data the data often originate from personal data. As will be discussed later, risks regarding the (re)identification of individuals cannot always be excluded. The working definition of open health data in Table 1 on the one hand gives meaning to the term health and on the other hand links this term to the characteristics of open data, more specifically unfettered accessibility.
Table 1. Overview of definitions.
Government data related to health: involved datasets and publication tools
In addition to the theoretical reflection on the definition of health data, the authors examine which datasets are already being published in practice and via which means. Several international initiatives which aim to stimulate the opening up of government data have considered health as a valuable subject for open data policies. For instance:
Health is one of the themes of the Open Government Partnership, a voluntary partnership between more than 70 governments and civil society representatives taking actions to improve governmental transparency, accountability, and responsiveness. The Open Government Partnership believes that data on the health and medical history of citizens are a vital resource to improve health systems and patient care. They promote open health data for instance to empower citizens to have more choices and to take control of their own medical care.24
The leaders of the G8 countries
Human health & safety is one of the themes within the INSPIRE Directive, a legal framework from the European Union intended to make spatial data infrastructures more interoperable and stimulate the intergovernmental and public sharing of environmental spatial information.26,27 Although the theme originally focused on the link between human health and the environment, member states of the European Union can choose freely to accommodate health data more broadly. Theme components within the INSPIRE Implementers Roadmap include for instance data on the prevalence of diseases, data on the availability of health care/health services, and health determinant measurement data.
Following these initiatives that promote open data policies, the authors expect to encounter some practical examples. Consequently, the creation of the health data ecosystem can also take into account existing examples. Secondarily, these examples provide an opportunity to assess the proposed working definition for open health data. Observing open data in practice, one can notice that several governments already publish a few health datasets on their national, regional, or urban open data portals.19,28 In these cases, health is one of many topics, among for instance culture, tourism, or education, on which open data can be found. As shown in Table 1, the health datasets on general open data platforms such as the ones from New Zealand or the Canadian city of Surrey range from statistics on smokers to the use of mental health services. Besides the inclusion of the health topic on general platforms, some limited examples exist of portals exclusively dedicated to the topic of health.29
These health data platforms are intended to serve as a central location where existing health datasets are findable and depending on the dataset potentially accessible. Such portals like the ones in the United States, the State of New York, and Scotland are managed by governmental agencies responsible for health policies in their country or region (see Table 2). By means of these portals, the governments involved try to stimulate the reuse of health data by citizens, entrepreneurs, and researchers in order to enhance health outcomes for everybody.
Table 2. Examples of portals with accessible and/or findable health datasets.
From the examples of datasets in Table 2, a simplified categorization of available open health data can be deduced. In current practice, open health data consist of datasets related to:
The availability of health care practitioners (e.g. doctors, pharmacies, dentists) & health care services within a geographic area
The quality or performance of health care: for instance measures regarding patient satisfaction, health care-related infections, and hospital readmissions
Epidemiology: the prevalence of health-related problems such as cancer, diabetes, suicide, or influenza
Health determinants, providing information on how factors such as sex, ethnicity, health insurance status, or environment might influence the health status of citizens30
Reimbursement schemes, health costs, and management of resources: prescription data, number of specific surgeries, medical products…31
General health and population statistics: e.g. statistics about births, smokers, vaccination rates…
These derived categories may overlap with each other. For instance, depending on government goals, certain epidemiology data, data on health determinants, or population statistics are also useful to monitor health care performance. This can be the case when governments strive to diminish the prevalence of certain diseases, to improve the health of vulnerable population groups, or to spread antitobacco measures. The observed categories are in line with our working definition of open health data as they comprise both information on the health status of a population and management aspects of a health care system.
Expected benefits of open health data
As mentioned earlier, health is a theme that is strongly promoted for inclusion in the open data policies of governments. What explains this enthusiasm of open data advocates concerning health? Gaining insights into the expected benefits will help to incorporate certain interests within the ecosystem. Within literature regarding innovative data use, the expected benefits of open health data can be summarized as the ability for governments to make better-informed decisions for policy making, the development of health(care) related products by the private sector, more knowledge acquisition on the health care system by citizens/citizen groups, and the use of data for science.29,32,33
Drawing upon available data regarding aspects such as evolutions within population health or the outcomes of treatments, policy makers are able to make justified decisions concerning health care, like tackling the underuse or overuse of specific medical interventions. By making relevant data available for several policy levels and entities, open data policies have the power to overcome the reluctance towards intergovernmental data sharing which is widely observed within e-government literature. 34 As the competence for health (care) policies is often divided between distinct government levels or agencies, open data is a helpful tool to obtain access to data more quickly and easily.
Data is a crucial resource for industry to determine which domains are worth investing in, to discover opportunities, and to create innovative products ranging from medicines to medical measurement equipment or health apps. 35 Current barriers to data access for industry include for instance rigid legal frameworks, restrictive requirements regarding partnerships with governmental data owners, or distrust caused by isolated examples of data misuse. 36 Therefore, having access to more health data via open data policies will support certain business activities. An open dataset can be used for a single application, or it can be linked with other datasets, potentially generating big data. Employing big data analytics for single or combined datasets proves inter alia promising for the pharmaceutical industries to improve their stagnant R&D and customize innovations, for instance in the field of personalized medicine.37,38
Transparency concerning the supply of health care services and their quality provides citizens and nongovernmental organizations, on the one hand, with the opportunity to match their health care demands to the available supply. For instance, open data can be used to answer questions such as “Where to find suitable care providers for his/her health problems?” and “Are there any waiting times?” On the other hand, transparency makes deficiencies within the health care system visible, which is valuable information that can be used to claim improvement actions from responsible governments. 29
Lastly, open health data facilitates scientific health projects. The slogan “Data saves lives” is often used to stress the importance of data for medical knowledge and to promote conditions that facilitate access to health data for science. 39 Without data, the causes of several diseases, such as the link between smoking and lung cancer, would never have been discovered. 31 In this context, open health data avoids certain administrative, legal, and financial hurdles complicating scientific health research. Certain datasets of government agencies become accessible without the need to go through long and complex request procedures.
The characteristics of health data
Despite the observation of limited open health data examples and the high expectations regarding benefits, several barriers and challenges exist to make (more) health datasets publicly accessible. To comprehend the tension between expected benefits and obstacles concerning opening up health data, the distinctiveness of this type of data should be investigated. Which are the characteristics of health data potentially impeding or limiting its openness?
Two first characteristic of health data is that the data are
Related to the characteristics of extreme personal and subject to the risk of misuse concerns the characteristic that
Health data are data with a
Health data are
Ecosystems and OGD
Having analyzed and outlined some foundations of health data
Within open data literature, the metaphor of an ecosystem is often used to map the flow of actors, activities, and (technological) tools constituting an open data environment. 15 The metaphor, derived from nature, stipulates how interrelationships between people, infrastructure, and institutions are crucial for either great or limited success of open data settings. Van Loenen et al. 4 define an open data ecosystem as “a cyclical, sustainable, demand-driven environment oriented around agents that are mutually interdependent in the creation and delivery of value from open data.”
Researchers have inter alia applied the ecosystem metaphor to support the design and assessment of OGD programs, 11 to compare similarities and differences of national open data government policies 12 and to detect essential factors enabling innovation with open data. 50 As illustrated in Table 3, examples of open data ecosystems found in the literature consist of different elements. These existing open data ecosystems nevertheless show commonalities, which will be discussed next. First of all, the ecosystems are composed around stakeholders and their interests concerning open data. Traditionally, stakeholders consist of the governmental organizations producing the data, the actors that use the data to create added value, and eventually the consumers of products and services created with open data. As each actor has their own perspectives and possibilities to engage in open data, different interests have to be reconciled. User needs are for instance measured or gathered via specific studies, public consultations, expert groups, or online platforms to submit ideas. 13 The second commonality of open data ecosystem models is the presence or lack of information policies. It refers to legislation, practices, and policies influencing the opening up of government data. 51 Thirdly, data preparation activities are a shared part of open data ecosystems. Prior to publishing any dataset, preparations are required to enable data reuse. These include inter alia assessments of the data quality, provision of metadata, and choosing appropriate data formats. 52 Fourthly, each open data ecosystem mentions the infrastructural elements. Portals to find and/or access government datasets and tools regarding data analysis or visualizations are typical examples. Lastly, there are the drivers giving impulse to become involved in open data. 
Global trends, culture within government administrations, demands from stakeholders or ambitions on the level of economy, and transparency may for instance motivate policy makers to open up data. Dynamic relationships between these common elements impact the outcomes of open data initiatives over time. For example, feedback loops from open data users, foreign hypes regarding innovative data use, or technological progress might influence open data policies in ways of organization, amount of published datasets, or the available budget.
Table 3. Examples of open data ecosystem elements.
OGD: open government data.
Mapping the open health data ecosystem
By making use of the acquired knowledge on health data and inspired by existing literature on OGD ecosystems, the generic ecosystem components will be mapped to the matter of health data. What constitutes the socio-technical environment in which health data managed by governments are opened up or potentially stay closed? This environment and its dynamics will be mapped according to the commonalities of open data ecosystems: (a) stakeholders and their interests, (b) information policies, (c) data preparation activities, (d) infrastructure, and (e) drivers. The mapping of the open health data ecosystem is summarized in Figure 3. This section ends with a reflection on the mapping result in relation to the generic health data ecosystems.

Figure 3. The open health data ecosystem.
Stakeholders and their interests
Regarding stakeholders, a first actor includes the governmental data producers which manage health datasets. Diverse government agencies or institutions can be responsible for aspects of health policy, a wide-ranging policy domain. These aspects include for instance the funding and the quality of the health care system, the utilization and evaluation of pharmaceuticals and medical technologies in a country, the surveillance of (infectious) diseases, the prevention of certain health problems, the encouragement of healthy lifestyles, the implementation of measures to diminish health inequities within a population, etc.53
The organization of health policies differs between countries, which means the responsibilities of health policy can be either centralized or decentralized both on the level of state structure and the level of governmental departments. Therefore, cautiousness regarding generalization should be applied. Nevertheless, considering these sub-domains of health policy we expect that government agencies or government institutes dealing with public health insurance, the supply and audits of health care services, the regulation of pharmaceuticals and medical devices, disease control, health promotion, health technology assessments, public health research and general national statistics
On the side of the data users, we distinguish citizens, the private sector, and the academic sector. Regarding citizens, a further division into individual citizens and citizen groups is made. An individual citizen, for instance in need of health care, could use open health data to obtain information about the availability of caregivers or the quality of the health care system. However, the open datasets could be too raw for data laypersons and/or difficult to interpret. Therefore, the individual citizen might require support from open data products
The second type of data users concerns the industry. In the case of health data, the industry likely to be interested in open health data consists of pharmaceutical companies, firms related to medical devices and technologies (the so-called MedTech manufacturers), private health insurance companies, etc.47,48,57 Their use of open health data could lead to value immediately, as is the case when it is used to decide about areas worth investing in or to create uncomplicated health products or services. However, as the development and market introduction of pharmaceuticals and medical devices is highly regulated for safety reasons and subject to lengthy trials, some products partially generated with the aid of open health data could take years to become market ready and available to end-consumers. The interest of the industry concerns the facilitation of business activities by employing open health data as a freely available resource. 58
In case of the third user, the academic sector, first thoughts go out to (bio)medical and pharmaceutical sciences which could employ the data for research concerning inter alia the causes, the evolution, and treatment of health-related problems or the organization and adjustment of medical practices.33,57 However, considering the broad definition of health policy and existing health datasets, other academic disciplines should be added. Open health data are for instance also valuable for health economists, investigating which investments in health (care) give the greatest return, 59 or health sociologists interested in health inequities within a population. The academic use of open health data is linked to their interests concerning research publications and groundbreaking findings related to health(care).
Information policies
An open health data ecosystem is on the one hand surrounded by general open data policies developed by a national, regional, or local government. Within these general open data policies concerning legal and policy frameworks to stimulate the (re)use of government data, nonsensitive health datasets (e.g. birth statistics or availabilities of hospitals) are potentially included, leading to relatively trouble-free data accessibility. On the other hand, due to the peculiar characteristics of several health datasets, some specific policy frameworks will be applicable. Within these frameworks, items such as the goals, resources, and responsible actors of the open health data policy need to be addressed.
A first framework comprises privacy legislation aimed to protect citizens with regard to the processing of their personal data. Although open data only includes anonymized data, health data often result from personal data. Privacy legislation determines strict conditions under which personal data such as data on an individual's health may or may not be collected, stored, adapted, consulted, combined, etc. 22 Furthermore, such legislation might impose certain data protection measures and principles, such as storage limitation, that have an impact on the management and opening of health datasets. Before opening up anonymized health datasets, risk assessments concerning the reidentification of individuals might be directly or indirectly imposed by privacy legislation. 17 In addition, advice or authorizations from national data protection authorities will in some cases be advisable.
The opening up of health datasets is also bounded by ethical frameworks considering questions such as “Can the data be (re-)used for the Public Good?” Like in medical science projects, government agencies could request an opinion from an ethical committee regarding the opening up of certain datasets. Alternatively, relevant recommendations of authoritative bodies such as (inter)national bioethical boards or deontological commissions of medical caregivers might influence government agencies in their decision process regarding the free availability of particular health datasets.
Lastly, as health data are provided by a large set of stakeholders with their own institutions, an open health data policy depends on a negotiation framework geared towards consensus in order to receive and publicly share data related to health. As health data is considered to be sensitive data, agreements or conventions with the health care sector and cocreated action plans assist governments to avoid misperceptions and negative publicity regarding the opening of health data sets.
Data preparation activities
Keeping the characteristics of health data in mind, expected data processing activities pertain to privacy safeguards and data quality. Some health datasets involve nominative or pseudonymized data. Prior to opening up these datasets, a process of data de-identification is necessary. This involves the detection, removal, or modification of information that could lead to personal identification. 40 Some information requires, for instance, a higher level of aggregation to avoid unintended disclosure of personal data. When working with smaller populations (for instance patients with a rare disease), challenges can arise that may even inhibit the opening up of certain health datasets.14,60
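By way of illustration, the aggregation step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure used by any agency discussed in this article: the record keys `region` and `age`, the ten-year age bands, and the threshold `MIN_CELL_SIZE = 5` are all hypothetical choices made for the example. Exact ages are replaced by bands, direct identifiers are never copied to the output, and cells with too few individuals are withheld.

```python
from collections import Counter

# Hypothetical threshold: cells with fewer people than this are withheld.
MIN_CELL_SIZE = 5


def age_band(age, width=10):
    """Map an exact age to a coarser band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def aggregate_with_suppression(records, min_cell_size=MIN_CELL_SIZE):
    """Count records per (region, age band) and suppress small cells.

    Each record is a dict with the hypothetical keys 'region' and 'age'.
    Any other field (for example a name) is ignored and never output.
    Returns a dict mapping (region, band) to a count, or to None when
    the count falls below min_cell_size.
    """
    counts = Counter((r["region"], age_band(r["age"])) for r in records)
    return {
        cell: (n if n >= min_cell_size else None)
        for cell, n in counts.items()
    }


if __name__ == "__main__":
    # Fabricated toy records, for demonstration only.
    sample = [{"region": "A", "age": 30 + i, "name": "x"} for i in range(6)]
    sample += [{"region": "B", "age": 71, "name": "y"} for _ in range(2)]
    for cell, count in aggregate_with_suppression(sample).items():
        print(cell, "suppressed" if count is None else count)
```

Small-cell suppression of this kind is only one ingredient of de-identification; it does not by itself guarantee that reidentification is impossible, which is why the risk assessments mentioned earlier remain relevant, particularly for small populations.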
Another important preparation activity preceding the opening of health data is the assessment of quality. Medical records of caregivers are in several cases the primary source of the government health dataset. As this source is originally intended to support clinicians in diagnosing and treating their patients, it might not always serve other objectives such as reporting for public health matters. Furthermore, skepticism exists concerning the completeness and accuracy of medical records. 42 Consequently, the manager of the open health dataset needs to evaluate whether the dataset is fit for use in an open data context, where data can be (re)used for a wide variety of purposes.
Metadata is required to facilitate correct interpretations of the data and to give potential data users information about the opportunities and limitations of the health dataset. Multiple studies have already documented cases of public misinterpretation of health data. 41 As misinterpretation might do harm, the perils of misinterpretation should be evaluated before a dataset is opened. Sathianathen et al. 61 showed for instance how misinterpretation of publicly reported surgical outcomes could lead to disadvantageous care choices by (vulnerable) citizens. The provision of sufficient contextual information and assistive data visualization tools may mitigate the risks of misinterpretation.
Infrastructure
Concerning infrastructure, we refer to part 3 of this article, where it was observed that portals serve as a useful instrument for open health data policies. Two types of portals were detected: on the one hand, open data portals giving direct access to health datasets managed by the government; on the other hand, information portals designed to give an overview of the data available on the government side and of who manages these datasets. 13 A combination of both types is also possible, namely a portal providing direct access to some health datasets and an overview of other existing health datasets that are not directly accessible.
Drivers
Various factors drive the creation or enhancement of open data policies. A government can be inspired by the practices of another government, recommendations by supranational organizations, or international data sharing trends. The United States was one of the first countries to explicitly, actively, and visibly include health in its OGD ambitions. Around 2010, its opening up of several health datasets, formerly closed to the public, was for instance widely applauded. 32
Concerning other drivers for open health data policies, one can think of the current huge promises regarding the potential of big data projects related to inter alia the analysis of disease patterns, a faster tracking of disease outbreaks, health claim fraud detection, and the development of customized treatments for individual patients.38,42 These projects require accessibility to lots of health data. Do governments believe in such data innovation projects and are they willing to share health data with actors wanting to execute such projects like academics, pharmaceutical companies, and health insurers?
As health data are, due to their characteristics, considered to be sensitive, trust will be a key driver promoting or constraining the opening of health datasets. Is there a strong belief that the health datasets will generate societal benefits (for instance breakthroughs in cancer research)? Or is the perceived danger regarding misuse of health data stronger? Positive or negative attention concerning health data initiatives could influence further policy outcomes. History has already shown that distrust in projects concerning the sharing of health data has the potential to scuttle these projects. 44
Reflection on the mapping result
The common elements of generic open data ecosystems proved to be useful for customizing an ecosystem dedicated to the theme of health data. They made it possible to identify the fundamentals of an ecosystem, namely people, infrastructure, and institutions related to health(care). Certain particularities within the ecosystem dynamics related to health data lie in:
the diversity of governmental data producers and their high dependency on (nongovernmental) data providers to gather high-quality data for several datasets;
the lower visibility of derived open data products, because certain health products require heavy market authorization procedures due to safety concerns;
the presence of bioethical frameworks and relevant governance authorities that might influence data access;
the execution of stringent de-identification processes (if the data are derived from personal data), compared to many government datasets with no or low identification risks;
the existence of portals that make existing datasets transparent but not directly accessible;
the impact of societal debates about the benefits or dangers of data sharing (in comparison to other data types that are less sensitive).
Testing the open health data ecosystem
Having mapped the open health data ecosystem, the next research goal is to test the system via a case study in which an open health data policy was introduced. The selected policy concerns the initiative “Data for Better Health” of the Belgian federal government, launched in 2018. The policy aims to facilitate the reuse of the many existing data resources in health and health care. It rests on the belief that a more intensive use of these health (care) databases, developed by or on behalf of the government, will lead to more knowledge and innovation in the field of public health. 62
Being a recent policy, the case allows us to examine the ecosystem dynamics at an early stage where several stakeholders laid the first foundations of an open health data environment. The sources used to study the case comprise the website dataforbetterhealth.be, the portal fair.healthdata.be, government documents
Similarly, three portals were introduced. The Belgian FAIR-portal (fair.healthdata.be) provides a transparent metadata-catalogue of public health databases. This information portal shows which health-related databases exist, the contact person, information concerning access, etc. Related to this FAIR-portal, a web application, metadata-healthdata.be, was developed. This metadata-portal allows data managers, via user accounts, to update the metadata of their database and to add new data collection projects. Their input is automatically published on the FAIR-portal. Data managers were invited to complete the metadata of the databases for which they are responsible. Although several already did so, a large number of them still proved to be unresponsive, having doubts concerning data quality or not believing in the benefits of open data. Thirdly, a request-portal was put into production where interested data users can fill in a request form to ask for access to databases that are indexed at fair.healthdata.be. The awareness, accuracy, and user-friendliness of these portals are considered to be areas for improvement. In this early stage of the
Considering the case study results, the developed open health data ecosystem proved suitable for identifying the elements that constitute a socio-technical environment promoting or obstructing the opening up of health data (see Table 4). While less emphasized in our mapped system, more general open data aspects, like the presence of open data communities and data-driven startups or the applicability of legislation concerning reuse of public sector information, were also noticed within the case. Hence, one needs to bear in mind that within a specific health-related ecosystem, general open data aspects continue to play a role.
Ecosystem elements of the Data for Better Health initiative.
Conclusions and future research
Previous research on OGD ecosystems stated that the actors and dynamics of such ecosystems could differ according to the policy domain in which data are produced. Based on knowledge concerning the peculiarities of health data, this article aimed to map and test an open data ecosystem concerning the policy domain of health. We managed to create and assess an open health data ecosystem consisting inter alia of particular stakeholders, interests, information policies, and activities. Particularities regarding an open data ecosystem for the policy domain of health include inter alia de-identification activities, (bio)ethical assessments, and the specific role of data providers (
Considering our observations and the obtained understanding of the broad domain of health policy, it needs to be mentioned that the dynamics of an open health data ecosystem could vary according to the health dataset. An administrative dataset concerning, for instance, an overview of hospitals in a region will likely face fewer constraints on opening up than a dataset related to a disease registry containing, for instance, information on the prevalence of a health problem affecting only a small population. Depending on the data sensitivity and the risks regarding reidentification, subdivisions in the open health data ecosystem may arise. Within the existing literature on open health data, the distinction between types of health data is often neglected; making this distinction is therefore recommended for future research. Which kinds of health datasets are easy to publish openly, and which ones should have restricted access policies because of privacy risks?
The selected case study did not allow a deeper investigation of the degree of participation by potential data users. At first sight, the most interested users of the health data seemed to come from industry, seeking to support business activities. Do other potential users have in practice enough capacity, time, and skills concerning data processing? Additional research could focus on how other potential users, such as patient and citizen organizations, may employ the data to inform citizens about health(care) aspects and to strive for health care improvements. Knowledge of the resources required to exploit the benefits of open health data can enrich the described ecosystem.
Finally, an in-depth comparison with other types of data that are considered to be sensitive, like security data or financial data, can be valuable in putting certain characteristics and dynamics regarding health data into a broader perspective.
