Abstract
Introduction
In a context of increasing volumes of spatio-temporal data, data analysis is a powerful tool for supporting epidemiological research. There are various approaches to processing this type of data, such as statistics, data mining, machine learning, etc. However, a solid understanding of the data is necessary to choose the most appropriate approach. Visualization satisfies these needs through two approaches: data presentation and in-depth exploration. As data becomes increasingly voluminous, it is becoming crucial to develop suitable, high-quality, interactive tools.
We propose Epid Data Explorer (EDE), a visualization tool enabling fast, simplified exploration of spatio-temporal data. A particular feature of this tool is its ability to facilitate comparisons between indicators for different geographical areas, dates and/or periods. EDE can also be used to visualize and compare trends. Another notable advantage of EDE is the ability to import new datasets securely, which guarantees data confidentiality when required. In short, Epid Data Explorer (EDE) meets an essential need in the field of epidemiology by providing a platform for the visualization and comparison of spatio-temporal epidemiological data. With EDE, the exploration and comparison of epidemic data becomes more accessible and more efficient.
Requirements
This work is conducted within the framework of the European project MOOD (MOnitoring Outbreak events for Disease surveillance in a data science context), engaging 25 partners from 12 countries. The project members include public health agencies, veterinary health agencies, and surveillance practitioners. It is dedicated to developing innovative tools and services through close collaboration with professionals involved in the early detection, assessment, and monitoring of current and potential infectious disease threats in Europe and beyond.
In this context, we defined a set of requirements that guided the design of our platform, in collaboration with a diverse group of users: public health officials who monitor disease outbreaks and coordinate responses, data analysts who process large datasets and identify trends, and individuals responsible for health surveillance to prevent crises. This process was carried out iteratively: users defined a set of requirements, we proposed a prototype, users then refined or redefined their requirements, we made a new proposal, and so on. This approach was strongly inspired by the recommendations of Munzner 1 and the design process proposed by Sedlmair et al. 2
These requirements cover both the functionality of the platform and the system architecture itself. End-users have expressed the desire to visualize multiple datasets in a unified platform so as to identify potential correlations between indicators through comparative analysis. They also want the ability to track changes in indicator values over time. These tasks can be facilitated by supporting simultaneous comparisons within a single view. Additionally, as certain data may be sensitive and require confidential storage, the tool must implement robust data security measures. This includes the ability for users to import their own datasets while maintaining strict data privacy. Based on these needs, the following requirements have been identified for the Epid Data Explorer tool:
Related work
There are many web tools available for epidemiological data surveillance visualization. In this work, we only focus on those that enable tracking epidemics in both geographical and temporal dimensions.
We identified two types of such tools: event-based surveillance and indicator-based surveillance tools. The former processes unstructured event reports, while the latter directly visualizes structured data.
We do not include simulation tools such as GLEaMviz. 4 Although they provide numerous functionalities for visualizing epidemiological data (GLEaMviz contains dynamic maps and charts describing the geo-temporal evolution of diseases), they require an epidemic model and a simulation scenario to simulate the spread of infectious diseases, and do not focus on the visualization of actual epidemiological data per se.
Event-based surveillance
For event-based platforms, there are specific tools dedicated to monitoring particular diseases. For instance, Monitoring Rabies in Media 5 is an alert system designed to monitor the daily circulation of press articles related to rabies. The EpidNews analytical visualization tool 6 tracks source data related to animal epidemiology to observe the spread of epidemics. It extracts information on locations, dates, and symptoms. The EpidVis tool 7 simplifies web searches for animal-disease detection and monitoring. It is a visual query tool specifically designed for animal health experts. A visual analytics interface featuring coordinated views has recently been developed by Kuo et al. 8 This interface also focuses on animal epidemics and allows the investigation of epidemics by identifying the relationships between livestock farms. The consequences of a disease outbreak, its severity, and its scale are estimated using unsupervised machine learning methods. Another automated system, HealthMap, 9 queries, filters, integrates, and visualizes unstructured reports on disease outbreaks using text processing algorithms. Users can also propose alerts. Similarly, BioCaster 10 detects and tracks the distribution of infectious diseases by continuously analyzing RSS feeds.
Indicator-based surveillance
One of the platforms used for indicator-based epidemiological data visualization is Epi Visualization. 11 This tool allows for the exploration of health data and of estimates related to epidemics. The available data is quite extensive, covering 369 diseases across 204 countries and territories from 1990 to 2019. With Gapminder, 12 users can visualize data spanning various subjects, not only in the field of health but also in economics, the environment, demography, and many others. The platform Empres-i 13 visualizes data sourced from various international animal health authorities including ministries of agriculture and health and the Food and Agriculture Organization of the United Nations. The European Centre for Disease Prevention and Control (ECDC) provides interactive databases that offer tabular and map views of ECDC data. One of the tools provided is the Surveillance Atlas of Infectious Diseases, which allows the manipulation of data collected via the European surveillance system TESSy. The ECDC also provides Dashboards. One such dashboard is the Polio dashboard, which provides an overview of the global poliovirus situation.
The global COVID-19 pandemic that started in 2020 led to the development of many visualization tools. One of the best known is the COVID-19 Dashboard, 14 developed by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The visualizations proposed by the COVID-19 Data Explorer tool from Our World in Data, 15 in the form of diagrams, maps, and tables, are also widely recognized. Additionally, other applications such as COVID-19 CG 16 and CoV-Spectrum 17 focus specifically on mutations of the SARS-CoV-2 virus. These applications offer functionalities to explore and analyze data on virus sequences through phylogenetic trees, as well as mutation and lineage tracking by locations and dates of interest. Moreover, there are other applications with similar functionalities such as Nextstrain 18 and VirusViz, 19 which can address different types of viruses. There is also a dedicated dashboard for monitoring the status of COVID-19 vaccines called the COVID-19 Vaccine Tracker. For further exploration, interested readers can refer to reference 20, which provides a survey of COVID-19-related visualization dashboards.
Fulfillment of requirements by indicator-based platforms. ✔ indicates that the platform fully meets the requirement, ○ indicates partial fulfillment, and ✗ indicates that the approach does not address the requirement at all.
This analysis shows that none of the indicator-based platforms presented fully meet all the requirements. We are therefore proposing an optimized solution that addresses all user needs.
The EDE platform
The Epid Data Explorer (EDE) web platform offers two ways to visualize data on maps. The first, which is displayed by default, consists of the juxtaposition of two maps. The second is a view with a single map. The data used to demonstrate the functionality of EDE comes from the ECDC, 22 Our World in Data, 23 Google, 24 Government Response Tracker, 25 Obépine 26 and Historique météo. 27
Homepage
Before accessing the visualizations, the user must select the datasets of interest using the list of datasets on the left (Figure 1(a)) and/or the list of dataset groups on the right (Figure 1(b)). Groups allow the datasets to be assembled by theme. In Figure 1, the COVID-19 group is selected, which means that by clicking on “Validate”, the user will be able to visualize all the datasets available in EDE related to the COVID-19 pandemic. The “Meteo (Europe - historique-meteo.net)” dataset is also selected, so it will also be displayed. The homepage also provides an overview of the purpose and principle of EDE (Figure 1(c)). There is a menu (Figure 1(d)) in the top-right corner of the platform.
Figure 1. Epid Data Explorer homepage.
Proposed views
The 1-map view, as shown in Figure 2(a), is displayed by default. In this view, the user is able to focus on one element, being provided with an overview of an indicator that can be explored from a temporal and geographical perspective without the clutter of a second map. If the user needs to compare two maps, the 2-map view (Figure 2(b)) is more suitable. It provides two juxtaposed maps, allowing easy comparisons. Switching between the two views is very simple. To go from a 2-map view to a 1-map view, the user simply clicks on the button located in the upper-left corner of the desired map (Figure 2(b)-ⓐ or Figure 2(b)-ⓑ). To add a second map, the user simply clicks on the button (Figure 2(a)) located in the upper-left corner of the map.
Figure 2. Epid Data Explorer views. (a) 1-map view. (b) 2-map view.
Main components
Map components
The map (Figure 3(a)) facilitates the exploration of various geographical features. In Figure 3, the country granularity is selected. It is possible to modify the level of detail via the input field composed of radio buttons (Figure 3(b)). In this example, the possible granularities are country, region and subregion and correspond to the ISO 3166 nomenclature. 28 Depending on the dataset, the nomenclature is either ISO or NUTS. 29 In the latter case, the granularity has 4 levels: NUTS0, NUTS1, NUTS2 and NUTS3. The map can display values at the level provided in the dataset and at higher levels if aggregation was carried out during the importation of the data. For example, if the data uses ISO 3166 geocoding and the granularity is set to region, aggregation can be carried out to view the data at the country level. This satisfies requirement R2, i.e. providing access to different spatial granularities and aggregating data.
Figure 3. Main components of a map.
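The spatial aggregation described above can be illustrated with a minimal Python sketch. It relies on the hierarchical structure of NUTS codes (a two-letter country code followed by one character per level), so that a coarser code is a prefix of a finer one. The function name `aggregate_nuts`, the default aggregation function (sum) and the sample values are assumptions made for illustration; they are not taken from the EDE code base.

```python
from collections import defaultdict

# Code length at each NUTS level: two letters for the country,
# then one extra character per level.
NUTS_CODE_LENGTH = {"NUTS0": 2, "NUTS1": 3, "NUTS2": 4, "NUTS3": 5}


def aggregate_nuts(values, target_level, combine=sum):
    """Roll values keyed by NUTS code up to a coarser NUTS level.

    `values` maps NUTS codes to numbers. `combine` is the aggregation
    function applied to each group (hypothetical default: sum).
    """
    length = NUTS_CODE_LENGTH[target_level]
    groups = defaultdict(list)
    for code, value in values.items():
        if len(code) < length:
            raise ValueError(f"{code} is coarser than {target_level}")
        # The coarser code is simply a prefix of the finer one.
        groups[code[:length]].append(value)
    return {code: combine(group) for code, group in groups.items()}


# Illustrative values only (not taken from any EDE dataset).
cases_nuts3 = {"FR101": 10, "FR102": 5, "FRK26": 7, "ES611": 4}
cases_nuts2 = aggregate_nuts(cases_nuts3, "NUTS2")
cases_nuts0 = aggregate_nuts(cases_nuts3, "NUTS0")
```

A sum suits counts such as confirmed cases, whereas an indicator such as maximum temperature would call for `combine=max`.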
The map shows the value of the indicator for each geographical entity at a certain date or period. The color of each geographical entity is defined using a color scale according to the value of the indicator. The legend below the map (Figure 3(c)) displays the indicator values and automatically adjusts to the selected data type. EDE can handle both ordinal and quantitative data types, as shown by Figure 4. Two examples of legends for quantitative data are provided: Figure 4(a) for a continuous scale and Figure 4(b) for a divergent scale. A divergent scale is used when data values move in two opposite directions. In the example, the values are positive and negative, and the pivot value is 0. Another example of a legend, for ordinal qualitative data, is shown in Figure 4(c). EDE generates the legend by analyzing the values of the dataset, so that the legend is adapted to the selected indicator.
Figure 4. Legends according to the type of data to be visualized. (a) Legend for quantitative data based on a continuous scale. (b) Legend for quantitative data based on a divergent scale, where data values go in two opposite directions. (c) Legend for ordinal qualitative data.
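As an illustration of how a legend type could be derived from the values of an indicator, the following Python sketch distinguishes the three cases shown in Figure 4. The decision rule (ordinal if ordered categories are supplied, divergent if the values span both sides of the pivot 0, continuous otherwise) and the symmetric domain around the pivot are assumptions; the text does not specify how EDE makes this choice.

```python
def choose_scale(values, ordered_categories=None):
    """Pick a legend description for an indicator (hypothetical rule)."""
    if ordered_categories is not None:
        # Ordinal qualitative data: the legend lists the categories in order.
        return {"type": "ordinal", "domain": list(ordered_categories)}

    numeric = [v for v in values if v is not None]
    if not numeric:
        raise ValueError("the indicator has no values")

    low, high = min(numeric), max(numeric)
    if low < 0 < high:
        # Values go in two opposite directions: use a divergent scale
        # centered on the pivot 0 (symmetric bounds are an assumption).
        bound = max(-low, high)
        return {"type": "divergent", "domain": [-bound, 0, bound]}

    # All values lie on one side of the pivot: use a continuous scale.
    return {"type": "continuous", "domain": [low, high]}
```

For instance, a mobility indicator expressed as a percentage change from a baseline would fall into the divergent case, while a cumulative case count would fall into the continuous case.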
Several datasets are available, and each of them is composed of several indicators. In the example in Figure 3, the dataset “COVID-19 (Our World In Data)” is selected (Figure 3(d)) and the indicator is “Total confirmed cases”. The selection of an indicator (and thus the associated dataset) can be easily changed using the drop-down list (Figure 3(e)).
In addition to selecting an indicator from a dataset, a user can also select a date via the timeline (Figure 3(g)). Like the geographical entities, the time dimension also has several granularities: day, week, month and year. It can be modified by selecting the time level via the radio buttons (Figure 3(g)). The minimum granularity level is determined by the granularity of the dataset. The system also automatically aggregates the data from the lowest level (by day) to the desired higher level and updates the legends accordingly, as shown in Figure 5. This satisfies requirement R2: providing access to different time granularities.
Figure 5. Available timelines according to the temporal granularity selected. (a) Annual timeline. (b) Monthly timeline. (c) Weekly timeline. (d) Daily timeline.
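The temporal aggregation from daily values to coarser granularities can be sketched as follows in Python. The use of ISO 8601 weeks (Monday to Sunday), the period labels and the default aggregation function are assumptions made for this example; they are not documented choices of EDE.

```python
from collections import defaultdict
from datetime import date


def period_key(day, granularity):
    """Return the label of the period that contains `day`."""
    if granularity == "day":
        return day.isoformat()
    if granularity == "week":
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if granularity == "month":
        return f"{day.year}-{day.month:02d}"
    if granularity == "year":
        return str(day.year)
    raise ValueError(f"unknown granularity: {granularity}")


def aggregate_over_time(daily, granularity, combine=sum):
    """Group daily values by period and combine each group."""
    groups = defaultdict(list)
    for day, value in daily.items():
        groups[period_key(day, granularity)].append(value)
    return {key: combine(group) for key, group in groups.items()}


# Illustrative daily values only.
daily = {date(2020, 3, 2): 1, date(2020, 3, 8): 2, date(2020, 3, 9): 4}
weekly = aggregate_over_time(daily, "week")
monthly = aggregate_over_time(daily, "month")
```

With ISO weeks, March 2 and March 8, 2020 fall in the same week, while March 9 opens the next one.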
Tooltips
While navigating on a map, the user can easily access more detailed information on the selected indicator for a geographical entity. Tooltips are displayed by clicking on a geographical feature. A tooltip shows the values of the current indicator for the selected area for the whole period covered by the dataset. It can be repositioned on the map and resized. Figure 6 shows an example in which two tooltips are open: one for Andalusia, a region in Spain, and the second for Nord-Norge, a region in Norway. The indicator is the maximum temperature. The simultaneous display of several tooltips corresponds to requirement R3.
Figure 6. Example of a map with two open tooltips. The indicator is the maximum temperature for the day of September 27, 2021. The spatial granularity here is NUTS2.
Figure 7 shows the key components of a tooltip. The values for the selected geographical entity over time are displayed as a black line (Figure 7(a)). The evolution of the median value, computed from the values for all geographical entities, is represented by a blue line (Figure 7(b)). The lighter blue zone (Figure 7(c)) represents all of the values for all geographical entities, with the delimitations corresponding to the minimum and maximum values. The dark blue zone (Figure 7(d)) around the line of the median value corresponds to the values between the first and third quartile for all geographical entities. All this gives context to show how the entity is situated in relation to others. The user can select different options (Figure 7(e)) to adapt the graph displayed. The first option,
When hovering over the graph, a vertical line that corresponds to a date on the
Figure 7. Tooltip components.
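The contextual bands of the tooltip (median line, zone between the first and third quartiles, zone between the minimum and maximum) can be computed per date from the values of all geographical entities. The Python sketch below is one possible way to do so. The function name `context_bands`, the input layout and the quartile method (`inclusive`) are assumptions, since the text does not state how EDE computes quartiles.

```python
from statistics import quantiles


def context_bands(series_by_entity):
    """Compute min, Q1, median, Q3 and max per date across all entities.

    `series_by_entity` maps an entity name to a {date_label: value} dict.
    """
    values_by_date = {}
    for series in series_by_entity.values():
        for when, value in series.items():
            values_by_date.setdefault(when, []).append(value)

    bands = {}
    for when, values in sorted(values_by_date.items()):
        if len(values) < 2:
            continue  # quartiles need at least two data points
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        bands[when] = {
            "min": min(values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(values),
        }
    return bands


# Illustrative values only: five entities observed on the same day.
example = {f"entity_{i}": {"2021-09-27": float(i)} for i in range(1, 6)}
bands = context_bands(example)
```

The black line of the selected entity is then drawn over these bands, which shows at a glance whether the entity lies inside or outside the interquartile zone.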
Functionalities of the 2-map view
The 2-map view facilitates the comparison of different time periods, geographical regions, and indicators. These three elements correspond to the dimensions mentioned for requirement R1. The 2-map view consists of two maps placed side by side, as shown in Figure 8. These two maps can be navigated in exactly the same way as the single map.
Figure 8. Additional components in the 2-map view.
Synchronization
We use a padlock system to implement synchronization and thus facilitate comparison (Figure 8(a)–(c)). This padlock system allows any combination of the 3 dimensions of requirement R1 to be used for comparison.
The user can synchronize the navigation of the two maps. By locking the padlock at the top (Figure 8(a)), the slightest change to one of the maps (zoom or pan) is automatically applied to the second map. Conversely, if the padlock is unlocked, the two maps are completely independent of each other. The principle is exactly the same for the synchronization of the indicator and the dataset (Figure 8(b)) and the choice of date or period (Figure 8(c)).
In the example of Figure 8, the padlocks for map navigation (Figure 8(a)) and synchronization of the temporal dimension (Figure 8(c)) are locked. The geographical area for both maps is Europe and the selected period is January 2023. The third padlock, associated with the choice of indicator, is unlocked (Figure 8(b)). This means that each map can represent the values of a different indicator. This is indeed the case: the map on the left shows the values for “Total confirmed cases” while the map on the right shows “Total confirmed cases per 1,000,000 people”. In this example, we can therefore compare these two indicators in Europe for the period January 2023, which corresponds to requirement R1.
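The padlock principle can be modeled with a small Python sketch: each dimension has its own lock, and a change made on one map is mirrored on the other only when the corresponding padlock is locked. The class `TwoMapView`, the dimension names and the state layout are hypothetical and only serve to illustrate the behavior described above.

```python
# Hypothetical names for the three synchronizable dimensions.
DIMENSIONS = ("navigation", "indicator", "time")


class TwoMapView:
    """Two map states plus one padlock per dimension (illustrative model)."""

    def __init__(self, left, right):
        self.maps = {"left": dict(left), "right": dict(right)}
        self.locked = {dimension: False for dimension in DIMENSIONS}

    def set_lock(self, dimension, locked):
        if dimension not in self.locked:
            raise KeyError(dimension)
        self.locked[dimension] = locked

    def update(self, side, dimension, value):
        """Apply a change to one map and mirror it if the padlock is locked."""
        if dimension not in self.locked:
            raise KeyError(dimension)
        other = "right" if side == "left" else "left"
        self.maps[side][dimension] = value
        if self.locked[dimension]:
            self.maps[other][dimension] = value


# Configuration similar to Figure 8: navigation and time are locked,
# the indicator is left unlocked.
start = {"navigation": "World", "indicator": "Total confirmed cases", "time": "2023-01"}
view = TwoMapView(start, start)
view.set_lock("navigation", True)
view.set_lock("time", True)
view.update("left", "navigation", "Europe")
view.update("right", "indicator", "Total confirmed cases per 1,000,000 people")
```

After these calls, both maps show Europe for January 2023, while each map keeps its own indicator.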
System architecture
During the application design phase, we identified a close relationship between requirements R4 and R5: users want to access both public and private datasets while ensuring the privacy of their own data. The general principle of the architecture is illustrated in Figure 9.
Figure 9. Epid Data Explorer system architecture.
The Epid Data Explorer application is available online with public data. However, to fulfill requirements R4 and R5, a local version of EDE must be installed. First, users need to download the EDE application at: https://gite.lirmm.fr/advanse/EDE/epid-data-explorer and install it on their own computer. To further facilitate the process, a new interface called
The current design uses the
Use cases
In this section, we present two use cases to demonstrate the relevance of EDE’s visualization functionalities.
Restrictions and mobility
During the COVID-19 pandemic, various countries worldwide implemented restrictions to combat the spread of the virus. These measures significantly affected people’s everyday lives, particularly in terms of mobility. For our initial use case, we will focus on the implementation of these restrictions in Europe. EDE allows us to observe the impact thereof, as shown in Figure 10.
Figure 10. Visualizations of stay-at-home restrictions (maps on the left) and evolution of mobility in transit stations (maps on the right). (a) Visualization for the week of March 2 to 8, 2020. (b) Visualization for the week of March 23 to 29, 2020.
Two distinct time periods are depicted: the week of March 2 to 8, 2020, and the week of March 23 to 29, 2020. In Figure 10(a), which corresponds to the first period, Italy is the sole European country to have implemented a lockdown (Denmark had only recommended restrictions), as illustrated by the map on the left. With regard to public transport usage (map on the right), there was a slight decrease in some countries, such as France, Germany, and the UK, while others experienced a slight increase (Spain, Portugal, Ireland, Norway, etc.). Notably, Italy had already witnessed a decline in public transport usage at that time owing to the implemented restrictions. During the week of March 23 to 29, numerous countries implemented lockdown measures, leading to a widespread and unprecedented impact on mobility. This phenomenon can be easily observed in Figure 10(b). By visualizing the two maps side by side, we can assess the effect of the implemented restrictions on mobility across European countries. Additionally, we can compare the evolution of the same indicator in multiple countries over the same period. In Figure 10, the tooltips for Italy and France are open on all the maps, enabling a convenient comparison of the implementation of lockdown measures and trends in public transport usage between these two countries. On the left-hand maps, we can observe that Italy implemented measures earlier and maintained them for an extended period, whereas France exhibited two distinct periods: from March 2020 to the summer, and from the end of 2020 onwards. Furthermore, a similar trend in public transport usage can be observed in both countries, but the values for Italy tend to be substantially lower and are more frequently below the median value.
This first use case illustrates the effectiveness of the platform in highlighting phenomena and facilitating comparisons.
COVID-19 and weather
Numerous research studies have extensively examined the correlations between weather conditions and COVID-19. For example, Ganslmeier et al. 30 conducted an analysis using a comprehensive dataset. In a separate study, Majumder et al. 31 performed a meta-analysis. The results highlighted a notable correlation between temperature, humidity, and wind speed on the one hand and the death rate and incidence of COVID-19 on the other. In a similar study, McClymont et al. 32 conducted an analysis of 23 articles after a rigorous selection process. The researchers found that temperature and humidity were frequently reported as significant factors.
EDE can be used to analyze data and identify potential correlations that may require further in-depth analysis. For example, if the user wants to examine correlations between weather and COVID-19 in France, they can retrieve data from different sources. 27,33 With the EDE Datastore interface, this data can be integrated easily. To do this, the original data undergoes preprocessing, during which it is transformed according to geocoding standards such as NUTS. This transformation utilizes data processing resources available from the EDE Data Store. The integrated data can then be incorporated into the platform. The data in this example is at the department level (NUTS3), and users can select the aggregation function for each indicator to move to higher levels such as region or country. Finally, users have the flexibility to choose the specific indicators they wish to track. Figure 11 shows how EDE enables the comparison of weather data and COVID-19 in France during the third wave of the pandemic to identify potential correlations. It allows an analysis of variables like temperature, humidity, and wind speed in relation to the incidence rate or any other chosen indicator across different geographic regions and over time. The left-hand map in Figure 11 displays the maximum humidity recorded, while the right-hand map shows the incidence rate.
Figure 11. Visualization of the weather (humidity max) and COVID-19 indicators (incidence rate) during the third wave of the pandemic in France. (a) 2021-03-07. (b) 2021-03-22.
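Once a visual pattern suggests a possible relationship, a first quantitative check could be a correlation coefficient computed over the geographical entities shared by both indicators. The Python sketch below is an assumed follow-up step outside EDE, not a feature described for the platform. The helper names and the sample values are hypothetical, and a Pearson coefficient only captures linear association.

```python
from math import sqrt


def paired_values(indicator_a, indicator_b):
    """Align two {NUTS code: value} dicts on the codes they share."""
    shared = sorted(indicator_a.keys() & indicator_b.keys())
    return [indicator_a[c] for c in shared], [indicator_b[c] for c in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return covariance / (spread_x * spread_y)


# Illustrative values only (not real observations).
humidity_max = {"FR101": 80.0, "FR102": 85.0, "FRK26": 90.0, "FRL04": 70.0}
incidence = {"FR101": 200.0, "FR102": 250.0, "FRK26": 300.0}
xs, ys = paired_values(humidity_max, incidence)
coefficient = pearson(xs, ys)
```

Entities present in only one dataset (here `FRL04`) are ignored, which mirrors the need to harmonize geocoding before any comparison.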
This second use case illustrates the usefulness of importing data to explore and observe possible correlations. These observations can then lead to a more in-depth and optimized data analysis by allowing the appropriate approach to be selected.
Conclusion
Epid Data Explorer is a flexible data visualization platform that lets users explore and compare spatio-temporal datasets. The system’s architecture has been designed to be able to import and compare any private dataset with public datasets in a secure and simplified manner. As for the visualization tool, it has been designed to enable the detailed exploration of data, as well as the comparison of different indicators. The visualization is interactive: the user can browse the maps, select the date or period of interest, as well as easily change indicators or even datasets. All the visualization functions are available simultaneously in a single window. EDE can be an excellent tool to support epidemiologists, researchers and healthcare professionals in monitoring infectious diseases or even generating new hypotheses based on their findings when exploring new data.
