Introduction
There has been an indisputable increase in international migration, development and globalisation in the new millennium. The number of international migrants increased significantly, from 173 million in 2000 to 281 million in 2020 (IOM UN Migration, 2022; McAuliffe and Triandafyllidou, 2021). There have been several attempts to define the relationship between international migration and global development, including potential causal effects in both directions. However, to date, no conclusive results have been presented explaining this relationship. The core reason is a lack of adequate and high-quality data on migration. At the same time, due to the lack of reliable data, the global migration system confronts serious challenges in understanding the patterns of international migration and in managing, organising and forecasting it (Kraler and Reichel, 2022; Willekens, 2018). As a result, policymakers and practitioners struggle to make evidence-based policies (Baldwin-Edwards et al., 2019).
The lack of reliable data on international migration has been acknowledged at the global, national and local levels (de Beer et al., 2010). In the absence of detailed data on migrants, many aspects of international migration remain understudied, and countries cannot exploit the many benefits of migration (Clemens, 2014). Additionally, the existing international migration data are subject to significant gaps and deficiencies (Bircan et al., 2020). These gaps fall into five conceptual categories – namely, inconsistent definitions, scant information on the reasons driving migration decisions, uneven geographic coverage, missing information on the demographic characteristics of migrants and outdated (obsolete) data (Ahmad-Yar and Bircan, 2021).
Several factors underlie the lack of valid data and gaps in migration statistics. Collecting data on migration and other populations requires funds and well-founded administrative and operative systems (Bilsborrow et al., 1997). Countries with better economic conditions collect more data, and economically challenged countries cannot prioritise collecting data due to financial restrictions. Increasing the quality of data requires political will at the international level and consistent coordination across countries. Although international organisations, including the United Nations (UN) and the International Organization for Migration (IOM), are trying to improve the quality of the data, the process is very time-consuming and it will require several more years to achieve the goal (Laczko, 2016).
Hence, there is an urgent need for alternative sources to improve, complement or even replace traditional migration statistics (Beduschi, 2017). The rapid growth in digital technology and computer sciences has revolutionised human lives in the last few years. One of the contributions of digital technology is the way it can auto-generate data and information on the activities of its users. Migration and mobility are among the activities on which digital devices can gather crucial data (Rango, 2015). Data gleaned from digital technology in this way has been conceptualised as big data.
A recent crop of scholarly studies has asserted that National Statistical Institutes (NSIs) in countries like Australia, the Netherlands and Italy are using big data to complement or even replace their traditional data (Daas, 2022; De Boom and Reusens, 2023; Rango, 2015; Struijs et al., 2014; Tam and Clarke, 2015). However, to date, no evidence or systematic study indicates the extent to which official statistical organisations have used big data to complement traditional migration statistics. We are also in the dark regarding which aspects of the gaps in official statistics have been fixed or complemented by big data and what the potential of big data is in this area.
Against this background, in this study, we examine how NSIs use big data, particularly for migration and mobility, which data shortages big data might cover, and how NSIs understand big data. The article is based on data we collected from 29 NSIs across different regions. We relied on an expert questionnaire methodology to gather substantiated information from formal sources. The remainder of the article proceeds as follows. First, we define the problem and the issues with current data. We then detail the methodology we used to collect and analyse the data. In the third and fourth sections of the paper, respectively, we present our main findings and discuss the challenges and potential of using big data in migration statistics. The paper concludes with a short section drawing together the findings and marking out avenues for future research.
Definitions and problem presentation
The shortage and lack of high-quality traditional statistics on specific aspects of human migration and mobility have prompted researchers and statisticians to look for alternative sources. The shortage includes data covering immigration within and outside the traditional definitions. Following a suggestion by the UN, data were collected on long-term and short-term migration. A long-term migrant is a person who moves to a country other than that of his or her usual residence for a period of at least a year (12 months), and a short-term migrant is one who does so for at least three months but less than a year (12 months) (United Nations, 1998). Data on temporary migrants and mobile populations outside this range are also necessary to study population movements. Also, traditional data are often released with a time lag, given the time needed to collect and prepare them. Data on migrant stocks and flows are not always up to date, and data collection is costly.
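The duration thresholds in the UN definition can be written down as a simple rule. The following Python sketch is only an illustration: the function and category names are hypothetical, and it assumes that the length of stay is already known in months, which is rarely the case in practice.

```python
from enum import Enum


class MigrantType(Enum):
    LONG_TERM = "long-term migrant"
    SHORT_TERM = "short-term migrant"
    OUTSIDE_DEFINITION = "outside the UN duration thresholds"


def classify_by_duration(months_abroad: float) -> MigrantType:
    """Classify a move by length of stay outside the country of usual residence.

    Thresholds follow the UN (1998) definition quoted in the text:
    at least 12 months is long-term, 3 to under 12 months is short-term.
    """
    if months_abroad < 0:
        raise ValueError("duration cannot be negative")
    if months_abroad >= 12:
        return MigrantType.LONG_TERM
    if months_abroad >= 3:
        return MigrantType.SHORT_TERM
    # Stays under three months fall outside both categories; these are the
    # temporary and mobile populations that traditional statistics miss.
    return MigrantType.OUTSIDE_DEFINITION
```

Under this rule, a stay of two months is not counted as migration at all, which illustrates why movements outside the defined range call for other data sources.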
As mentioned previously, new innovative data derived from electronic sources and technological devices, commonly referred to as big data across various disciplines, are, to some extent, used to compensate for the shortage of such data. However, the interpretation of what constitutes big data varies considerably among different scientific areas, including studies of human migration and mobility. Hence, before examining how big data enhances or is applied in existing data sets concerning human movement, it is crucial to clarify the notion of big data both in the context of human migration and more broadly (Bircan et al., 2023).
De Mauro et al. (2016) define big data as ‘the Information asset characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value’. This definition is very relevant from a technical point of view, explaining the conversion of technology-based information into values. However, it is criticised for being solely from a data science and data analytics perspective and not accommodating sociological aspects of the data (Lupton, 2015). Favaretto et al. (2020) conducted interviews with 39 experts and researchers working with big data to ask how they define the phenomenon. They conclude that there is no single comprehensive way to define big data that corresponds to all disciplines and fields of study. They also suggest that the data users define and explicate their data and methods before concluding. In the same vein, big data is defined differently by scholars of migration and mobility studies.
The most common definition of big data among migration studies scholars is automatically generated numerical data, text data, images, audio and geolocation data derived from digital sources often collected by private companies to offer services to customers (Tjaden, 2021). These data can be obtained from diverse sources, such as online platforms like social media and search engines, as well as satellite and geolocation data, among other options (Salah et al., 2022b). Other data are sourced or generated by digital devices but are often not considered big data as these gadgets are created explicitly for collecting data. This includes digital tools used by administrative employees, survey devices and data collection software.
To fully understand the ramifications of employing big data in migration statistics, one must critically examine its importance and assess the possible inconsistencies it could introduce into measurement and reporting techniques. While the notion of ‘better’ data is frequently invoked, it is essential to delve deeper and explore the divergent attributes associated with various forms of data employed by NSIs (Ahmad Yar and Bircan, 2023). By embracing big data sources, NSIs can augment their repertoire of tools for measuring and reporting migration trends. Including diverse data types, such as digital traces, social media activity and mobile phone records, offers a comprehensive and nuanced understanding of human mobility. Unlike traditional data sources, which often rely on surveys and administrative records, big data provides real-time, granular and extensive insights into migration phenomena. This enriched perspective can enable NSIs to capture internal mobility, discern intricate patterns of movement, analyse stock and flow dynamics and even ascertain the impact of migration on various societal dimensions (Ahmad Yar and Bircan, 2023). Such a paradigm shift in data sources holds profound implications for migration policymaking, particularly within the European Union (EU). Using big data sources can equip policymakers with timely and accurate information, enabling them to make informed decisions to address the challenges and opportunities associated with migration. As it is collected in real time, big data enables the prompt identification of emerging migration trends and allows for swift responses to evolving circumstances. Furthermore, the ability to capture fine-grained details and analyse mobility patterns enables the development of targeted policies that address specific migratory needs and concerns.
Having said that, it is paramount to establish a clear understanding of what constitutes big data within the realm of national statistical sources. Previous studies have revealed that countries like Australia, the Netherlands and Italy used big data to complement and replace traditional data (Tam and Clarke, 2015). The UN Economic and Social Council (ECOSOC) established a working group involving experts from NSIs at the global level to share methodological developments, best practices for strategic issues and training opportunities, as well as to build partnerships for the use of big data (IOM, 2021). In the report, it is also mentioned that big data is defined and perceived differently across NSIs. Agencies’ use of computers and electronic devices produces a lot of administrative data, which many NSIs are currently using as a replacement for traditional surveys. Administrative data are collected by organisations about their operations. Some countries consider administrative data to be big data, while others apply the term only to data generated automatically as users interact with electronic devices.
To that end, we asked the NSIs what they consider big data to be and how they use it. While a few years back some NSIs considered administrative data to be big data, in our study the NSIs followed the definition of automatically generated data derived from digital sources, including internet-based data such as social media, email and search engine data, and data from other innovative sources such as satellite data, mobile phone data and scanner and sensor data. Data derived from administrative sources were not considered big data in the current study.
Methodology
To understand the extent to which big data is used across countries for migration by official statistical organisations, we needed to collect substantiated information from formal sources across countries. Expert interviews allow for gaining explicit knowledge rapidly, objectively and cost-effectively (Bogner and Menz, 2009). We used the expert interview methodology developed by Meuser and Nagel (1991, 2009) for the current study. Expert interviews allow for rapid and unproblematic access to objective data for empirical social researchers. In Meuser and Nagel's (2009) expert interviews, the knowledge of the expert and the knowledge of the organisation that the expert is affiliated with are equally important, each having its own traits that necessitate a particular methodological and analytical approach.
To conduct expert interviews, researchers should predefine certain premises, referring either to existing practice or prior research. First, interviewers should specify the problem and its causes. Second, interviewers should ensure that the expert has superior and comprehensive knowledge or skill in the specific domain of the predefined problem (Blanchard and Allard, 2010). Third, the researchers must design the questionnaire and prepare to conduct the interviews based on guidelines provided by Meuser and Nagel (2009). The method suggests that the questionnaire should be utilised as a thematic guideline rather than a questionnaire to be administered verbatim (Meuser and Nagel, 2009). Finally, data analysis and theory building should reflect the knowledge of the expert interviewees. According to Meuser and Nagel (2009), certain aspects of data analysis should be considered, including the quality of data derived from experts, knowledge of the experts and the expert's credibility before including the data for analysis.
We followed these guidelines in the present research. We first conducted thorough desk research to define the problem and create a questionnaire. Our focus was to evaluate how big data can help fill the gap in formal international migration statistics. We approached over 50 NSIs from different regions and continents without location restrictions, aiming to include as many countries as possible. To find experts within the NSIs, we used various strategies. We actively contacted the NSIs through phone calls and emails, asking for professionals who are knowledgeable about the subject. Additionally, we searched for experts by reviewing their published works and their involvement in relevant workgroups, UN assemblies and academic conferences. Communication with these experts was done through email and phone conversations, ensuring a smooth and efficient exchange of information. A total of 29 NSIs agreed to collaborate, and experts who specifically worked on the topic provided written responses and completed the questionnaire.
The experts initially completed the questionnaire in the written format, and in certain cases, follow-up online interviews were conducted to clarify any unclear responses. Certain criteria were predefined for each expert contributing to the study. The first and most important criterion was that the expert be formally affiliated with the NSI and have first-hand experience or competence in using big data. Second, the expert had to possess sound knowledge about the overall role and activities of the NSI and, lastly, be able to obtain information from other colleagues and staff in case certain aspects of the questions fell outside his/her expertise. Among the countries that participated in the study were those not using big data for migration. We included them because we needed to know
Results and analysis
The implication areas of big data for migration statistics
Innovative methods are being used to improve official statistics and estimates for human populations, and survey data are used to evaluate and improve the quality of insights derived from big data across disciplines (Hill et al., 2020). Different methods and strategies are used to complement traditional statistics with big data (Hsiao et al., 2023; Japec and Lyberg, 2020). As part of this research, it was important to investigate which innovative methods NSIs are using, particularly to extract official statistics on migration, and what the most common data sources are. Experts from Australia, Austria, Belgium, Bulgaria, Canada, Colombia, Croatia, Czech Republic, Denmark, Estonia, France, Georgia, Germany, Hungary, Italy, Latvia, Lithuania, the Netherlands, North Macedonia, Poland, Portugal, Slovakia, Slovenia, South Africa, Spain, Sweden, Switzerland, the United Kingdom and the United States replied to the questionnaire and sent their responses through email and other forms of communication. Among these countries, 15 of 29 used big data for migration, had a pilot project to experiment with doing so or have been involved in joint initiatives with other countries and data-owning private companies to utilise big data. Table 1 below shows which countries use big data for general purposes and which countries use it for migration.
Table 1. Use of big data by NSIs across countries.
Among the participating countries, 19 had experience with big data (i.e. Australia, Belgium, Canada, Colombia, Croatia, Estonia, Georgia, Germany, Hungary, Italy, Latvia, Lithuania, the Netherlands, Poland, Portugal, Slovakia, Slovenia, Switzerland and the United Kingdom). These experiences include exploring the possibility of complementary aspects of big data for official statistics, pilot projects, ad hoc projects, testing methodologies, pursuing proof of concept projects, comparing estimations and experimental analysis. Countries that used or conducted experimental use of big data, particularly for migration purposes, include Australia, Belgium, Colombia, Estonia, Germany, Hungary, Italy, Latvia, the Netherlands, Portugal, Slovakia, Slovenia, Switzerland and the United Kingdom. Fourteen countries use big data to study specific aspects of migration and mobility. The principal aspects of mobility and migration that were covered included (i) mobility patterns, (ii) internal migrations, (iii) stocks or real-life migration population estimations and (iv) flows and daily or short-term mobilities.
Mobility patterns in populations
Mobility patterns in populations were the most widely explored concept for which big data was used to extract statistics and information. Belgium, the Netherlands, Estonia, Germany, Hungary, Italy and Latvia reported using big data primarily to study mobility patterns. Through a pilot project in 2016, Belgium sought to understand mobility patterns by using mobile phone data, but the project stopped due to a lack of access to the provider's data. Estonia developed ad hoc projects during the COVID-19 pandemic that exploited mobile phone data to investigate the mobility of people during lockdown but not for official statistics purposes. Hungary uses web scraping and sensor data to experiment with mapping mobility. The UK Office for National Statistics initiated the ‘Data Science Campus’ to analyse and explore big data for the public good; one of its projects, for instance, focused on mobility patterns.
Internal migration or mobility
Internal migration or mobility is another concept for which big data was used to produce statistics and information. Italy, Germany and Estonia used big data for internal migration. Germany uses big data to answer questions in the field of commuting and to maintain address registers, and employs satellite and aerial imagery in crop statistics. At present, France does not use big data to estimate migrant stocks and flows but is exploring using cell phone data to map internal migration. Italy has been working on big data since 2013 to complement its official statistics on mobility, enterprise characteristics, tourism, land cover and labour and price statistics.
Migrant stocks or real-life population estimation
Understanding migrant stocks, or real-life population estimation, is another common area for which big data is currently being evaluated to produce statistics. Colombia calibrates the census list frame with new weights to calculate the unsatisfied basic needs in hard-to-reach places of the Population and Housing Census. They also use satellite images to produce estimates where there is no information. The Netherlands had a pilot project to use mobile phone data to estimate the real-life population at a specific time in particular locations. Due to privacy issues, this project has been put on hold. Statistics Lithuania is also working on a project for more detailed statistics incorporating small-area estimation methods and including administrative, new or alternative data sources.
Migration flow and short-term mobility
Migration flows and short-term mobility are other aspects of migration for which big data is currently being explored to produce formal statistics. Latvia, for instance, has a partnership with mobile data holders to experiment with migration flow estimates and inland mobility. Portugal uses Facebook's ‘Data for Good’ initiative to investigate the population's daily mobility indicators at the regional level. Switzerland also had a pilot project to collect data on local daily mobility using mobile phone data.
Big data sources
Various big data sources were used to explore the possibility of extracting official statistics on the abovementioned concepts. Mobile phone, web scraping and social media data were, to a great extent, used to study mobility patterns in populations. Web scraping and mobile phones were used to map internal migration, and satellite images were the primary data source for statistical explorations on population estimations and stocks. Mobile positioning data (MPD), vehicle sensors and websites of accommodation systems were the principal sources of big data for daily and short-term mobilities. Other big data sources were also used for migration purposes but not by the NSIs under study.
Additionally, further common types of big data used for exploring the possibility of generating migration or mobility statistics include satellite images to determine population density in places that the census could not reach, mobile phone data, web scraping and social media data, including from LinkedIn, Facebook, Instagram and X (formerly Twitter). In tourism statistics, MPD and information about inbound and domestic visits and their characteristics are explored. Moreover, web portals holding online job vacancies, vehicle sensor data and smart sensor data are used to study mobility.
Aside from migration and mobility, the most common big data sources used by NSIs are web-based data, including enterprise websites and labour and price portals, some aspects of register data and scanner data from retail chains. These sources are used to extract data on price statistics, tourism, supply chain and population statistics, among others. All these sources have been used for experimental purposes and to cover specific aspects of the topics.
Big data usage beyond migration
Beyond migration, countries experimented with big data in several areas to explore the feasibility of using big data for official statistics. NSIs under study mentioned using big data for extracting data on consumer price index, tourism statistics, business and financial transactions, agriculture statistics, job vacancies, inflation, locating residential addresses and unsatisfied basic needs.
Inflation and consumer price indices
Consumer price indices are the most investigated concepts by NSIs through big data. Canada, Georgia, Hungary, Lithuania, Portugal, Slovakia and Germany used big data for this specific purpose. Canada has exploratory initiatives to extract data on economic accounts and price indexes, among others. Hungary is using web scraping to calculate consumer price index and sensor data in tourism statistics for experimental purposes. Additionally, Statistics Lithuania possesses a test scanner dataset from one of the largest trade companies and has started analysing possibilities to use it for price statistics. Portugal uses internet search data from specific websites to extract consumer prices and international trade information. Germany uses web scraping data to produce data on inflation and prices.
Tourism statistics
Tourism statistics is the second most common area in which big data is used to extract information. Georgia is planning to use MPD for tourism statistics as a complementary data source. MPD will be used, along with ongoing tourism surveys at Geostat, to get more detailed information about inbound and domestic visits in Georgia. Slovakia also uses web scraping data for tourism statistics.
Business and financial transactions
Extracting data on businesses and financial transactions was one of the major reasons for using big data. Portugal used data from online government platforms for business reporting (such as its E-factura and E-invoice portals) and tax and customs authorities to trace regional economic activities and evaluate financial transactions. Statistics Poland does not have access to the data on financial transactions carried out with payment cards but hopes to gain access to such data as the agency believes it could help with estimating the number of foreigners or analysing migrant populations in Poland.
Agriculture statistics
NSIs also report extracting statistics on agriculture and farming activities from big data. Various sources of big data were used for this purpose. For instance, Statistics Lithuania works together with Kaunas University of Technology on projects related to the use of satellite imagery data for agriculture statistics (e.g. crop identification), with the additional incorporation of georeferenced administrative data. Portugal uses imagery-derived data like land cover and land use maps to extract information on regular production and dissemination. They also provide territorial and environmental statistics from weather pollution sensor data.
Job vacancies
NSIs also reported utilising big data to extract information on other critical indicators such as employment and job vacancies. Statistics Lithuania is currently working on a web scraping data project to gather information on local job vacancies, enterprise characteristics and online prices.
Locating residential addresses
Estonia reported ad hoc projects using big data to compare residential addresses in administrative data with residential anchor points derived from call detail records (CDRs).
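To make the idea of a residential anchor point concrete, the sketch below derives one from call detail records and compares it with a registered address. It is a hypothetical illustration and not the Estonian method: it assumes that each record carries a user identifier, a timestamp and an area code, and it uses a common heuristic that treats the area most often observed at night (here 22:00 to 06:00, an assumed window) as the place of residence.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed night window; real studies tune this and add further filters.
NIGHT_START, NIGHT_END = 22, 6


def is_night(ts: datetime) -> bool:
    """True when the timestamp falls inside the assumed night window."""
    return ts.hour >= NIGHT_START or ts.hour < NIGHT_END


def home_anchor_points(cdrs):
    """Return {user_id: area} using the most frequent night-time area.

    cdrs: iterable of (user_id, timestamp, area) tuples (hypothetical layout).
    Users with no night-time records get no anchor point.
    """
    night_counts = defaultdict(Counter)
    for user_id, ts, area in cdrs:
        if is_night(ts):
            night_counts[user_id][area] += 1
    return {user: counts.most_common(1)[0][0]
            for user, counts in night_counts.items()}


def match_rate(anchors, registered):
    """Share of users whose anchor point equals their registered area."""
    common = anchors.keys() & registered.keys()
    if not common:
        return 0.0
    return sum(anchors[u] == registered[u] for u in common) / len(common)
```

A low match rate in such a comparison may point to outdated register addresses, but it may equally reflect night work, shared phones or coarse spatial resolution, so the result is a prompt for investigation rather than a correction.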
Measuring unsatisfied basic needs
NSIs also reported using big data to track unsatisfied basic needs as a way to measure household poverty. Colombia, for instance, uses big data to calculate indicators of unsatisfied basic needs.
Cross-border sales
NSIs also use big data in relation to cross-border sales. In a study reported by the Georgian NSI, the institute is evaluating various online features of a selected group of local firms. These features include whether the companies offer online bookings and ordering, maintain active social media profiles, provide advanced purchase or ordering functions on their websites and have the capability to execute cross-border sales. Portugal uses internet search data from specific sites to estimate the volume and prices of electricity on the wholesale European market.
Residential property price index
Countries conducted experiments to extract statistics on residential properties. Georgia used scanner data and web scraping to gather property price statistics, with web scraping used to collect data on dwelling prices and their characteristics. In addition, web scraping is used to collect prices for certain items in the consumer basket, and scanner data is currently being implemented in the survey of consumer prices.
Replacing or complementing traditional statistics
Studies have revealed big data's capacity to complement (and sometimes even replace) traditional statistics (Hsiao et al., 2023; Rango, 2015). As part of this research, we asked the NSIs how they used big data to complement migration statistics. The NSIs reported that big data is currently used to extract additional information (e.g. on temporary migration) rather than complementing or replacing traditional data. However, it is believed that in the future, it may have the potential to complement some specific missing indicators or variables of data on migration. Administrative data have the advantage of covering (almost) all changes of residence in countries. Therefore, it would be hard to fully replace them with the same level of quality and coverage (in particular for migration stock and flows). Also, big data cannot provide full and equal representation across demographics, so administrative and survey data are required to validate and calibrate statistics based on big data.
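One way to picture such calibration is post-stratification, in which a big data sample is reweighted so that its demographic composition matches reference totals from a register or survey. The Python sketch below is an assumption-laden illustration rather than a method reported by the NSIs: the function names, age groups and figures are hypothetical.

```python
def poststratification_weights(sample_counts, population_counts):
    """Weight per group = population share / sample share.

    sample_counts: users per group in the big data source.
    population_counts: reference totals per group from a register or survey.
    """
    n = sum(sample_counts.values())
    total = sum(population_counts.values())
    weights = {}
    for group, pop in population_counts.items():
        sample = sample_counts.get(group, 0)
        if sample == 0:
            # A group missing from the big data source cannot be reweighted.
            raise ValueError(f"group {group!r} is absent from the sample")
        weights[group] = (pop / total) / (sample / n)
    return weights


def weighted_mean(group_means, sample_counts, weights):
    """Calibrated mean of an indicator observed per group in the sample."""
    num = sum(group_means[g] * sample_counts[g] * weights[g] for g in weights)
    den = sum(sample_counts[g] * weights[g] for g in weights)
    return num / den


# Hypothetical figures: young users are over-represented in the sample.
sample_counts = {"18-34": 600, "35-64": 300, "65+": 100}
population_counts = {"18-34": 300, "35-64": 500, "65+": 200}
mobility_rate = {"18-34": 0.4, "35-64": 0.2, "65+": 0.1}

weights = poststratification_weights(sample_counts, population_counts)
calibrated = weighted_mean(mobility_rate, sample_counts, weights)
```

With these illustrative numbers, the unweighted mobility rate is 0.31, whereas the calibrated rate is 0.24, because the over-represented and more mobile young group is weighted down. The adjustment only corrects for the variables used to form the groups; selection effects within groups remain.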
Another reason for the limited uptake of big data is the absence of a legislative framework to govern privacy protections. Even for mobility and short-term migration, big data cannot fully replace traditional statistics because it cannot collect all the necessary information about visits. Some information – expenditure, for example – can be partially collected, but not as reliably as traditional statistics using surveys.
LinkedIn is another potential source of complementary big data to obtain statistics on the educational attainment of migrants. The use of big data presents challenges when it comes to replacing all traditional survey indicators in those countries trialling this approach. While big data can offer broad insights, it often lacks the level of detail that traditional surveys provide, making it difficult to completely phase out such surveys. Additionally, the information derived from the new innovative sources could be used as auxiliary information to increase the accuracy of survey indicators. The core challenge, however, is ensuring representativeness or estimating bias using alternative or big data sources. Although many countries are trying to experiment with big data, to date, no pilot project has been conducted to compare the results of estimates from big data with traditional statistics by the NSIs. The UN and the Joint Research Centre of the European Commission have compared the overall Facebook data with the stock of migrants in Europe and beyond (Spyratos et al., 2018). The countries that are currently employing big data to augment their traditional statistics are often those that have been exploiting their administrative data for big data analyses. Finally, reasons cited for not using big data as a complement include limited regulatory frameworks, privacy and ethical issues and a lack of access to big data.
Barriers preventing the use of big data in migration statistics
Investigating why countries choose not to use big data for official migration statistics is as important as examining those that used it and addressed the associated data challenges. Understanding what hinders NSIs and other institutions from using big data is essential. Among the 29 countries responding to the questionnaire, 15 did not utilise big data for official migration statistics, and some of the countries that initially used it eventually discontinued its use. Several reasons were cited for not utilising big data.
The principal and heavily emphasised reason was the lack of access to such data. Countries such as Belgium, Latvia, Poland, Slovenia and others stated that the data is owned by private entities, limiting access for NSIs. The second reason is the absence of legislation to guide privacy rules and the management, storage and manipulation of data. Austria, Canada, Estonia, Georgia, Germany, Italy, the Netherlands, Poland, Spain and the United States have expressed concerns regarding privacy violations and deficient regulatory frameworks. Although authorised persons in Poland may legally access and use such data, the government has yet to establish stringent guidelines on privacy and ethics. Initiatives in the Netherlands and Belgium had to be halted due to the lack of government regulations.
Other reasons include determining which data to use for specific purposes and addressing data gaps. For example, France has expressed interest in utilising big data but lacks confidence in selecting the appropriate data for specific purposes. Similarly, countries such as Georgia are in the process of creating new methods to utilise big data and are actively investigating its potential applications. Currently, there is no standardised methodology for making statistical inferences, extrapolations or estimations based on innovative sources derived from big data.
Another reason for the underutilisation of big data is the lack of partnerships between data providers, universities, experts and NSIs. While NSIs are interested in assisting and making statistics available to researchers, they require guidance from researchers regarding the type and nature of data needed. Lastly, some countries, including Switzerland, Denmark and Sweden, did not employ big data because they considered their existing data complete and saw no need to compensate for missing data using big data. Instead, they used additional surveys and initiatives, such as those assessing the educational attainment of migrants, to fill in missing information in their traditional statistics.
Challenges with exploiting big data for migration statistics
Some studies indicate that big data can address the gaps in traditional data for official statistical purposes, including statistics on international and national-level migrations (Tam and Clarke, 2015). However, NSIs believe that, currently, big data has limited utility in filling the gaps in migration statistics. For instance, Canada stated that the big data already available would have limited value in measuring migration. Colombia, on the other hand, believes that big data could primarily aid in understanding the mobility and movements of populations rather than filling the gaps in migration statistics. Beyond the timely measurement of population mobility and rough estimates of migration stocks, the other gaps in migration data remain difficult to address with big data.
During discussions, NSIs highlighted various reasons for the challenges in using big data to produce official statistics. The first recurring reason is the lack of access, as NSIs have been unable to thoroughly explore and experiment with a diverse range of big data sources and obtain complete observations of the data. The second reason is that NSIs often lack the expertise and prior experience needed to generate official statistics from big data sources. Many NSIs also indicated a preference for observing pilot initiatives by others before instituting in-house approaches. Countries with access to certain data noted the difficulty of distinguishing migrants from non-migrants and other categories of travellers within the datasets. Additionally, determining the usual place of residence for migrants in the datasets proved challenging. Furthermore, the coverage and territorial specifications of the data are not always reliable.
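The difficulty of separating migrants from other travellers can be made concrete with a minimal sketch. It assumes, purely for illustration, that a dataset records the first and last date on which a device was observed in a country, and that a stay of at least 12 months marks a possible long-term migrant. The 12-month threshold, the function name `classify_stay` and the category labels are assumptions and were not reported by the NSIs.

```python
from datetime import date

LONG_TERM_DAYS = 365  # assumed 12-month threshold; not specified by the NSIs


def classify_stay(first_seen: date, last_seen: date, window_end: date) -> str:
    """Classify a device's presence in a country by its observed length of stay."""
    if last_seen < first_seen or window_end < last_seen:
        raise ValueError("expected first_seen <= last_seen <= window_end")
    stay_days = (last_seen - first_seen).days
    if stay_days >= LONG_TERM_DAYS:
        return "possible long-term migrant"
    if last_seen == window_end:
        # Still present when observation stops: the true length of stay is unknown.
        return "undetermined"
    return "short-term visitor or traveller"


examples = [
    classify_stay(date(2022, 1, 1), date(2023, 1, 15), date(2023, 6, 30)),
    classify_stay(date(2023, 6, 1), date(2023, 6, 30), date(2023, 6, 30)),
    classify_stay(date(2023, 3, 1), date(2023, 3, 10), date(2023, 6, 30)),
]
print(examples)
```

Under these assumptions, any observation window shorter than the threshold leaves every device that is still present at the end of the window undetermined, which is one way to see why short data extracts are of limited use for measuring migration.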
Some NSIs, like Colombia's National Administrative Department of Statistics, lack the infrastructure to store and process large volumes of information and face financial constraints in acquiring big data. Estonia, which has experimented with big data, pointed out that the data does not cover all demographic groups equally. Using cross-border data poses particular problems for Germany since such data are difficult to access and the rights to use them depend on several authorities. Moreover, different countries have different privacy and ethical regulations, leading to uncertainty regarding which rules apply in particular cases. Overall biases in terms of data coverage and representativeness also present significant challenges. Italy is among the countries that address their statistical shortcomings using big data but faces issues with temporal coverage due to limited access to mobile network operator data, typically available for only a few weeks. Consistency, comparability, data quality and capacity are additional challenges.
Ethics and privacy
As mentioned above, privacy and ethics are major reasons preventing NSIs from using big data. In addition, companies and entities that possess big data cannot share it with NSIs and other research organisations for ethical reasons. Big data is considered sensitive, and countries have strict rules about sharing it. NSIs believe that big data has much potential for studying social phenomena and generating hard-to-collect data on different topics. Therefore, for the NSIs, clear guidelines on usage, anonymisation and access, together with further elaboration of the ethical rules, matter more than privacy and ethics concerns as such. The Netherlands and Belgium, for example, saw considerable potential in big data, but they could not explore it due to privacy and ethical concerns.
Countries that use big data for official statistics are concerned about the ethics and privacy aspects of handling big data and have put in place safeguards to ensure that personal data is not exploited or misused. Countries had different strategies for using the data. Canada, for example, reported that sensitive information is accessed on a need-to-know basis, that access to data must be officially justified and that acquisitions are listed on the StatCan website. Specific legislation governs such issues in other countries, including Estonia, Germany, Lithuania, South Africa and Poland. For example, in Estonia, two pieces of legislation – the Official Statistics Act and the Electronic Communications Act – govern ethics and privacy issues concerning data use. Other countries employing big data use anonymisation to protect individual data.
Legal barriers and propositions on improving big data usage for official statistics
The NSIs had specific propositions on measures to be taken to improve the usability of big data for official statistical purposes. Limited access to big data sources and limited possibilities for experimenting with big data were among the major shortcomings. There was a broad consensus among all the countries (irrespective of whether or not they use it) on the potential of big data, which is yet to be explored. The Belgian NSI was adamant when asked what could improve the usefulness of big data: ‘Data access – data access – data access – and the provision of resources to NSIs’. Other challenges include the dominance and monopoly of data by large private tech companies and platforms and the absence of a framework and platforms through which NSIs can access the sources.
According to the NSIs, most of the problems related to access, privacy and utilisation of big data could be worked out through explicit and comprehensive legal systems established by governments at different levels, including the international level, as the data, in most cases, is not limited to geographical boundaries. The existing legal frameworks for official statistics across countries are designed for traditional and administrative data collection methods and are not developed for specific categories of big data and innovative sources. The NSIs also suggest that legislative acts should encompass specific and detailed policy guidance at the level of executive leadership, outlining which sources could be ingested, the permitted use of such information, and additional privacy and security compliance analysis. In addition to increased funding for information technology, resources for recruiting staff with experience of working with big data would be required to take full advantage of the information available in the data. Establishing and using common data standards would be an even greater challenge when consuming data from external sources, as entities in other countries may not follow the same data standards and definitions as other public or private sector bodies, or may follow none at all.
Some NSIs have already introduced initiatives to improve their use of big data by building their own organisational capacity. Lithuania is negotiating with the biggest trade companies over access to big data. It is adopting new IT solutions, including text analysis, sentiment analysis, natural language processing and machine learning, to analyse alternative or new data sources and to test different methods and algorithms. Poland is developing initiatives aimed at learning about and using big data and has created a dedicated big data team. Statistics Poland participates in meetings at an international level to explore using big data for official statistics. It also participates in several development projects in this field, including the European Statistical System Network (ESSNet) Trusted Smart Statistics–Web Intelligence Network 2021–2025. This project aims to create an environment for conducting research using internet data and implementing developed solutions for creating official statistics at the European and national levels. Its areas of application include internet job offers, company characteristics, the property market, construction, online prices of household goods and hotel prices. Statistics Poland is the international coordinator of this project.
The other project is ESSNet Big Data 2018–2020, aiming to identify jobs by internet users, business characteristics, maritime statistics, earth observation, smart tourism, methodology and quality and smart statistics. Statistics Poland also participates in the Strategic Research and Development Programme (Gospostrateg), which uses various big data for several purposes. For instance, satellite-based crop identification and crop growth monitoring for agricultural statistics (SATMIROL) uses satellite data to identify crops and monitor the state of vegetation; the project building an integrated retail price statistics system (INSTATCENY) uses big data to measure changes in prices of goods and services; TranStat modernises the system for producing road and maritime transport statistics by using big data; National Statistical Organisations (ONS-UNECE) Machine Learning 2021 aims at developing cooperation and exchanging experiences in the use of machine learning in public statistics; and ESSNet Smart Surveys 2019–2021 studies the possibility of using applications (web, mobile) for the needs of European social research. Statistics Poland actively participates in the meetings of the big data community. However, it emphasised that using this type of data is not always possible, especially when private entities, such as mobile network operators, own the data.
Discussion and concluding remarks
Previous studies have shown that the utilisation of big data for official statistics presents both opportunities and challenges for NSIs worldwide (Salah et al., 2022a). Already in 2014, Struijs and colleagues laid bare the great potential of big data for official statistics and called for further research to scrutinise the use of big data for extracting different sorts of official statistics. In this paper, we show that the discussion surrounding the usage of big data for migration statistics raises several critical points and challenges. While we acknowledge the potential of big data in improving official statistics and estimates for human populations, we also highlight the limitations and hurdles faced by NSIs in utilising this data. Additionally, the existing literature simply points out the potential and challenges of big data in general or through case studies; we dig deep into the topic, foreground the perspective of experts that use the data, and discuss explicit areas in which big data is used across countries.
We argue that access to big data sources remains a recurring challenge for NSIs. Limited access prevents them from fully exploring and experimenting with diverse big data sources, hindering their ability to obtain complete observations. Lack of expertise and experience in utilising big data is another barrier NSIs face, as they often lack the necessary experts to generate official statistics from these sources. Reliance on observing how others’ initiatives pan out before implementing their own further hampers their progress. We highlight the need for legislative frameworks specifically tailored to big data usage in official statistics. Existing legal frameworks designed for traditional and administrative data collection methods are not adequately developed to address the challenges posed by big data. NSIs call for comprehensive legal systems that guide data access, privacy rules, storage, manipulation and compliance analysis. Standardising methodologies and data standards across countries is another challenge that needs to be addressed. Another significant obstacle is the issue of privacy and ethics. NSIs are cautious about accessing big data due to concerns regarding sensitive data and the strict rules in countries regarding data sharing. This ethical dilemma is further complicated by the inability of companies and entities possessing big data to share it with NSIs and research organisations. The NSIs emphasise the need for clear guidelines on usage, anonymisation, access and ethical rules, suggesting that these aspects are as critical as privacy and ethics considerations.
The issue of data coverage and representativeness also poses significant challenges. NSIs struggle to distinguish migrants from non-migrants and other categories of travellers within the datasets, making measuring migration difficult. Determining the usual place of residence for migrants is another complication. Moreover, the coverage and territorial specifications of the data are not always reliable, affecting the quality and accuracy of the statistics generated. Infrastructure limitations and financial constraints are additional obstacles for NSIs. Some countries lack the necessary infrastructure to store and process large volumes of information, while others have difficulty meeting the costs of purchasing big data. Different privacy and ethical regulations across countries create ambiguity and uncertainty regarding which rules apply, further complicating the utilisation of big data for migration statistics.
Despite these challenges, some NSIs have taken the initiative to improve their use of big data. Several countries actively explore and experiment with big data, incorporating innovative methodologies and IT solutions. They participate in international projects and collaborations to enhance their capabilities in using big data for official statistics. However, access to private entity-owned data remains a significant limitation for these efforts. The use of big data has predominantly been limited to experimental purposes, focusing on specific aspects of migration and mobility. Notably, big data has not replaced or directly complemented traditional statistics thus far. NSIs have primarily relied on mobile phone data, social media data, web scraping and satellite data to explore mobility patterns, internal migrations, population estimations and daily or short-term mobilities.
Beyond migration, big data has been employed for various ad hoc projects to collect additional information on specific variables or address specific questions. However, NSIs have not indicated a tendency to complement their traditional data with big data, even in areas unrelated to migration and mobility. Instead, big data has been used to compare the accuracy of traditional data in domains such as residential addresses and economic development. Examples include analysing tourism patterns, consumer behaviour and economic indicators derived from web scraping and other sources. Our findings demonstrate that big data sources offer the potential to overcome traditional limitations, such as the time lag in data collection and reporting. This timeliness can prove invaluable in a dynamic policy environment, where swift and well-informed decision-making is essential.
Nonetheless, it is crucial to acknowledge the challenges and barriers that impede the widespread adoption of big data by NSIs. Issues such as limited access to data, the absence of legal frameworks governing the use of big data, ethical considerations, data quality concerns and the need for specialised expertise and methodologies hinder its integration into existing statistical practices. Addressing these challenges requires concerted efforts from policymakers, researchers and data custodians to ensure appropriate data access, develop robust guidelines, foster ethical frameworks, enhance data quality assurance mechanisms and facilitate capacity-building initiatives.
In conclusion, the diverse nature of big data provides a comprehensive understanding of migration phenomena, enabling more precise measurement and reporting. While big data offers promising avenues for improving migration statistics, numerous hurdles must be overcome to maximise its potential. By incorporating real-time insights and detailed mobility patterns, policymakers in the EU can develop targeted migration policies that effectively address the complexities and dynamics of human mobility. Our study underscores the importance of defining big data, clarifying its role in statistical analysis and establishing standardised methodologies. Access to data, legal frameworks, expertise and knowledge sharing are vital factors in facilitating the integration of big data into official statistics.
By addressing these critical issues, NSIs can navigate the complexities of utilising big data and unlock its full potential in informing migration management and policymaking. To overcome the challenges, experts suggest the implementation of clear legal guidance on ethics and data access, as well as initiatives promoting knowledge sharing and experience exchange among NSIs. Such measures would help establish a supportive ecosystem that facilitates data access, ethical considerations and the development of expertise in analysing and interpreting big data. Finally, we recommend that future research prioritise specific case studies on generating official statistics in various regions, particularly in countries with lower economic conditions. Additionally, it is advisable to conduct case studies where NSIs employ big data for official statistics and scrutinise their methods for potential applicability in different countries and contexts.
