Abstract
At 09:00 UTC (10:00 British Summer Time) on 24 June 2014, two different instruments located at Sheffield Weston Park weather station observed a temperature of 18.5℃. One of these instruments, owned by Museums Sheffield, generated a data point that was recorded in a CSV file. Later that day, a curator at Weston Park Museum uploaded it to an Access database. From here, the curator circulated it in the local community via Twitter, and at the end of the month, it was automatically emailed to the Met Office as part of the station’s climate record for June 2014. The second datum was generated by equipment owned by the Met Office and was automatically transmitted by electromagnetic signal to the World Meteorological Organisation’s Global Telecommunication System. From here the datum replicated as meteorological organisations around the world, including the Met Office data centre in Edinburgh, downloaded and ingested it into their systems. At this point, the datum replicated again and began to travel down a series of different paths. In almost real time, Met Office weather forecasters in Exeter incorporated the datum into their numerical weather prediction (NWP) system. Simultaneously, the Met Office distributed the datum, for a fee, to commercial actors such as firms that supply meteorological data to the weather derivatives industry. At a slower pace, the datum also travelled to the MetDB synoptic database, from where climate scientists calculated climate averages for June 2014 before saving the data to the MIDAS database to be used by other climate scientists.
The life of data
In this article, we discuss our development and piloting of a new methodology for illuminating the
The observation that data are socially constituted objects is not new. Earlier work in the philosophy of science has explored data as something that is produced within a social context (Jensen, 1950 in Kitchin, 2014b). More recently, social scientists researching scientific knowledge infrastructures have pointed to the ways in which the conceptualisation of data as neutral and objective is mistaken (Bowker and Star, 2000). For Bowker (2008), ‘“raw data” is both an oxymoron and a bad idea’. Through ‘inverting’ (Bowker, 1994; Bowker and Star, 2000) and looking beneath the surface of knowledge infrastructures and recognising them as social and relational, scholars have explored some of the complex and often invisible political, cultural and ethical processes that contribute to their development (see Bowker et al., 2010; Edwards, 2010; Star, 1999). Gitelman and Jackson (2013: 4) argue for the adoption of this ‘infrastructural inversion’ approach for understanding the socially situated production of ‘Big Data’, calling on researchers to ‘look under data to consider their root assumptions’ and to question the material conditions of their production.
Yet, for the most part, critical research on emergent ‘Big Data’ practices and infrastructures has remained at the conceptual and theoretical level (Kitchin, 2014b). Whilst various calls have been made for critical engagement with the philosophical and methodological assumptions surrounding ‘Big Data’ (boyd and Crawford, 2012; Dalton and Thatcher, 2014; Gitelman and Jackson, 2013), relatively few scholars have conducted empirical work on specific ‘Big Data’ practices. Amongst those that have, many have remained external to sites of data practices, relying upon documentary analysis to inform empirical investigation (Hogan, 2015; van der Vlist, 2016; Williamson, 2015). Yet, in order to contribute to the development of alternative futures in which ‘publics might be said to have greater agency and reflexivity vis-à-vis data power’ (Kennedy and Moss, 2015), it is important that critical ‘Big Data’ research gets ‘under the hood’ to grasp how local and situated ‘Big Data’ practices structure how data work in the world, and thus how particular practices, and their social consequences, might be ameliorated. There is therefore a growing need for methodological approaches that are able to capture detailed empirical understanding about ‘Big Data’ in practice, including how socio-material factors influence the constitution of data objects and shape how they move through space and time connecting different sites of practice across vast data infrastructures.
Outside of this body of work on ‘Big Data’, the published research that empirically examines data practices from a sociological perspective tends to be rooted in the field of infrastructural studies. Influenced by actor-network theory (ANT) and similar Science and Technology Studies (STS) methodologies, this field of research emphasises the production of detailed accounts of specific knowledge or data infrastructures and the intra-network politics of their development (e.g., Edwards, 2010; Leonelli, 2013; Ruppert et al., 2015). The
The paper is organised as follows. We begin by discussing the materiality of digital data, and how this relates to the social world. We then present a theoretical rationale for the
The materiality of data
In order to address questions relating to the
In foregrounding the materiality of data, we draw upon Harvey’s (1991) work on the ‘internal relation of the material structure of ideas’ to recognise that ideas and values do not work as an external controlling force over data practices, rather they act as a framework and justification for the activities that practitioners are engaged in (Bieler and Morton, 2008: 118). Rather than externally shaping material forms, ideas and values ‘tak[e] on substance through practical activity bound up with systems of meaning’ that are often embedded in the economy (Bieler and Morton, 2008: 119). Whilst acknowledging this interrelationship between the socio-cultural and the material, we also recognise the necessity for some analytical separation of the two categories. In so doing, we aim to avoid imagining the ‘socio-material’ as a constitutive entanglement (Orlikowski, 2007) pre-existing perception, and instead recognise socio-material structures as being historically constituted through the actions of both historic and present-day human actors (Bieler and Morton, 2001; Leonardi, 2013).
Data journeys: Theoretical framework
Drawing upon the above framework, we developed and piloted the
Interest in the movement of data through space is seen in a number of research areas. For example, Beer and Burrows (2013) draw upon Mackenzie’s (2005) concept of the ‘performativity of circulation’ to explore the role of popular culture in the accumulation and flow of new forms of social data. Similarly, researchers in the interdisciplinary field of mobilities studies have examined the ways in which the movement of people, objects, capital and information impacts social and economic life (Sheller and Urry, 2006). In the more empiricist traditions of Information Science, analysis of the flow of data across their lifecycle within information systems and knowledge infrastructures is relatively common. Sands et al. (2012), for example, develop a ‘Follow the Data interview protocol’ to study the flow of data leading into and out of astronomers’ research publications in order to understand the people and infrastructures responsible for the development of large astronomy sky surveys. Similarly, McNally et al. (2012) use the ‘data flow’ concept in research design to produce detailed accounts of the durability, replicability and metrology of flows of data within data intensive research contexts, and examine how ‘people, infrastructures, practices, things, knowledge and institutions’ work together to shape the flow of data through these spaces. These bodies of research have produced detailed pictures of data flows and practices across a range of contexts. However, in general, they have tended to emphasise the internal dynamics of specific knowledge infrastructures and information systems, and neglected to situate these practices in relation to the wider socio-material contexts and power dynamics shaping their development.
Whilst academic research in this field has tended to refer to the ‘flow’ of data within a given context, the term ‘flow’ tends to suggest a disconnect of data from physical sites of data practice. The concept of a
The term
Drawing on these ideas, we began to imagine a research design in which the researcher moves through space following data on their
These spatial dynamics of digital data are shaped by the historically constituted socio-material conditions that human actors encounter, reproduce, subvert and ameliorate as they engage in practices of data production, processing, use and distribution at different sites. In order to fully grasp the spatial dynamics of data journeys, it was therefore important also to explore their historic development, the way they have evolved over time and how the transforming shape of
Data journeys: Research design
Taking these theoretical observations into consideration, the design of the Secret Life of a Weather Datum project aimed to illuminate the socio-material constitution of meteorological data objects and flows. We began by conducting an initial mapping of key data journeys. Drawing upon initial desk research, we first identified UK-based sites of weather data production and use across state, science, market and civil society. We then mapped the journeys of data between relevant organisations, projects, datasets and individuals using post-it notes on flipchart paper (Figure 1). This visual representation was then adapted as new information was gathered about the detail of the
Initial Rich Picture Mapping of Data Journey (anonymised).
Our initial mappings allowed us to identify a number of potential data journeys to explore, and after making initial enquiries regarding access to research sites, we decided to focus on the journeys of data produced at our local weather station (Sheffield Weston Park) and data produced by amateur weather observers and citizen scientists. We then followed these data on their journeys from sites of production on into processing by the UK’s Met Office, and on into re-use in climate science and financial markets. We also explored the intersecting journeys of data generated by amateur observers and citizen scientists. As well as identifying key informants through desk research, snowball sampling techniques were also adopted once we were in the field. In total, primary data were gathered in relation to eight sites of data practice: Sheffield’s Weston Park weather station, Met Office headquarters in Exeter, the Climatic Research Unit at the University of East Anglia, the Intergovernmental Panel on Climate Change (IPCC), archives that store historical weather observations, the Old Weather citizen science project, amateur weather observers in distributed locations, and a firm that supplies weather data to the weather derivatives market. At each of these sites, primary data were generated including, as appropriate to each site, in-depth interviews incorporating an oral history component with data practitioners and other relevant individuals, field observations involving reflective field notes and photography, digital ethnography of selected forums and Twitter hashtags and documentary analysis of policies, legislation and other relevant sources. Through adopting an element of oral history interviewing in our conversations with participants, we were able to draw upon their memories of the development of the infrastructure in order to construct an evolutionary and dynamic picture of the
The primary data we generated were used to illuminate the journey of data through and between each site, the specific data practices that people were engaged in at each site, the socio-cultural values that framed and were used to justify participants’ data practices, and the varying material conditions and institutional contexts of these practices, including an analysis of the public policies and legislation that shaped the movement of data between sites. We also aimed to uncover tensions and changes in the socio-cultural constructs that practitioners were bringing to their data work at different sites, and explore how these constructs are interrelated with the broader socio-material context.
Our initial findings have been published on a public facing website – http://lifeofdata.org.uk – that was developed as part of the project. The interactive website draws upon a tube map metaphor in order to represent visually the journey of data as they move between the different sites of data practice that we explored. Each of these sites is represented by a ‘clickable’ station on the tube map, and within each station the user is invited to explore the different data practices, cultures, and public policy frameworks that contribute to the production of digital data, and their movement between, and use across, different sites. Where permissions from research participants were granted, original research data including audio interviews and photographic images are embedded into the website to bring the story to life. The dynamic nature of the infrastructures we explored is also manifest in the design of the website through our efforts to represent participants’ memories of particular moments during the evolution of data practices over time (see Figure 2).
A screenshot of the project website http://lifeofdata.org.uk (as of 30 June 2016).
Data journeys: Insights and reflections
Through adopting the
The socio-material constitution of digital data objects
Our analysis demonstrated the ways in which the practices of those who produce meteorological data are bound up in complex systems of meaning that crystallise in the
The Weston Park weather station was founded and first began recording data in the 1880s in response to a fatal outbreak of diarrhoea in the city. It was suspected that the cause of the outbreak was related to weather temperature, but doctors needed data so they could view the problem through an informational lens, and ultimately predict future outbreaks and improve the public health of the city. The local Corporation, under pressure from the Department of Health, asked the museum curator Elijah Howarth to build and run a weather station. He decided to locate it at Weston Park, which was conveniently next to his place of work. The weather station has been active since this time and has produced one of the longest and most complete climate datasets on record.
Since the 1880s, responsibility for the station has passed down through generations of curators. The museum curator who currently looks after the station told us stories about how these individuals – barring a short period of variability in standards during the early years of the station – took great pride in looking after the station and the data it has generated (both digital and paper records), contributing to the durability and persistence of the climate dataset over the years. Data generated by the station are produced to international climate data standards. Prior to automation this was not an easy task, and over the years curators have been found at the station on Christmas Day in order to reset instruments to ensure accurate data generation. The curator perceived the station as part of the local fabric of Sheffield and remembered how local people came to its rescue – fundraising £3000 in a ‘matter of weeks’ – when an expensive piece of equipment was stolen from the roof of the museum.
Over the years, the physical infrastructure of the station has survived other threats. It came unscathed through a bombing raid on the museum in World War Two – during which the curator braved high winds on the roof of the museum to capture hourly wind readings for the military. More recently, the station has been threatened by funding cuts to museums in the wake of the economic crisis, which have led to staffing cuts. In order to adapt to some of these pressures and ensure the continuity of the weather station, the previous curator allowed the national Met Office to add its own weather observation instruments to the weather station compound in 2010. This new equipment generates data alongside the museum equipment. It feeds its data in real time to the Met Office via the WMO Global Telecommunication System and is part of the national synoptic network. As the curator describes, this adaptation in the physical infrastructure, whilst likely necessary to ensure the continued presence of a weather station at Weston Park, means the Met Office no longer depends on the climate data generated by the museum equipment. This development has resulted in a reduction in power for the local museum in its relationship with the national Met Office, and the emergence of a tension between the value system of the curator who looks after the weather station and perceives it and its data as part of the cultural heritage of Sheffield, and the more technocratic values of the distant national Met Office that emphasise efficiency, speed and volume of data.
In discussing his work and the history of the station, the curator expressed strong values of public service, civic duty, resilience, pride in his contribution, and responsibility to his forebears, local community and data users. It was clear that the historical mix of cultural values, practices and material conditions of production outlined above inform and frame the values and practices of the current curator. We observed how, enabled and empowered by this history, the activity of the curator to protect, maintain and run the museum weather station and look after the Met Office equipment in the context of current socio-material conditions resulted in (1) the production of these two specific data points at that particular moment in time, influencing their accuracy, timing, unit of measurement and so on, (2) the specific
Similar observations can be made of all other digital data that we observed across sites of weather and climate data practice. For example, the specific form of the CRUTEM4 global climate temperature dataset (Jones et al., 2012), which is derived in part from data generated at Sheffield Weston Park, has been shaped by struggles between climate scientists and climate change sceptics. After a prolonged and complex struggle involving the Climatic Research Unit’s (University of East Anglia, UK) email systems being hacked and a government inquiry into the Unit’s scientific practice – which found accusations of misconduct ‘patently false’ (House of Commons, 2010) – climate scientists accepted the sceptics’ demand for full transparency of the underlying weather station data feeding into the CRUTEM datasets. However, this means that some data series cannot be included in the latest version, CRUTEM4, because of the socio-material conditions of their production – that is, for a variety of economic, socio-cultural and political reasons, their source countries prohibit the data being made publicly available. These struggles around the publication of underlying station data have therefore crystallised in the specific form that the CRUTEM4 dataset takes, and gained substance as a result of the practical activity of climate scientists’ decision making and negotiating around which specific data series can, and cannot, be incorporated into the global dataset.
Friction in data movement
Through examining the journey of data between different sites, we were able to identify factors that enable and restrict the movement of data across infrastructures, and observe sites of potential movement, blocked movement and lack of movement (Sheller, 2011: 6). It was evident that whilst data are often mobile between sites, they do not necessarily move smoothly or easily from one place to another – they experience ‘friction’ (Edwards, 2010) as a result of the complex socio-material contexts they exist within.
Most obviously, we can observe the diverse forms and levels of friction experienced by data as they move along different types of path through space and time from historical ships into the International Comprehensive Ocean Data Set (ICOADS). The journey begins with the slow movement of handwritten data points inscribed in the log books of Royal Navy ships. The data in these log books spent time slowly traversing the oceans, before being removed from ships and deposited in archives around the world where they were boxed up and left untouched for years in varying states of decay. We can then observe the ‘friction’ reduce as climate scientists began to recognise the
We can also observe how policy makers, commercial actors and civil society campaigners have recognised
The weather observation data produced by Museums Sheffield at Weston Park, for instance, were shared readily with the Met Office, as well as with students and researchers at the local universities. The curator was also responsible for fostering a rich local data ecology in which weather station updates were shared with the public on Twitter and in the local newspaper. However, whilst strongly in favour of the idea that these data belonged to the public, the curator was wary about Open Data policies given that the sustainability of the weather station was dependent upon the small-scale commercialisation of the weather data it produced; a factor that seemed unlikely to change soon given the financial challenges posed by recent cuts to public funding for museums. These material conditions that shape the production of the museum’s meteorological data have a significant impact upon the curator’s understanding and practices regarding opening data generated by the museum equipment, highlighting how ‘friction’ in the movement of data can reflect, and be shaped by, power dynamics at play in the wider context. In this case, the curator’s efforts to keep the museum weather station going in the face of significant reductions in public spending, and a reduction in the importance of the museum’s data since the Met Office installed its own equipment at the site, are dependent upon creating and maintaining some data ‘friction’.
Elsewhere, at different sites across the infrastructure, we can observe some policy makers, politicians and financial market actors pushing to reduce ‘friction’ in data movement by calling for the removal of charges for commercial re-use of Met Office historic and real-time bulk data in order to reduce costs and spur innovation in the weather derivatives industry. As a public sector Trading Fund, the Met Office is institutionally obliged to generate revenue through the commercial exploitation of the goods and services it produces, although in the case of the Met Office revenue comes primarily from the services it is contracted to provide to the UK government and public sector. These material conditions of Met Office data production and processing mean that, similar to the curator at Weston Park, there has been some resistance to Open Data policies that are perceived to risk the financial stability of the organisation during an era of deep public sector restructuring. Despite these issues, some meteorological data have been opened. However, even when there is the will to make high volumes of frequently updated, highly detailed data available for others to re-use, material barriers exist to making that happen in practice. For example, as a result of datasets being updated four or more times a day, and models getting bigger and more detailed, the volume of data being processed is significant and growing. The sheer volume of data therefore presents a material challenge when the Met Office wants to make data available to third parties.
Overall, our findings demonstrate that where data do and do not end up on their journey from production through to re-use is influenced by a range of inter-related socio-material factors, including: the material conditions of their production and efforts to sustain physical infrastructures and institutions given this context; the material form and properties of data such as their size and speed; the relative power and influence of different actors who desire to shape the movement of data; and the socio-cultural values framing beliefs and practices around the role of publicly funded infrastructure, data sharing and valued forms of data re-use. Illuminating the causes behind these shifting patterns of ‘friction’ as data move, or not, between sites of data practice has the potential to provide a fascinating insight into the power dynamics that are shaping emergent material conditions of production.
Data objects as mutable mobiles
Observations of what happens to digital data when they do move indicate that ‘mutability’ is an important
The mutability of objects as they move through Euclidean space, for example, between two sites located in different geographical locations, has been explored within the field of ANT and, more broadly, STS. The
Significant examples of data being adapted as they move through different sites are practices of data cleaning and homogenisation. Every day, data arrive from meteorological offices around the world at the firm that supplies data to the financial markets. A team of three to four people then spend all day, every day, analysing the data, looking for unusual readings and missing data points for stations including Sheffield Weston Park. If errors or gaps are found in the data, data points are adjusted and filled in based upon the team’s climatological knowledge and readings from surrounding stations. Data points are also altered based upon knowledge of particular weather stations gained from historic and present-day station metadata; for example, if the weather station location or instruments have changed over the years, incoming data are ‘homogenised’ in order to correct for the resulting divergence in the observations. The changes made to particular data points enable incoming data from different sources to be aggregated in a way that generates a more uniform representation of the weather appropriate to the context of use. A detailed audit trail of any changes made is saved in the database. A very similar process happens when data arrive at different sites across the infrastructure, for example, the Met Office and the Climatic Research Unit. However, the temporal dynamics of these mutations differ; for example, the pace at which data are cleaned and homogenised is much slower for climate data processing, where accuracy is more important than speed, than for data being fed into the weather derivatives industry and forecasting where immediacy is vital. Further, whilst the mutability of digital data is always technically present, social factors intersect to restrict this mutability at particular points.
When a particular version of a dataset is used in a weather derivatives trade, for example, data become immutable – they will not undergo any further alterations, as this would impact upon the validity of the contract. Similarly, when a new version of a climate dataset is published, for example, CRUTEM4.4.0.0 (Jones et al., 2012), data points are made temporarily immutable. However, unlike in the case of the weather derivatives contract, they may change again at a future point when the next version of the dataset is published. In many cases, these practices of cleaning and homogenisation of data that take place at different sites are undertaken in order to generate datasets that are accurate and complete enough for the purposes to which they are to be put, whether that be climate research or financial market trades.
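To make the mechanics of these practices more concrete, the following is a minimal sketch of a cleaning step of this general kind. It is not the procedure used by the data supplier, the Met Office or the Climatic Research Unit, whose adjustments rest on climatological judgement and station metadata. The tolerance, the station identifiers, the readings and the function names are all hypothetical, chosen only for illustration. The sketch treats a reading as suspect if it is missing or if it differs from the median of its neighbouring stations by more than a fixed tolerance, replaces it with that median, and records each change in an audit trail. The cleaned set is then wrapped in a read-only view, loosely mirroring the point at which a published or traded version becomes immutable.

```python
from dataclasses import dataclass
from statistics import median
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional, Sequence, Tuple

# Hypothetical tolerance in degrees Celsius; an assumption for this sketch only.
TOLERANCE_C = 5.0


@dataclass(frozen=True)
class AuditEntry:
    """One recorded change to a data point."""
    station: str
    original: Optional[float]
    adjusted: float
    reason: str


def clean(
    readings: Mapping[str, Optional[float]],
    neighbours: Mapping[str, Sequence[str]],
) -> Tuple[Dict[str, Optional[float]], List[AuditEntry]]:
    """Return cleaned readings and an audit trail; the input is left unchanged."""
    cleaned: Dict[str, Optional[float]] = {}
    audit: List[AuditEntry] = []
    for station, value in readings.items():
        # Collect the available (non-missing) readings from neighbouring stations.
        nearby = [
            readings[n]
            for n in neighbours.get(station, ())
            if readings.get(n) is not None
        ]
        if not nearby:
            # Nothing to compare against, so the reading is kept as it arrived.
            cleaned[station] = value
            continue
        reference = median(nearby)
        if value is None:
            reason = "missing"
        elif abs(value - reference) > TOLERANCE_C:
            reason = "unusual"
        else:
            cleaned[station] = value
            continue
        cleaned[station] = reference
        audit.append(AuditEntry(station, value, reference, reason))
    return cleaned, audit


def freeze(cleaned: Mapping[str, Optional[float]]) -> Mapping[str, Optional[float]]:
    """Return a read-only view of a copy, standing in for a published version."""
    return MappingProxyType(dict(cleaned))


if __name__ == "__main__":
    # Invented readings for four hypothetical stations.
    readings = {"S1": 18.5, "S2": 18.1, "S3": None, "S4": 31.0}
    neighbours = {"S3": ["S1", "S2"], "S4": ["S1", "S2"]}
    cleaned, audit = clean(readings, neighbours)
    version = freeze(cleaned)
    for entry in audit:
        print(entry)
```

In this sketch the original readings are never overwritten, so the incoming data and the adjusted data coexist, and any attempt to assign to the frozen version raises a `TypeError`. Issuing a later version would mean cleaning again and freezing a new copy, which is closer to the climate dataset case than to the derivatives contract, where no further version replaces the one that was traded.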
Our research only touched on a few examples of data mutability across a small number of sites. However, the reproducibility and reconfigurability of digital data allow these mutations to happen simultaneously, at scale, and largely independently of one another, factors which contribute substantially to the complexity and scale of data in existence. Their mutable nature enables digital data to be re-used, re-purposed, and put to work for different purposes in different places and contexts, factors which contribute to driving the movement of data between different sites across infrastructures. Whilst the value systems of some of these different sites of data practice may conflict, for example, those of Weston Park and the data supplier to the financial markets, the data as ‘mutable mobiles’ connect these different human actors in complex, often invisible, relations that together form a key component of emergent socio-material conditions.
Conclusions
As it becomes increasingly clear that emergent ‘Big Data’ practices across a variety of domains are contributing to the re-constitution of socio-spatial relations, it is crucial that methodologies are developed and research conducted that help to illuminate the concrete ways in which ‘Big Data’ are constituted through complex socio-material practices, and how they contribute to the ongoing reconstruction of socio-spatial relations. The
We identified the importance of historically constituted socio-cultural values in shaping practices of data production. The dedication and pride taken in the production of data at Sheffield Weston Park over the years contributes directly to their scientific and economic value at other sites including the Met Office, climate science and financial markets, as well as to the data and the weather station being valued as an important part of the cultural heritage of Sheffield. We also identified how the material properties of digital data – for example, their volume, mutability and durability – impact on how data move between different sites. Increasing volumes of quality data may contribute to their desirability for potential re-users, but can also pose challenges for how best to distribute data between sites. Their mutability – the ways in which they can be cleaned, homogenised, and otherwise re-configured, linked and aggregated with other data – increases the re-usability of data, and therefore contributes to generating demand and driving the movement of data between sites. Through drawing attention to the broader power dynamics influencing the evolution of these practices – particularly material conditions of production such as a lack of public investment and funding for data recovery projects and sites such as Weston Park, as well as broader questions in the UK about the structure and governance of public institutions such as the Met Office in the context of a deep restructuring of the state – the approach also shed light on the ways in which data practices and journeys are deeply politicised. 
We observed that different actors were working to influence the distribution of data between sites for a range of political and economic ends from the restructuring of public institutions, to the protection of local infrastructure and cultural heritage in this context, to efforts to deepen the financialisation of climate uncertainty through pushing to open data used by the weather derivatives industry. Overall, the
