Abstract
This article is part of the special theme on Knowledge Production. To see a full list of all articles in this special theme, please click here: http://journals.sagepub.com/page/bds/collections/knowledge-production.
Introduction
For well over a decade, we lived with the hyped claim that Big Data, or processes of datafication, would revolutionise knowledge production practices. Soon, we were told, the wonders of data-driven algorithmic approaches would give us perfect information, real-time insights and smarter decision-making. A lot of water has passed under the bridge since then, and Chris Anderson’s (2008) call for disruption and ‘the end of theory’ has been challenged and contested from multiple sides, with scholars outlining and emphasising Big Data’s historical trajectories and structural reproductions (boyd and Crawford, 2012; Gitelman, 2013; Kitchin, 2014). Such sobering accounts remind us that datafied knowledge production involves similar contingencies, limitations and complexities to previous forms of knowledge production.
The articles in this special theme contribute to emergent conversations that seek to take stock of the status and nature of datafied knowledge production, and they engage with them through more elaborate explorations of how datafied knowledge depends on the contexts of its production and the forms of knowledge production that precede it in those contexts. Our basic argument is that while the resources, material features and analytical operations involved in datafied knowledge production may be different, many fundamental concerns about epistemology, ontology and methods remain relevant to understanding what shapes it. We still need to understand and explicate the assumptions, operations and consequences of emergent forms of knowledge production. If datafied knowledge production is neither a clean revolutionary break with past forms of knowledge production nor a balloon of pure hype, we must ask: what does this phenomenon look like? Which digital and datafied infrastructures support its future development? What potentialities and limits do such forms of analysis and knowledge production contain?
The articles in this special theme offer contextual and conceptual enquiries into the particular conditions of knowledge production to which digital transformations and processes of datafication give rise. The articles all deal with the shapes and dynamics of technological transformations which have paved the way for novel information ecosystems. The articles also link these aspects to canonical questions that have historically defined our engagement with knowledge production practices: what counts as knowledge, and what epistemological assumptions are at play? What are the resources that go into the production of knowledge? What methods and procedures does knowledge production involve? While such questions are well known, the articles in this special theme provide some unfamiliar answers, and sketch the contours of novel forms of knowledge production in a digital and datafied world.
Scholars and the mass media often describe the operations of data analytics, algorithms and digital operations as enclosed within black boxes, where knowledge itself is the result of hidden and powerful forces operating inside opaque, closed and commercial technological systems (Pasquale, 2016; see also Chun and Keenan, 2006: 18). Yet, in recent years, scholars have also begun to advocate the importance of moving beyond the image of the black box. As Noble and Roberts (2017) note, datafication processes and their infrastructures require ‘great amounts of power, space, and other environmental resources and vast infrastructure’. Even so, few stop to think through the many dimensions of knowledge creation and dissemination, or about how black-boxed problems often stem from active and strategic obscuring. Rather than resigning themselves to accept the black box and working out strategies to circumvent it, Noble and Roberts argue for the necessity of ‘step[ping] out of the black box’ through increased work across old disciplinary boundaries. In this they are joined by a chorus of critical digital media scholars. Geiger (2017) argues that ‘scholarship and practice must go beyond trying to “open up the black box”’ of computational systems to ‘also examine sociocultural processes’. Taina Bucher (2016) suggests that ‘the figure of the black box constitutes a distraction’, and that ‘moving beyond the notion that algorithms are black boxes’ to instead use ‘well-known methods’ in new domains will be more generative, providing us with ‘knowledge about emerging issues and practices’. Velkova and Kaun (2018) outline a research strategy for ‘demystifying the black box and with it reducing the sense of fear of algorithmic governmentality’. 
Seaver (2017) too argues for the usefulness of a methodological ‘tactic’ that will ‘enact algorithms not as inaccessible black boxes, but as heterogeneous and diffuse sociotechnical systems, with entanglements beyond the boundaries of proprietary software’. This special theme aligns itself with the turn outlined above, evoking voices that through different methodological strategies displace discourses of the black box in favour of theoretical and practical interventions that engage more intimately with Big Data methodologies, their epistemological assumptions and their ecological entanglements.
Shapes and dynamics of datafied knowledge production
The article by communication theorist Anja Bechmann and information scholar Geoffrey Bowker argues that careful attention must be paid to the processes involved in the application of machine learning – processes that involve human knowledge production at critical junctures. The authors argue that classification theory provides a fruitful framework for explicating the human component of seemingly automatic processes, and they illustrate this with examples from laboratory-based studies.
Shifting the focus away from machine learning, the article by Anders Koed Madsen shows the difficulties of doing datafication in practice and not only in theory, and takes us to the negotiating table in a Danish municipality that is seeking to implement visions of the smart city. The article shows that these negotiations are no less influenced by knowledge production imaginaries than by real insights – and are also determined by strategic concerns over data ownership, even when datafied knowledge production demands a radical form of data-sharing.
Often, when digital infrastructures are contemplated, only the technologies are foregrounded. Yet, as feminist, critical race and postcolonial scholars have highlighted (Agostinho and Thylstrup, 2019; Anand, 2017; Bowker and Leigh Star, 1999; Star and Ruhleder, 1996), the questions we ask about digital networks and infrastructures should seek to break out of this strict technological focus, since technologies operate by virtue not only of their technical affordances but also of the labour of those who build and maintain them. Digital transformations and infrastructural developments make up the backbone of datafied forms of knowledge production, and include the emergence of digital platforms, digital traces as a resource, and automated, algorithmic forms of analysis and visualisation.
This special theme approaches knowledge infrastructures as ‘networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds’ (Edwards, 2010: 17). In her article, Daniela Agostinho spotlights a knowledge production process that has become a genre in its own right: data visualisation. Agostinho’s work unfolds the underlying logic of the optical metaphors (such as the microscope) that are prevalent in the popular discourse on datafication, highlighting its epistemological ramifications. Drawing on theories from visual culture, the article suggests that the language typical of promotions of Big Data reproduces historical imaginaries of the privileged power of vision. The relationship of datafied knowledge production to deeply ingrained modes of sense-making is also the focus of the article by literary scholar Kristin Veel. Taking as her point of departure systems that use natural language generation to communicate analytical outcomes, she shows how theories of narrative from literary studies can be used to disentangle and contextualise the claims made by the producers of such systems. By invoking the core human ability to comprehend and decode narrative forms, the automated production of narratives as a vehicle for the (supposedly) efficient transmission of complex information also brings back questions about the precise nature and potentials of narrative forms.
Thus, several articles in the special theme deal with the nature of datafied knowledge production, and investigate how its premises and limitations relate to earlier types of knowledge production. The focus on specific features and dimensions of datafied knowledge production is an invitation to consider what is different – and what is not so different – about datafied forms of knowledge, i.e. how they relate to earlier and existing modes. It also points to the value of probing datafied knowledge production in the same manner, and by asking the same questions, as we have asked in the past.
In their article, organisation theorists John Murray and Mikkel Flyverbom draw attention to how datafication alters the processes of knowledge production. Adding a new term to the growing vocabulary of critical digital literacy, they introduce the concept of ‘datastructuring’, which highlights datafication as both a product of and constitutive of social activity. Information is actively collected and shaped, yet it is also integrated into new technologies, where it in turn shapes future actions. They argue that datastructuring is a useful concept for systematic interdisciplinary work, and forms a bridge between normally disjunct scholarly debates.
Finally, Nanna Bonde Thylstrup examines the political economy that subtends these datastructuring processes, expanding the theoretical scope of digital capitalism and datafication to include the emerging field of studies on waste, discards and recycling. At the heart of Thylstrup’s argument is the claim that datafication operates as a waste-handling process, extracting, processing and recycling data, while also strategically positioning data and digital traces as by-products rather than primary products. By invoking the term ‘data waste’, the article foregrounds datafication’s reproduction and rupturing of previous forms of informational materiality, epistemology and political economy.
Taken together, several articles in the special theme sketch the contours of new forms of knowledge creation and infrastructure, which on the one hand tie in with previous modes of knowledge production, and on the other also present us with features and dynamics that set them apart from those of the past. We do not claim that knowledge production has been disrupted, or that datafied knowledge production has replaced or will replace other types. Our claims are more modest, attending not to the hype of data, but rather to the ways the dust of Big Data settles in mundane operations and infrastructures (Helles and Flyverbom, 2019), and how these are quietly transforming how we see, read, organise, use and dispose of knowledge.
Beyond the contributions of the individual articles, the special theme as a whole also suggests conceptual and empirical approaches that may be fruitful for organising future studies of datafied knowledge production. One is to consider the resources at play in datafied knowledge production. By this we mean a focus on the kinds of data that are picked up and used in such forms of analysis. The standard approach has been to point to the volume, velocity and variety of data sources, but we need to be specific when we consider how social activities are turned into data points (Flyverbom and Madsen, 2015). The differences between, for instance, social media data, news content, sensors and various forms of metadata have consequences for the analytical operations and forms of knowledge involved. This focus invites investigation of the translation processes and intersections between numbers, narratives, visualisations and other resources that shape datafied knowledge production. Such accounts should also explore concrete processes of data-cleansing, and principles about data quality, scale and messiness. This would also highlight issues related to the reuse, mobility and circulation of data that often remain out of sight. By focusing on the types and features of different data sources, we can start to make sense of the specificity and foundations of datafied knowledge production.
We also suggest that it may be valuable to focus on the concrete and practical techniques used for the aggregation, sorting, circulation and visualisation of digital traces. Such discussions imply a focus on the training and development of analytical techniques and technologies, as well as the criteria and categorisations used when digital traces are turned into forms of knowledge. The focus on how data is sorted may also involve more abstract discussions of the logics (predictive and others) underpinning datafied knowledge production. We emphasise the need for such critical work in the face of the promotional language of Big Data, arguing that the foundations of datafied knowledge production should be analysed along the same lines as earlier sociological accounts of the logics of numbers and statistics (Desrosières, 1993; see also Wernimont, 2019) or narratives and storytelling (Czarniawska, 1998; Ricœur, 1990).
Finally, we suggest that studies of datafied knowledge production should include a focus on the relationship between data and human existence. Big Data analyses are often cast as more directly connected to human activities, but we need to consider and problematise this link (Alaimo and Kallinikos, 2017). Such questions are not only about the validity of data, but also about scalability (Tsing, 2012) and possible forms of behavioural modification (Zuboff, 2019). Such accounts of datafication’s proximity to everyday lives and human experiences may offer new insights into the relationship between human lives and datafied knowledge, as well as new discussions of the entanglements between data, politics and the social order.
These types of question, we suggest, will be useful if we want to approach datafied knowledge production not as a fixed paradigm, but as a set of techniques, operations and logics at work in emergent attempts to see, know and govern social affairs in new ways.
