Abstract
Keywords
Introduction
The term “Big Data” is used to describe the volume of information produced through the use of technologies like mobile devices, positioning systems, and online services—“[i]n a digitized world, consumers going about their day—communicating, browsing, buying, sharing, searching create their own enormous trails of data” (Manyika et al., 2011: 1). The increasing use of digital services has given social scientists unprecedented access to previously unimaginable data: traces of the lives, dreams, and feelings of hundreds of millions of people. This seems to bring great promise for social scientific work, as the “data deluge … is leading us to an ever greater understanding of life on Earth and the Universe beyond … [it may] transform the process of scientific discovery. The more data there is the more discoveries can be made” (Rosling, 2010). Some have pointed to a “fourth paradigm” for science, as new algorithmic, computational, and analytical tools produce “gold” from this data resource (Bell et al., 2009; Hey et al., 2009).
But Big Data is also associated with a number of scientific challenges, which have led to the competing view that “the glittering promise of online data abundance too often proves to be fool’s gold” (Karpf, 2012: 652). Big Data has brought a development that is simultaneously enticing and vexing: a veritable “siren-song of abundant data” (Karpf, 2012) that is causing researchers to flock to the study of phenomena identifiable in growing Big Data, and to ignore phenomena not so inscribed. They are met by a host of difficulties, some of which are a natural part of new technologies to which we have yet to grow accustomed, but there are also issues that run deeper than the mere settling of dust (e.g. Boyd and Crawford, 2012: 20). Big Data is not only different in quantity, but also in quality, and it seems that the new shapes of data do not always fit into the holes of old theory. This has brought fundamental issues to the fore: the questioning of epistemological assumptions, discussions of the validity of disciplinary divides, critique of methodological monism, and the rejection of long-trusted simplifications (Kitchin, 2014). Throughout the sciences, similar tensions can be seen as the data deluge causes long submerged epistemological questions to float to the surface.
While the traditional variable-based approaches to social science have struggled with the new forms of data, approaches with their roots in the natural sciences have stepped forth to meet the tide. A central player in the multitude of new approaches is Computational Social Science, located at the “intersection of the social and computational sciences, an intersection that includes analysis of web-scale observational data, virtual lab-style experiments, and computational modeling” (Watts, 2013: 1). This natural scientific approach has brought with it a range of methods for dealing with the complexity of mass-interaction (Conte et al., 2013). The renegotiation of demarcations between the natural and social sciences following from this development seems in part to be leading to a renewed naturalism, sometimes referred to as the “end of theory” (Anderson, 2008) and exemplified by Lev Manovich's (2016) view that “Digital is what gave culture the scale of physics, chemistry or neuroscience. Now we have enough data and fast enough computers to actually study the ‘physics’ of culture”. In this new naturalism, society is subsumed not under the traditional Cartesian-Newtonian paradigm, but instead under the metatheory of particles and flows, and analogues such as “avalanches and granular flows, flocks of birds and fish, networks of interaction in neurology, cell biology and technology” (Ball, 2012: ix).
This situation paints a picture in which Big Data has taken parts of social science, in particular fields like Computational Social Science, towards a new computationally based perspective by underlining the limitations of traditional quantitative methods. As we will argue, their response implies a particular social ontology, which focuses on relations and sees social structures as patterns emerging from underlying local interaction.
This paper looks at what this approach leaves out. By weaving together a series of theoretical views, the paper outlines the epistemic limits of the emerging computational paradigm. We bring back into the contemporary discussion the views that dominated the social sciences in the early days of the digital era—a time when digitalization was seen not as something that would make society more amenable to formal methods, but rather as the precise opposite: as part and parcel of the processes of postmodernity. It was seen as part of a development towards increasing fluidity and instability of meanings and structures.
We begin by looking at the impact that Big Data has had on contemporary social science.
Contemporary digital research: Computation and relations
Despite its name, size is arguably not the most defining feature of “Big Data” (e.g. Boyd and Crawford, 2012): the concept rather describes a set of parallel developments in various disciplines, whose common denominator is an increasing proliferation of data sets that have proven difficult to fit into existing paradigms. In the computer industry—first to feel the effects of this development—quantity was indeed among the primary issues, since traditional tools, such as relational databases, proved incapable of dealing with new demands emerging from large-scale systems (Manovich, 2011). But in the social sciences, the emerging problems associated with Big Data were different: as Boyd and Crawford (2012) observe, some of the data sets understood as examples of “Big Data” (e.g. some Twitter studies) are significantly smaller than sets understood as “traditional” data (e.g. census data), pointing to the fact that the data quantities in themselves are not the issue—even huge quantities of structured census data are relatively easy to process using traditional tools.
Instead, the use of the term “Big Data” seems to point toward a qualitative shift in how data is structured and produced, rather than a mere increase in quantity.
When it comes to the methods that structure the data, the transition is perhaps best headlined as a shift from mathematically organized data to algorithmically organized data (see also Törnberg, forthcoming). While survey data is constructed for processing through variable-based analysis, requiring pre-compartmentalized data designed to be palatable for a scientific perspective that sees the social world through a lens of averages and variances, data extracted from digital technologies tends to be structured by and for algorithmic processing, implying indexed data structures and traversable networks (Mackenzie, 2012; Marres and Weltevrede, 2013). New data therefore tends to be poorly suited for statistical analysis; it often comes in small chunks, spreading and diffusing in complex and constantly transforming networks, without clearly defined bounds. The social ontology that digital technologies operationalize is not focused on the summing up of a population in fixed categories, but rather on the individuals and their dynamic connections and interactions (Castellani, 2014; Uprichard, 2013). This implies no longer producing data by departing from an image of the whole, implicitly assumed to be the sum of its parts, but rather by departing from the parts and their location within a data structure.
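The contrast between variable-based and algorithmically organized data can be made concrete with a minimal sketch (all names and values here are invented for illustration): the same individuals represented first as rows of fixed variables suited to statistical summary, and then as a traversable network suited to algorithmic processing.

```python
from collections import deque
from statistics import mean

# Variable-based ("mathematically organized") data: each row summarizes
# an individual through fixed, pre-compartmentalized variables.
survey_rows = [
    {"id": "a", "age": 34, "income": 41000},
    {"id": "b", "age": 29, "income": 38000},
    {"id": "c", "age": 51, "income": 52000},
]
# The natural operations are aggregate: averages and variances.
avg_age = mean(row["age"] for row in survey_rows)

# Algorithmically organized data: the same individuals as a traversable
# network of connections, with no fixed aggregate categories in view.
follows = {"a": ["b"], "b": ["c"], "c": []}

def reachable(graph, start):
    """Breadth-first traversal: which nodes can be reached from `start`?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen
```

The first representation answers questions about a population (“what is the average age?”); the second answers questions about relations (“whom can a reach?”)—the ontological shift described above, encoded directly in the data structure.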
This has been taken to suggest that while census data is produced for scientific analysis, Big Data is a “naturally occurring by-product” (Edwards et al., 2013; Kitchin, 2014), constituted by traces of ongoing social processes rather than something produced for scientific consumption. This ostensible rawness is taken to mean that the ontology revealed by Big Data is in fact the true, relational nature of the social world that had previously been concealed by the survey method. As Allen Barton (1968: 1) puts it, “the survey is a sociological meat grinder, tearing the individual from his social context and guaranteeing that nobody in the study interacts with anyone else in it”.
This idea of rawness is seen within the emerging Computational Social Sciences as providing the foundation for a new approach to studying the social world, which is argued to have the potential to solve many of sociology’s deep-rooted problems. Alex Pentland refers to this as “sociology of the 21st century” (in Manovich, 2011: 464), and says that digital data “give us the chance to view society in all its complexity, through the millions of networks of person-to-person exchanges”. The new data allows us to navigate “data sets without making the distinction between the level of the individual component and that of the aggregated structure” (Latour et al., 2012: 590), which, Lazer et al. (2009) argue, has the potential to transform our understanding of our lives, organizations, and societies. Other scholars have argued that “a new kind of social science” is needed (e.g. Christakis, 2012)—a call referring to Wolfram’s (2002) declaration of “a new kind of science”, i.e. Complexity Science—to respond to the fundamental changes that Big Data brings in its wake. This new science is seen as the answer to the crisis of the old approach of empirical sociology, through a supplanting of surveys and interviews with data mining and GIS analysis (Savage and Burrows, 2007). This is seen as bringing a redrawing of the disciplinary boundaries by, as Watts (2007) argues, resolving the issues that have made the social sciences “less successful” than the physical and biological sciences in providing explanatory and coherent theoretical accounts of, for example, the complexities of collective social behavior (see e.g. Bajec and Heppner, 2009; Dorigo and Stützle, 2010; Helbing et al., 2005; Johnson, 2002). Thus, the new data is seen as enabling a convergence between the social and natural sciences under a new approach and ontology (Christakis, 2012).
The difference in social ontologies that are operationalized by the old and new data in many ways corresponds to the contrast between “complicated” and “complex” systems (Morin, 2007). This similarity is not incidental: the Santa Fe school of Complexity Science developed largely around the study of, and with, computers and algorithms, in which the dynamics of computer models of mass-interactive systems were studied under labels such as ALife, Agent-Based Modeling and Cellular Automata (Galison, 1997). It was largely this study that became the foundation for the theory of complexity, describing both an ontological category and an approach (e.g. Mitchell, 2009). It is therefore of little surprise that Big Data seems similar to this ontological category and responds well to a similar methodology, as they carry the same basic social ontology within their structure, having been shaped by the same type of methods and technologies.
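The link between agent-based modeling and emergent aggregate pattern can be illustrated with a minimal sketch of Schelling's classic segregation model (our own toy implementation, not drawn from any of the cited works): agents with only a mild preference for similar neighbours produce, through relocation alone, a strongly segregated macro-pattern that no individual intended.

```python
import random

def like_fraction(grid, size, r, c):
    """Fraction of an agent's occupied neighbours sharing its type (toroidal grid)."""
    me, same, occupied = grid[r][c], 0, 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            nr, nc = (r + dr) % size, (c + dc) % size
            if grid[nr][nc] is not None:
                occupied += 1
                same += grid[nr][nc] == me
    return same / occupied if occupied else 1.0

def schelling(size=20, empty=0.1, threshold=0.3, sweeps=30, seed=1):
    """Return mean same-type neighbour share before and after relocation dynamics."""
    rng = random.Random(seed)
    cells = [0, 1] * int(size * size * (1 - empty) / 2)
    cells += [None] * (size * size - len(cells))
    rng.shuffle(cells)
    grid = [cells[i * size:(i + 1) * size] for i in range(size)]

    def mean_like():
        vals = [like_fraction(grid, size, r, c)
                for r in range(size) for c in range(size)
                if grid[r][c] is not None]
        return sum(vals) / len(vals)

    before = mean_like()
    for _ in range(sweeps):
        for r in range(size):
            for c in range(size):
                # Unhappy agents (too few similar neighbours) move to a random empty cell.
                if grid[r][c] is not None and like_fraction(grid, size, r, c) < threshold:
                    empties = [(er, ec) for er in range(size) for ec in range(size)
                               if grid[er][ec] is None]
                    er, ec = rng.choice(empties)
                    grid[er][ec], grid[r][c] = grid[r][c], None
    return before, mean_like()
```

Running `schelling()` shows the mean same-type neighbour share rising well above the individual tolerance threshold of 0.3: segregation appears as an emergent outcome of local interaction, not as an aggregate of individual intentions.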
What, then, is implied by this ontology of complexity? According to Complexity Science, macro-level patterns and structures emerge from underlying local interactions among a system's components, and cannot be reduced to the properties of those components.
The complexity approach has proven highly capable of analyzing many types of systems that have otherwise been impenetrable to formal approaches (e.g. Mitchell, 2009). Complexity Science is centered on the core use of formal models of mass-interaction, focusing not so much on social facts and aggregate explanations as on the emergence of aggregate pattern. This means putting the finger exactly on the limits of aggregate measures, since such measures by construction conceal the local interactions from which macro-level patterns emerge.
For a sociologist, this view of the social world is more reminiscent of Tarde’s monadology than of Durkheim’s social facts.
The epistemological perspective of complexity relates to the generally accepted view in the various sciences dealing with complex collective behavior that there exist some fundamental differences between the individual and the aggregate levels (Calhoun, 2002; Knorr-Cetina and Cicourel, 1984). Traditionally, in the social sciences, the existence of levels has often been assumed, and questions have focused mainly on issues like whether the micro or the macro level is the suitable level of analysis, or whether the two could be “reconciled” using some higher-level theory. In practice, this has primarily been handled through disciplinary and methodological separations, leaving the question of the emergence of structure from individuals by the wayside. Complexity Science, in contrast, focuses explicitly and almost exclusively on this question (Érdi, 2007; Mitchell, 2009).
The Computational Social Science introduced by e.g. Lazer et al. (2009) constituted to a certain degree a reboot or re-appropriation of the term: the Computational Social Science of the previous decade was part of Complexity Science and was never linked to large-scale data, but rather approached society mainly through simulation in general, and agent-based modeling in particular. While a certain rift between the data-focused and the simulation-focused Computational Social Science can be identified, they were, and are, strongly connected through a common perspective on society as a relationally and dynamically complex system.
With the rise of Big Data, Complexity Science seems to be increasingly experiencing an “obliteration by incorporation” (Merton, 1968: 27), as the perspective transitions from an explicit focus on emergence and complexity to constituting an implicit foundation in many of the tools and approaches used for social scientific research. For instance, complex network analysis has become widely used within parts of mainstream social science, focusing on how relations/interactions on the micro-level lead to the formation of higher level social patterns (e.g. Strogatz, 2001). The influence of complexity thinking, and the linking of macro dynamics to individual behavior, is also exemplified by many theories and concepts that are increasingly used by social scientists even outside of digital methods, including terms such as “threshold effects” or “tipping-points”, “power-laws”, “preferential attachment”, and so on.
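Concepts such as “preferential attachment” and “power-laws” can be illustrated with a minimal growth model in the spirit of Barabási–Albert (a toy sketch of our own, not any particular cited implementation): each new node links to an existing node with probability proportional to its current degree, and hubs emerge from this purely local rule.

```python
import random

def preferential_attachment(n_nodes=2000, seed=42):
    """Grow a network one node at a time; each newcomer attaches to an
    existing node chosen with probability proportional to its degree."""
    rng = random.Random(seed)
    # Every edge endpoint appears once in this list, so drawing a uniformly
    # random element samples nodes proportionally to their degree.
    endpoints = [0, 1]          # start from a single edge 0--1
    degree = {0: 1, 1: 1}
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        degree[new] = 1
        degree[target] += 1
        endpoints += [new, target]
    return degree

degrees = preferential_attachment()
```

The resulting degree distribution is heavy-tailed: most nodes keep a single link while early nodes become hubs orders of magnitude better connected—the “rich get richer” macro-pattern arising from a local attachment rule, with no aggregate-level cause in sight.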
The increasing prevalence of the complexity perspective is often argued to be not only the result of differences in the “rawness” of the data and what it can reveal of the social world, but also to reflect actual changes in the nature of social interaction. There may be certain merits to such claims: just as researchers are shaped by the social life of Big Data, so are its users. A difference that is often emphasized is that digital social life seems more quantitative, regular, and predictable (as illustrated by the successes of platform and data analysis companies that subsist precisely on predictive analysis), which is argued to motivate a more natural scientific approach to the data. Two such lines of argument in particular can be identified.
In summary, the combination between a social life that is more reactive, instinctive and natively quantified, and an understanding of Big Data as something fundamentally raw and naturally occurring, lends strong support to the computational, complexity-oriented approach to the social world.
In the following sections, we will look at what this perspective leaves out, returning first to the views of digitalization that dominated early digital research.
Early digital research and beyond: Liquidity and postmodernity
In the early discussions on the implications of digital technology, pre-dating the age of ubiquitous social media and digital platforms, digital technology was primarily viewed through the lens of dematerialization: the transition of technology from material form to immaterial information.
In these discussions, the notions of digitalization and dematerialization were connected to the larger contemporary discussions around terms like “postmodernity”, “liquid modernity”, “late capitalism”, and “acceleration”. Analyses of the implications of digital technology can be found in a range of strands, from Baudrillard’s (1994) notion of simulation to theories of liquid modernity and acceleration.
The common denominator of these views is in many ways the precise opposite of the epistemological conclusions taken in the current debate on digital media: these scholars saw digital technology as being part of a late modernity “uncontrollable and quintessentially kaleidoscopic in form” (Archer, 2014: 1). As Archer emphasizes, this means that just because a social phenomenon (institution, role, group, belief or practice) continues to bear the same name, “it cannot automatically be regarded as being ‘the same’” (p. 6), and continuously stable. Digitalization and dematerialization were thus seen as part of the processes of postmodernity in that they constitute the dissolution of an impediment to the pace of change. This is part of the larger process of modernity, in which, “instead of inhabiting a stable world of objects made to last, human beings found themselves sucked into an accelerating process of production and consumption” (Arendt, 1958: xiv).
According to these scholars, digitalization can thus be understood as yet another step or phase of this transition, in which capitalism has, as Jameson (1991) argued, reached its purest form. Through digitalization, this process has finally melted the very materiality of technology, permitting an unprecedented pace and fluidity of change.
As Hayles (1999) points out, this new instability is brought into our very language, and the ways we interpret the world. To analyze this, Hayles builds upon Lacan’s concept of “floating signifiers”, adding that through digital technology they also begin to flicker: signifiers become open to constant mutation and transformation.
Taken together, the view proposed by these early critical scholars can be understood as digitalization bringing increasing fluidity and instability of meanings and structures.
In our view, despite being largely neglected in the contemporary literature, this early characterization of digitalization remains in many ways accurate as a description of the effects of digitalization on social life; implying a fluidity and instability of meanings and structures constantly boiling under the surface of the ostensible constancy of fixed numbers and symbols. However, digital technology has since developed in some new and at the time unforeseen directions, which have meant that the fluidity of meaning and structures has become channeled in unexpected ways.
Fluid technology in the era of platforms
These two factors have meant that the fluidity and capacity for rapid change of dematerialized technology, theorized by the scholars of the early days of the Internet, have not only played into a postmodern culture of late capitalism, but have also been channeled into new forms of power for the owners of technology. Technological power can now be exercised in more sophisticated, nimble and elusive ways than ever before, as the dematerialization of technology means that the ownership even of consumer products has become possible to centralize. The artifacts that we consume and surround ourselves with are increasingly rented rather than owned, as apps, programs, and technological platforms are increasingly located in the cloud, and thus prone to constant change without warning. The “zero-marginal-cost” of software has resulted not in the end of capitalism, as some social scientists rather optimistically theorized (Mason, 2015; Söderberg, 2015), but rather in a transition of business models from selling products to renting out access and services.
While digitalization brings increasing centralization and sophistication in the expression of technological power, technology’s function as a shaper of social behavior is in itself nothing new. Technology has always served the power of its owners and producers, as a force capable of shaping and directing social life in their interests. There is hardly an activity, belief or form of interaction that is not mediated by artifacts and thus affected by this hidden ideological face of technology (Feenberg, 1991), whether wedding rings and clothes, candles and incense, or money and art—these artifacts store and propagate societal structures (Elder-Vass, 2017). Social life has always played out within technological platforms that shape and frame our interaction and provide context to it, granting permanence to our symbols and our language (Collins, 2014). The impact of the technological context is not merely incidental: churches, for instance, are expressions of power and authority, consciously designed to inspire awe toward the power of gods, religious institutions and holy men. They instill authority into the solemn priest behind the pulpit, and remind us of the larger story within which we are but minor players, thus shaping and giving meaning to our behavior and interaction. Online digital platforms of today are not unlike such physical meeting places: they too provide the context within which rituals and social life take place. They condition our interactions, and shape who has authority and who is heard.
But the combination between centralization of power and the dematerialization of technology implies an important transition in the expression of technological power. While yesteryear’s churches were carved from stones, rocks and clay, the digital churches of today constantly shift underneath our feet. While physical churches were blunt tools for shaping our lives, needing to be backed up by damnations and inquisitions, the digital churches read and react to our every gesture and expression. They are capable of customizing their expressions to individuals, or trying a hundred variations of the colors of the pulpit to see how the faithful are affected. What usage culture emerges among users is largely controlled by what the system “affords” (Norman, 1999), and what can be done “frictionlessly” (Shaw, 2015): subtle design choices herd the users in certain directions, in ways related to the concept of “nudging” (Thaler et al., 2013).
In other words, the increasing fragmentation and fluidity following from dematerialization has somewhat paradoxically implied increased centralization of control, as it permits the owners of technology to express power by shaping meaning and structures through gentle nudging of underlying technical rules. This control does not congeal the constant boiling fluidity of meaning, but rather dynamically directs its flow. Control moves to lower ontological strata, shaping outcomes through the underlying rules of interaction rather than through explicit control. In this dematerialized modernity, the fluidity of meanings and structures affords a form of control that seemingly paradoxically emerges from the bottom up.
This transition in the expression of power is reminiscent of that described by Norbert Elias (2006): the shift from overt physical coercion to subtler and more mediated forms of social control.
The nature of digital data
What, then, are the implications of the condition of postmodernity and the technological power of platform owners for digital data research? What limits do these observations imply for the computational study of digital social life, and how do they clash with the tendency of Complexity Science to naturalize social life—seeing social patterns not as the result of contingency and conflict, but as expressions of universal social laws?
As we saw in the first section, the promise of the Big Data revolution has described a world of previously unimaginable data: a flood of digital traces revealing the lives, dreams, and feelings of hundreds of millions of people. This has painted a picture—which hangs centrally in the halls of e.g. Computational Social Science—of the “true” relational nature of social life being unveiled, showing a social life which is not only measurable but even predictable.
While this painting shows a dreamy world for the social sciences, another reality appears when we lift our gaze from the data feeders, and cast it upon the less than appetizing context in which the data is fed to us. Rather than a spontaneous and natural production of social traces, we see how the data is produced, selected and provided to us by platform owners pursuing their own interests. Many aspects are left out of this data. For instance, the platforms and their rules that shape the online behavior are not readily visible: their interests and incentives instead lie latent as hidden forces that guide individual behavior and the emergent social practices of the platforms. Thus, at the same time as the contextual aspects and the power of platform owners are becoming increasingly central to understanding social life, our focus as researchers is increasingly on the patterns of interaction, which, as they lose their natural setting, become naturalized and decontextualized, just in the way that complexity perspectives have historically had a tendency of implying naturalization (Byrne and Callaghan, 2014; Uitermark, 2015). When Big Data is seen as merely an “encoding” of social life, this context disappears from view.
This idea of Big Data as an “encoding” of social life disregards the complex interplay between the technological and social aspects of human life that have produced the data. Rather than merely a one-way encoding, the production of data takes place by digital platforms directing and limiting action by providing a “grammar of action” that makes certain activities doable, and thus rendering social activities available for measurement, analysis, commodification, and manipulation (Van Dijck, 2013). But at the same time, users are not helpless puppets in this process: they are often aware of the ways that measures and technologies play into their social lives, and reflexively take account of this in their use. They are not “encoding” their behavior, but rather employing and enacting the methods, performing through the measures in front of an “imagined audience” (Litt, 2012): “social actors produce methodical accounts of social life as part of social life” (Lynch, 1991). The platform owners are in turn aware of these dynamics: the very creation of digital platforms tends to involve the implementation of sociological and social psychological ideas; their use ranging from the benign push (e.g. suggesting friends through triadic closure) to what basically amounts to a weaponized social psychology (e.g. “engagement maximization” through the application of research on addiction). In short, measures not only describe, but are enacted and made part of social life, in the type of continual process of reflexivity between diverse actors and roles that is quintessential of the vexatious nature of social life.
Online behavior and content are in other words a consequence both of how digital technologies work and what people do with them, in ways that are exceedingly difficult to separate. Rather than thinking of online social life through a separation between “human behavior” to be studied, and “technological bias” to be in various ways “corrected for”, content is perhaps better understood as the output of an entanglement between the two—a sociotechnical system (Marres, 2017). This casts technology as a defining feature of human society, rather than as something to be corrected for. The way that “virality” is employed to make claims about the new “instinctual” and “reactive” nature of digital social life is a case in point here; Halavais (2014) shows how the re-tweet emerged as a sociotechnical script on Twitter: beginning as an informal practice, then becoming encoded in a button, which ended up producing the macro-pattern of “virality” and “diffusion”, appearing as repeated behavior cascading through a network.
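Halavais's point—that “virality” is a macro-pattern produced by a sociotechnical script—can be sketched with a toy independent-cascade simulation (our own illustrative model, with invented parameters): a single local rule, “pass on what a contact passed on, with probability p”, generates occasional system-wide cascades once the platform makes the act cheap enough, i.e. once p is nudged upward by a button.

```python
import random

def cascade_size(n=500, k=8, p=0.05, seed=0):
    """Simulate one cascade on a random network of n users with k contacts each.

    Each user who adopts an item passes it on to each of their contacts
    independently with probability p; return the final number of adopters.
    """
    rng = random.Random(seed)
    contacts = {u: rng.sample(range(n), k) for u in range(n)}
    adopted = {0}                  # a single seed user posts the item
    frontier = [0]
    while frontier:
        nxt = []
        for user in frontier:
            for contact in contacts[user]:
                if contact not in adopted and rng.random() < p:
                    adopted.add(contact)
                    nxt.append(contact)
        frontier = nxt
    return len(adopted)

def mean_size(p, trials=200):
    """Average cascade size over many simulated runs at transmission rate p."""
    return sum(cascade_size(p=p, seed=s) for s in range(trials)) / trials
```

Near the critical point k·p ≈ 1, small changes in the platform's design flip the system from fizzling cascades to “virality”—which is precisely why the macro-pattern says as much about the button as about the users.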
The result of thinking of Big Data as providing a form of privileged access to the social world is that researchers flock to study relatively predictable and correlated social behavior on ostensibly disintermediated online platforms, while disregarding the sociotechnical conditions that lead to the formation of that behavior. The digital platforms are developed using significantly more sophisticated methods and larger data quantities than what is available to researchers, with even the most seemingly insignificant design decision being the result of meticulous A/B-testing and data analysis. From the basis of the Complexity Science metaphor of social behavior as the playing of a game, the data thus makes more visible the “playing of the game”, while obscuring the “rules of the game” and the interests that shaped them. The data thus becomes a perfect fit for a naturalizing science that tends to see the rules as universal and their outcomes as inevitable.
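The A/B-testing logic mentioned above can be illustrated with a minimal two-proportion comparison (a textbook z-test on invented numbers, not any platform's actual procedure): given click counts for two design variants, compute whether the observed difference is larger than chance alone would suggest.

```python
from math import sqrt

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """z-statistic for the difference between two conversion rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical experiment: does variant B of a design yield more clicks?
z = two_proportion_z(clicks_a=200, n_a=10_000, clicks_b=260, n_b=10_000)
```

A |z| above roughly 1.96 corresponds to the conventional 5% significance level; platforms run such comparisons continuously, on data quantities far beyond those available to researchers.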
The formal tools and mathematical models that we apply to study this world hinge on the stability of meaning and understanding that are exchanged. But such assumptions have not become less problematic through digitalization, but rather more so, as symbols and meaning are becoming more local in time and space. Interpretation has not become less central in the research process; its locus has merely moved, as interaction is simultaneously more quantified and its meaning more fragmented and flickering.
This does not mean that the observations of increasing complexity, native quantitativity, and the potential for predictability are false. Big Data is seemingly paradoxically associated with both these developments: it is simultaneously more liquid and more quantified, more unstable in meaning and more regular in pattern.
The answer is not, as has been the case among some scholars, to simply reject computational methods or suggest that the entire notion of “new data” is merely a red herring since many of its aspects have a long history (Marres, 2017; Uprichard et al., 2008)—the epistemological and methodological demands of complexity in general, and Big Data in particular, are real and will have to be reckoned with. But neither is the answer a purely methodological one: we cannot simply correct for these issues with better tools; what is required is an understanding of the sociotechnical conditions under which the data is produced.
Conclusions
The first section of this paper showed how the structure of digital data is making trouble for the traditional social scientific variable-based approach, creating a push toward new social ontologies matching the structure of the data. This has sparked a renewed naturalism, in which social patterns are approached through the metatheories and methods of the natural sciences.
The technological power of platform owners is to a large extent enabled by the same new tools for data analysis as those used by social scientists—indeed, the private sector is often the driver for the development of these tools. These efforts have been immensely economically successful, as illustrated by companies like Google and Facebook. But we must not forget that the aims of these corporations are quite different from the aims of researchers: they seek to predict and shape behavior in the service of profit, not to understand social life.
As noted by early scholars of digital technology, the flexibility and rapid change enabled by digitalization are part of the larger processes of postmodernity. But the digital world is not well-described by the classical understanding of postmodernity alone. It is also part of an increased centralization of technological power, and a change in the role of technology in social life. The postmodern aspects of digitalization do bring more “openness” (in the sense of e.g. Bhaskar, 1978) to society, with social structures becoming more fragmented, fluid, and unstable.
This is made more confounding by the fact that this development is occurring in an increasingly natively quantitative context, in which people are communicating through numbers and coded messages. While this changes the locus of scientific interpretation, since researchers no longer need to transcribe conversations, it does not make interpretation any less central to the research process.
Similarly problematic is the notion that human online behavior being more reactive should lead us to disregard context and view social life through analogues like “flocks of birds and fish”. While technological platforms do act powerfully on social life, wielded and directed by platform owners, the fact that platforms are capable of herding users through technological nudges and affordances should not imply a reduced focus on context, but rather urge us to turn our gaze toward precisely the power of the platforms. Yet instead of moving towards an increasing focus on these contextual aspects and the role of platform owners, many scholars interested in digital social life have been lured by the siren-call of new methods and abundant digital data to turn their gaze away from precisely these factors. Thus, social patterns are naturalized: we take behavior on social platforms to be telling about the nature of social life, while it may in fact say more about the interests of the platform owners.
For us as researchers, this implies a need for not only studying processes of emergence, but for doing so while keeping in mind that there is nothing natural about human behavior and that there is no such thing as “raw data”. Social media should perhaps be thought of less as savannahs of free-running herds of humans than as zoos in which caged users are made to dance to the tune of capital; “no data, big or small, can be interpreted without an understanding of the process that generated them” (Shaw, 2015: 1), and these processes are entangled in the interests of capital.
This calls for a reflexive approach to digital data: one that embraces computational methods while keeping in view the sociotechnical conditions and interests under which the data is produced.
