Introduction
There is a growing public and academic interest in ‘Big Data’ as they give rise to new ways of making sense of, doing work in, managing, or imposing control upon different aspects of the social world. In recent years, these developments have been welcomed by those passionately setting out the case for Big Data, open data and data infrastructures – especially in the realms of commerce and business, but also in government, archives and academic research – and have been critically contested by scholars in an attempt to spark conversations about their issues and negative consequences (e.g., boyd and Crawford, 2011, 2012; Crawford and Schultz, 2014; Richards and King, 2013, 2014). These profound developments have been linked to debates around ‘the coming crisis of empirical sociology’ by Burrows and Savage (2014) and by Mike Savage and Roger Burrows (2007), who focus mainly on the methodological challenges Big Data poses to the social sciences’ methodological repertoire. In addition, Chris Anderson (2008), former editor-in-chief of
It is no coincidence that the current iteration of this debate, focusing on Big Data practices rather than on methodological considerations, coincides with another discourse that has undergone a similar shift of focus, and which is led by some of the same British sociologists. This other humanities-infused discourse is concerned with the ‘politics of method’ (Savage, 2010; Savage and Burrows, 2007) and examines the ‘double social life of social methods’ (Law et al., 2011; Savage, 2013) – a cross-cutting approach that treats methods not just as instruments, but as objects of study in themselves, embedded in and shaping the social worlds they purport to describe. In other words, they seek to understand data and devices within the assemblages they form together with other kinds of actors as possessing co-constitutive agency in the enactment or materialisation of new ways of social and cultural being and, at the same time, as new forms of social and cultural inquiry (e.g., Mair et al., 2015). Still others, like Sophie Day et al. (2014), have proposed to examine a particular type of assemblage, or what they term ‘number ecologies’ – extending the concept of ‘ecologies of knowledge’ (Star, 1995) – approached through the numbers and numbering practices that give rise to them. For instance, Carolin Gerlitz and Celia Lury (2014) have critically examined how the performative capacities of influence measures and other ‘participative metrics of value’ are interlinked with media to enact dynamic self-evaluating assemblages. The particular approach taken by these scholars is situated within the larger field of economic sociology, a field that ‘rediscovered the economy’ (Miller, 2001: 379) with its roots in social studies of science and technology (STS) and actor–network theory.
Economic sociologists enquire into the ways in which economic phenomena like markets come into being through the various agencies exercised by both technical and social actors, and the relationships of translation or intermediation these may establish in different scenarios (Callon, 1998; Callon and Muniesa, 2005; Callon et al., 2007; Fligstein, 1993; Granovetter, 1985). Among the main contributions of early studies in STS and economic sociology has been the ‘turn to technology’ (Bijker et al., 1987; MacKenzie and Wajcman, 1985; Woolgar, 1991), as well as a reconceptualisation of the affective relationship between technology and social aspects (including behaviours) as neither simply deterministic nor utilitarian in their effects or impacts, nor merely embedded in and constrained by them. Rather, complex actor–networks tend to be mutually constituted through a continuous interplay of agential and structural factors (Bijker and Law, 1992; Callon, 1986; Granovetter, 1985; Latour, 1987; Latour and Woolgar, 1986).
The objective of this article is to contribute to these ongoing debates by tracing out such a network of associations comprised of technical objects, techniques, and the operative chains they are involved in, as seen through the conceptual lens of ‘cultural techniques’ (Macho, 2003, 2008; Siegert, 2013, 2014). Who works with Big Data, its production, storage, analysis and application? What motives and challenges drive and constrain their work? What is actually done with Big Data and what other kinds of knowledge could it help produce? On the one hand, the focus is on the coordination of a range of disparate concepts and methods from within a larger genealogy or
The argument is organised in three parts. The first lays out the argument in general terms, positing commensuration as a cultural technique that is part of operative chains linking technological objects and social processes together in the structuration of analysis and interaction with social media platforms, while also playing a central role in reconfiguring them. It makes a case for studying the production and circulation of a particular kind of number and its practical utility. The second part engages with these themes and techniques more concretely and pragmatically. How and where is commensuration at work in a social media platform like Facebook? Concentrating specifically on Facebook’s Data Warehousing and Analytics Infrastructure (hereinafter DWAI), it moves beyond methodological critiques of the utility of Big Data that lack empirical support and specificity. It is notable that to a company like Facebook, data and analytics are at the core of everyday operations, where the work of programmers and non-programmers, internal applications and external products converge in their reliance on the very same infrastructures, attracting many different kinds of uses and users. Reading some of Facebook’s own publications on the topic, as well as available technical documentation is a way to begin describing this unique configuration and also gives insights into the kinds of issues and challenges driving engineers to implement certain solutions over others. If Big Data can be said to constitute challenges and opportunities that often require domain-specific solutions, then the associated practices mark a political space where multiple possible solutions compete, warranting further investigation. The third part examines the relationship between Facebook’s data infrastructure and the social and economic realities it gives rise to. 
How to understand the role of commensuration and other calculative agencies deployed in Big Data infrastructures in the structuring of analysis and interaction with social media platforms? How is the social accounted for and what makes it that these data can become economically valuable?
Commensuration as cultural technique
Following Thomas Macho’s initial definition, ‘Cultural techniques – such as writing, reading, painting, counting, making music are always older than the concepts that are generated from them’ (2003: 179). They are conceived as operative chains. Accordingly, symbolic work requires specific cultural techniques: ‘we may talk about recipes or hunting practices, represent a fire in pictorial or dramatic form, or sketch a new building, but in order to do so we need to avail ourselves of the techniques of symbolic work, which is to say, we are not making a fire, hunting, cooking, or building at that very moment’ (2008: 100). Understanding commensuration as a cultural technique means acknowledging it as an integral component in a series of actions that may give rise to symbolic and material practices. Further, it is instructive to distinguish between commensuration as a routine technique without significant consequences on the one hand, and as having technical, symbolic, or political advantages on the other (Espeland and Stevens, 1998: 316; Feldman and March, 1981). In a more concrete sense, this layering enables a meaningful distinction between rather simple or mundane procedures and those involving tremendous collective effort: it distinguishes, for example, simple routine counting from shared collective counting procedures involving large infrastructures, powerful institutions and standards, and simple acts of value comparison from the high-frequency trading across cultural or geographical distances that is fundamental to global markets. Moreover, this perspective reminds us that these techniques are deeply cultural, historical, and open to critical scrutiny. Situating techniques within their larger conceptual spaces can enable a better understanding of the concepts and methods they mobilise.
Metrics and numbers do not only count, but also facilitate the analysis, evaluation and efficient management or control of a broad range of human activities and practices represented by these quantities. In
Commensuration and the work of accounting
Commensuration lies at the heart of many Big Data analytics practices, constituting a linchpin in these networks of technologies and techniques, concepts and methods converging in the form of management, control and accounting procedures. Before an analysis can be conducted on the basis of a set of qualities or quantities (e.g., observations, frequencies, or ratios), these first have to be combined or grouped together as homogeneous in order to produce a single index number. Since processes of quantification often involve some form of judgement, the concept of ‘qualculation’ (Callon and Law, 2005) seems better suited to analyse these calculations as accomplishments that require a certain kind of work. As Espeland and Stevens explain, ‘Whether it takes the form of rankings, ratios, or elusive practices, whether it is used to inform consumers and judge competitors, assuage a guilty conscience, or represent disparate forms of value, commensuration is crucial to how we categorise and make sense of the world’ (1998: 314). Commensuration renders certain aspects of life visible or privileges them, while rendering others invisible or irrelevant. As a general technique, commensuration is often deployed to negotiate difficult contradictions (e.g., when using a mean or count to compare two or more sets of numbers), as part of routine decision-making, and as a vehicle for rationalisation, while presuming that these things can be measured (i.e., assigned quantities) in the first place. At the same time, because the reasons why we commensurate can vary greatly, it is arguably important to consider the forms we use to do so, as well as take note of those who resist it for its practical and political effects.
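The computational kernel of such commensuration can be sketched in a few lines. The attribute names, measurement ranges and weights below are purely hypothetical illustrations, not any actual metric; the point is only to show how heterogeneous quantities are first rendered homogeneous (here, mapped onto a shared 0–1 scale) before they can be collapsed into a single index number:

```python
# A minimal sketch of commensuration as a computational operation:
# disparate attributes are normalised to a common scale and then
# aggregated into one index number via a weighted mean.

def normalise(value, lo, hi):
    """Map a raw measurement onto a common 0-1 scale."""
    return (value - lo) / (hi - lo)

# Hypothetical attributes: (raw value, expected min, expected max, weight).
attributes = {
    "temperature": (18.0, 0.0, 40.0, 0.3),
    "turbidity":   (5.0,  0.0, 50.0, 0.3),
    "ph":          (7.2,  0.0, 14.0, 0.4),
}

# The weighted mean collapses three incommensurable measurements
# into a single number that can stand in for all of them.
index = sum(
    weight * normalise(value, lo, hi)
    for value, lo, hi, weight in attributes.values()
)
print(round(index, 3))
```

The judgement that ‘qualculation’ points to is visible here as code: someone had to decide the expected ranges and the weights, and those decisions disappear into the single number that circulates afterwards.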
This is especially true now that commensuration increasingly participates in decision-making processes that, across various domains, are automated (through computational algorithms) in a desire to manage uncertainty, impose control, or secure legitimacy at an unprecedented pace and scale. Commensuration exists in relation to the incommensurable, and operates by ‘[creating] relations between attributes or dimensions where value is revealed in the comparison’ (p. 318). The relations created between various qualities thus come to constitute a common metric; a single number like ‘water quality’ which arises from the aggregation of an array of other disparate attributes or dimensions such as temperature, turbidity and pH. In turn, this new common metric offers not just a way of
Creating and apprehending these relations of commensuration requires material and social effort, which is why it is necessary to attend to material arrangements and practices (Callon and Law, 2005: 719). Relations between symbolic objects need to be formed, sustained, kept, and verified, and the practical tasks involved typically require tremendous organisation and discipline, which largely become invisible once they have settled into everyday routines and work practices. For instance, through efforts of institutionalisation and standardisation, ‘performing some highly elaborated modes of commensuration, such as generating identical units of value in stocks or commodities futures … are complex technical feats that seem “natural” to traders and stockholders nevertheless’ (Espeland and Stevens, 1998: 318; Porter, 1995). Accounting, in this sense, provides a set of related techniques, rationales and practices for doing this kind of work efficiently: to keep and verify accounts (quite literally in the context of social media user accounts), allowing one to
Engineering Big Data at Facebook
Having introduced commensuration as a more general cultural technique central to managerial procedures related to accounting, this section proceeds to further situate commensuration within its field of operation (Derrida, 1982; Foucault, 1995: 138) to examine the role it plays in constituting or sustaining an assemblage of Big Data techniques and practices. Instead of a methods-driven analysis using Facebook data, I propose a kind of empirical enquiry that focuses mainly on reading published materials and available documentation to gain insight into the motivations, problems and challenges driving and constraining the design and development of Big Data infrastructures. Here, Facebook’s DWAI will serve as an example to illustrate the workings of a data assemblage. Viewing Big Data in terms of techniques helps to see that there are, for instance, a number of specific applications that indirectly rely on processing large quantities of data, such as search queries, recommendations and content filtering. In fact, the scalable analysis of large data sets is among the core functions of a number of teams at Facebook – both engineering and non-engineering – and may vary ‘from simple reporting and business intelligence applications that generate aggregated measurements across different dimensions to the more advanced machine learning applications that build models on training data’ (Thusoo et al., 2010: 1013). Facebook’s DWAI – indeed an integral component of its infrastructure (Menon, 2012) – supports such batch-oriented analytics practices, which may include reporting applications like Insights for Facebook Advertisers, the creation of business intelligence dashboards, or more advanced calculations for site features like suggesting friend recommendations to Facebook users or combining messages, chat and email into a real-time conversation (Aiyer et al., 2012; Menon, 2012; Thusoo et al., 2010).
As such, Facebook provides a rich set of tools for different kinds of users to perform analytics queries on its data.
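The ‘aggregated measurements across different dimensions’ mentioned above are, at their core, grouped counts computed over logs of user activity. The following is a minimal, self-contained sketch of the map/reduce pattern that batch tools such as Hadoop MapReduce and Hive implement at scale; the toy event records and dimension names are invented for illustration and do not reflect Facebook’s actual schemas:

```python
from collections import defaultdict

# Toy event log; in a production data warehouse such records would
# sit in HDFS and be queried via Hive rather than held in memory.
events = [
    {"country": "NL", "action": "like"},
    {"country": "NL", "action": "share"},
    {"country": "DE", "action": "like"},
    {"country": "NL", "action": "like"},
]

# Map step: emit one ((dimension values), 1) pair per record.
mapped = (((e["country"], e["action"]), 1) for e in events)

# Reduce step: sum counts per key, yielding an aggregated
# measurement across the country x action dimensions.
counts = defaultdict(int)
for key, n in mapped:
    counts[key] += n

print(dict(counts))
```

In Hive this whole pattern collapses into a single SQL-like `GROUP BY` query; the point of the sketch is that even the most routine dashboard number rests on an act of grouping disparate records together as countable instances of the same thing.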
A software engineer may perceive Big Data as posing big problems in need of working solutions that meet a certain set of criteria. While it is often difficult to obtain detailed empirical knowledge of the inner workings and management of these information systems, a general picture can be drawn from published materials and available documentation: reports, periodicals, conference proceedings, presentation slides, blogs, technical documentation and handbooks, some of which are (co)authored by members of the teams at Facebook responsible for engineering these systems. It also helps that Facebook’s data infrastructure is built ‘largely on top of open-source technologies such as Apache Hadoop, HDFS, MapReduce and Hive’ (Menon, 2012: 31), for which up-to-date documentation is usually available online. Using multiple sources of documentation, it is possible to develop a provisional understanding of how these systems may work, individually and together, to support the data warehousing and analytics underpinning day-to-day operations. Moreover, it gives a high-level overview of Facebook’s data infrastructure and enables distinguishing between some of its cornerstone applications. As Aravind Menon and others have shown (e.g., Aiyer et al., 2012; Borthakur et al., 2011; Thusoo et al., 2010), there are three main components in Facebook’s infrastructure. Firstly, a MySQL/DB and caching component serving as the primary data repository. This is a relational database management system (RDBMS) based on a model increasingly challenged by current demands like iterating over billions of rows at a time or working with richly connected entities. In such cases, NoSQL or graph databases and data models developed in response to the shortcomings of relational models offer clear advantages in terms of scale and agility, which is why it is surprising to learn this RDBMS is apparently still in use.
Secondly, an HDFS/MapReduce/Hive component for conducting analytics on Facebook data. Thirdly, an HBase component to run transactional applications (most of which involve data documenting user operations), which is used both for internal applications and external products (Menon, 2012: 31). These components (accounting for two types of processing, analytical and transactional) constitute the infrastructural groundwork for a great variety of Facebook’s day-to-day operations, applications, site features and external products that involve processing large numbers of online transactions of operational data to control and manage diverse operations. The following sections first describe the cornerstones of Facebook’s Data Warehouse platform and then discuss large-scale data mining and analytics in more detail. The analysis relies on being sensitive to the practical challenges (often domain-specific issues) and opportunities engineers may face when dealing with Big Data as a problem. Just as in other fields and industries, Big Data intervene and disrupt simply by posing challenges and opportunities for computer science and engineering, for example in terms of ‘volume, velocity, and variety’ (Beyer, 2011) – for ‘Big Data is [
Facebook’s DWAI
The concept of
As these descriptions indicate, what is central to this data warehousing platform is a rich set of tools for different kinds of users, enabling end users as well as internal applications, external products and third parties to perform analytics queries on Facebook data. While some of these tools may have friendly user-facing interfaces in the form of site features like Page Insights, this is not necessarily the case. Indeed, such applications are often unavailable to users of Big Data more generally, demanding technical knowledge, skills and expertise in statistics and programming that are now part, for instance, of the standard training of most economists (Taylor et al., 2014: 3). Furthermore, working with Big Data increasingly requires statistical techniques appropriate for dealing with entire populations of data, rather than with samples. This is indicative of the way in which analytics are moving from a
Large-scale data mining and analytics
Data mining techniques and algorithms (e.g., predictive analytics, data analytics, pattern recognition, and machine learning) play an important role in automatic and distributed data processing and analytics across a wide spectrum of domains involving often consequential decisions about human beings (Crawford, 2013; Hardt, 2014; Govindaraju et al., n.d.). The practical requirements that predictive and prescriptive modes of analytics therefore place on data infrastructures can be very demanding. The kinds of managerial procedures traced by Yates (1993) have transformed and now exist in, for example, HDFS or Hive/HBase components comprised of both analytical and transactional processing. Similarly, the notion of organisational learning has arguably been reincarnated in the form of data and information management using computer learning methods or
Like MapReduce, Support Vector Machines (SVMs) constitute a more general method. SVMs are ‘supervised’ learning models or classifier algorithms that use training data to learn to solve classification problems. They can be found at work in Facebook’s applications for face recognition (Becker and Ortiz, 2009), identifying user behaviour patterns (Bozkır et al., 2010), or indeed for any other two-group classification problem. In this context, soft-margin SVMs are especially useful because they do relatively well with examples that are difficult to label (Cortes and Vapnik, 1995), a problem typically faced when mining social and user data as most of it is ‘unstructured’ (e.g., posts, pictures, and videos, but not likes, locations, or birthdays). Despite myriad benefits, however, there are also issues with these methods of classification, not least because of their reliance on supervision. This relates to what Solon Barocas and Andrew Selbst (2014) have termed Big Data’s ‘disparate impact’, a ‘procedural unfairness’ with regard to the complex forms of discrimination implicit in these techniques, running against common misconceptions that algorithms in general are fair or ‘neutral’, or can be made as such by ‘correcting’ for errors. Instead, fair classification is achieved ‘through a more thorough stamping out of prejudice and bias’ (Barocas and Selbst, 2014: 59), which requires tremendous effort as well as accepting that some degree of disparate impact is practically inevitable. But what amount is tolerable in a specific context? Classification thus requires compromise: fairness of specific outcomes at the expense of practical utility. Fundamental to most of today’s data mining techniques is the concept of
Conclusions: Accounting for economic markets?
This article set out to investigate what it means to account for – or literally take into account – the ‘social’ as it manifests on a major social media platform like Facebook; how we can understand the role of commensuration in the structuration of analysis and interaction with online social media platforms, and ultimately in reworking the boundary between the social, cultural and economic. Commensuration was conceived as a linchpin in establishing relations between technological objects and social processes involving many different practices, rationales, techniques, numbers, metrics and values; a cultural technique that may be encountered ‘in the wild’ as a basic ‘qualculative’ operation or as part of lengthy operative chains geared towards achieving a set of practical aims governing the formation, functioning and sustenance of data assemblages (e.g., optimising a recommender system). Facebook’s DWAI served as an illustrative case for describing – pragmatically and in descriptive-empirical terms – one such assemblage ‘composed of a set of apparatus and elements that are variously scaled (e.g., from local organisations and materialities to dispersed teams, national and supranational laws, and global markets) but are nonetheless bound in a unique constellation’ (Kitchin, 2014b: 186). Taking commensuration as a cultural technique involving both symbolic and material work, this article has proposed a conceptual framework for studying online social media platforms and how they relate to Big Data more generally, and has demonstrated the empirical potential of a pragmatic approach grounded in reading published documents and available materials. Is it also possible to characterise the role of these techniques and operative procedures deployed in Big Data assemblages in performing online social networked environments
Extending Yates’ insight that new communication technologies opened up to firms the possibility of wider markets and more scattered production facilities, as well as insights from others working in the field of economic sociology, the enabling of new data flows between devices and other actors (e.g., by implementing new features and techniques) contributes to redefining existing power relations or indeed producing new ones (e.g., Gerlitz and Helmond, 2013), thereby generating new forms of inequality or reinforcing existing ones (e.g., Andrejevic, 2014; Barocas and Selbst, 2014; Richards and King, 2013, 2014). Those who conduct analysis on social media data can (strategically) affect markets merely by stating or visualising what they believe their users are doing, should do, or will do in the future (cf. MacKenzie, 2007). Rather than ‘objective’ observation, data analytics can become performative of the very phenomena it purports to describe, analyse or predict. Yet while traditional approaches in statistics were generally limited in the number of variables used (e.g., for practical reasons), contemporary computational methods are not limited in the same ways and are particularly well suited to working on problems involving a very high number of vectors or dimensions in an analysis. Data attributes or ‘features’ can be selected for their usefulness or relevance to a learning algorithm for solving a specific problem. For example, signals from Facebook users can be used to perform analytics across any number of dimensions by drawing together (i.e., through an act of commensuration) any number of disparate signals to explore or ‘discover’ new data relationships. This not only means that data have become more useful, or that their usefulness has extended deep into other domains, but also that the
In addition, machine learning also enables the ‘discovery’ – typically through
As this article demonstrates, a descriptive-empirical investigation may provide useful ways to study pragmatically some of the symbolic and material mechanisms and processes by which economic entities and agents are constructed in the context of a major social media platform. In examining Facebook’s DWAI, and describing its cornerstone applications as well as some major challenges, I argue it is possible to gain a better understanding of how Big Data are controlled, produced, stored, analysed and applied. The relation between managerial and accounting procedures and the social activity that numbers and metrics supposedly reflect or represent is arbitrary and involves commensurative work, which, as argued above, is both symbolic and material. Crucially, this relation is productive precisely because it is arbitrary, or as Espeland and Sauder explain: ‘Numbers are easy to dislodge from local contexts and reinsert in more remote contexts. Because numbers decontextualise so thoroughly, they
In conclusion, I propose four directions for further enquiry. First, following the approach proposed in this article, cultural techniques may be studied at different scales, varying from their simplest forms to their implementation in extremely sophisticated operative chains like those found in Facebook’s DWAI, which require infrastructures designed specifically to accommodate such operations. This means understanding techniques in terms of their positions within operative procedures – the techniques and cultural formations preceding them and the new social realities they give rise to – as well as understanding the role they play in coordinating technological objects and social processes within larger assemblages. When performing analytics or aggregating numbers, we do not just count, but actively participate in calculation and the enactment of social worlds. Second, extending Kitchin’s call for more critical and philosophical engagement as well as detailed empirical research on the formation, functioning and sustenance of data assemblages, I suggest studying Big Data as constituting challenges addressed differently across domains such as economics, engineering or government, each of which has its own distinct rationales and practices. This includes investigating the work of engineers dealing with Big Data not just as a source but as a challenge in need of a working solution. Third, data are never simply there, but should be understood simultaneously as abstractions and as situated material-semiotic entities, because these may assemble relations differently, as in ‘second-order measurement’ (Power, 2004: 771) or further aggregations of data and numbers via statistical and mathematical operations. In particular, a sensitive attitude is needed toward the commensurative processes involved in prefiguring data, as well as the analytical operations we perform on them.
The challenge is to distinguish general relational qualities of data and numbers from the specificities they gain by being situated within ‘number ecologies’ (Day et al., 2014). Investigating the various interfaces between data, infrastructure and applications matters because the technical shape of data is formed in relation to the platform, and is indeed situated within a production context (e.g., Vis, 2013). Through an investigation of commensuration, it is clear that not all signals are equal, even if they can be counted, recombined or decontextualised. Commonness and similarity are not properties inherent to a metric, but rather constitute an accomplishment, indeed the outcome of commensuration. The analytical and transactional operations through which such data points are made commensurable and countable at the same time also facilitate practical aims such as the efficient management of activities and practices including advertising, customer relationship management (CRM), or search result ranking. Finally, I propose to engage more deeply with economic theory to properly study online social networks as data assemblages. The point is not to study economic problems or activity
Acknowledgements
I would like to thank Anne Helmond, Bernhard Rieder, Jan Teurlings, and the anonymous reviewers for their constructive critical comments on previous versions of the article manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
