Introduction
Big Data (BD) is generally defined as a particular kind of dataset – one characterised by heterogeneity, noise and sheer size (Lim, 2015). Within the expert community, BD refers to those quantitative methods that use a new (inductive) kind of statistical approach. Having emerged in some areas of the natural sciences, these methods are now being applied in e-commerce, the video game industry and similar ‘drivers of economic growth and innovation’ (Watson, 2014). Researchers point to the potential of BD by citing applications of algorithmic approaches in areas as different as public health (Khoury and Ioannidis, 2014), disaster response (Bruns et al., 2014), security and public safety (Chen et al., 2012; Couch and Robins, 2013) and Countering Violent Extremism (CVE) (Johnson, 2016; Rovner, 2013). According to numerous media articles and the websites of software companies and research organisations, social researchers are set to benefit from access to massive amounts of user-generated content, metadata and transaction data (Manovich, 2012).
Indeed, as Burrows and Savage (2014) maintain, the metricisation of social life made possible by the analysis of BD reveals a new picture of the social, since digital data ‘permits a different kind of more temporally and spatially specific set of analyses which allow a more granular conception of the social to be delineated’ (p.5). The question remains how useful this knowledge is, for whom, and in what way.
As social and critical data scientists maintain, it is necessary to understand how this powerful new instrument of knowing can transform social research (Aragona, 2017; Iliadis and Russo, 2016; Kitchin, 2014; Schroeder, 2018). For example, Kitchin (2014) argues that ‘there is an urgent need for wider critical reflection within the academy on the epistemological implications of the unfolding data revolution’ (p.1), and maintains that ‘a potentially fruitful approach would be the development of a situated, reflexive and contextually nuanced epistemology’ (p.1). It is also necessary to consider the social and ethical implications that the adoption of epistemological novelties may entail (Boyd and Crawford, 2012; Crawford and Finn, 2015; Shilton, 2012), and to assess their impact on analytical and decision-making practices (Adams and Brückner, 2015; Heath-Kelly, 2017; Wagner-Pacifici et al., 2015).
Some social scientists worry that the positivist concept of data as objective and transparent information representing reality will become increasingly widespread within social research (Dalton et al., 2016). As Shaw (2015) notes, ‘[s]tatistical data science is applicable to systems that have been designed as scientific instruments, but is likely to lead to confusion when applied to systems that have not’ (p.1). Others, by contrast, regard the epistemological change that the BD phenomenon may entail as positive. For instance, Karpf (2012) suggests that social scholars need to embrace the values of transparency and kludginess, as the latter helps find solutions in fields such as computer science and engineering. Indeed, a certain change of epistemological culture seems inevitable, regardless of social scholars’ will or ability to embrace new values. As Burrows and Savage (2014) note, social research is increasingly conducted by statisticians, journalists, computational scientists, data scientists and other non-sociological actors, including those who work for platform providers and other services outside academia. BD may further contribute to the proliferation of sociological research conducted by non-sociologists. Some researchers, such as DiMaggio (2015), envision a peaceful convergence of the two epistemological cultures, arguing that only a ‘modest difference’ in orientation can be observed between social scientists and computational scientists.
Attempts to respond to what some social researchers perceived as a crisis of empirical sociology have resulted in significant developments in the field of digital social science (Housley et al., 2017a). These included further enrichment of qualitative research methods (Housley et al., 2017b; Robards and Lincoln, 2017), reconceptualisation of social media within social scientific and philosophical frameworks (Brooker et al., 2017), and a greater diversity of epistemological approaches in digital humanities and computational social science (Kitchin, 2017). Social researchers are also aware that some areas of applied research and practice may still be affected by the discourse of a BD epistemological revolution. For instance, Chandler (2015) maintains that a critical view of the use of BD in governance needs to be adopted. Rieder and Simon (2016) explore the interplay between BD epistemological claims and the broader socio-political and cultural shift toward mechanical objectivity and a data-driven society. Within the recently emerged area of digital humanitarianism, BD epistemology is believed to obscure ‘many forms of knowledge in crises and emergencies and produces a limited understanding of how a crisis is unfolding’ (Burns, 2015: 477). Within the national security area, as Atran et al. (2017) maintain, attempts to develop a multi-disciplinary analytic capability for the effective and efficient integration and analysis of diverse data for the purpose of countering terrorism resulted in ‘theory-agnostic, Big Data-driven exploratory work’ (p.352). Such work could have contributed to the proliferation of simplistic ‘root-cause’ explanations of terrorism and to the neglect of more complex data that do not fit those explanations.
As shown above, social researchers’ views on the BD challenge span the full spectrum of attitudes, from overenthusiastic to ultra-conservative. Yet they tend to be shaped by a technology-focused concept of BD – or, rather, by its image as constructed in popular media and promotional discourses. Indeed, some social researchers express concern about academia’s declining leadership in the BD discussion: ‘The rapid adoption of BD by industry has leapfrogged the discourse to popular outlets, forcing the academic press to catch up’ (Gandomi and Haider, 2015: 137). The major challenge lies in the possibility that assumptions about the nature and implications of BD, once ingrained in the technologies and tools that mediate social researchers’ access to and analysis of digital data, may push some researchers of the social to adopt the data-driven ethos.
This paper aims to contribute to the adoption of a social-scientific epistemological stance in relation to BD. The rest of the paper comprises four sections and a conclusion. Section ‘Epistemological revolution in perspective’ examines the assumptions behind the ‘epistemological revolution’ discourse on BD and suggests that this discourse may hinder the integration of social scientific knowledge into Big Data analytics (BDA). This argument is further developed in the remaining sections: ‘Classical problems in the new data landscape’, ‘Social media data as manifestations of social processes and human activity: Conceptual frameworks’, and ‘Big Data analytics and social theory: The CVE case’. The ‘Conclusion’ section summarises the discussion on the nature of the BD epistemological challenge.
Epistemological revolution in perspective
High volume, high velocity and high variety are presented as distinctive features of the BD landscape (Russom, 2011), whilst algorithmic approaches are celebrated for enabling researchers to obtain valuable insights into social processes and human behaviour by repurposing data and aggregating data from heterogeneous datasets (Bruns et al., 2014). From a social and cultural analysis perspective, the biggest problem with the BD discourse is that it draws upon and naturalises a technology-centred version of the data landscape (Calic and Resnyansky, 2015). Social media is a good example of this: it is habitually defined as content generated by the users of different kinds of platforms (micro-blogging websites, video-sharing websites, blogs and discussion forums) and represented in different formats (text, image, audio, video) (Agarwal and Sureka, 2015). Whilst such descriptive categories may be useful from machine learning and information retrieval perspectives, they have no particular heuristic value from the perspective of social research and cannot be considered sufficient in many areas of practice. In order to acquire heuristic significance and explanatory value, the technology-centred categorisation of data sources needs to be interpreted within a particular social scientific framework, such as ethnographic methodology (Mare, 2017), Conversation Analysis (CA) (Housley et al., 2017b), and so on.
In spite of their neutral appearance, the descriptive definitions of BD carry a strong revolutionary pathos. BD is not simply about huge quantity. It is about
The epistemological revolution discourse on BD draws upon dichotomies such as causation vs. correlation, hypothesis-driven vs. data-driven research, and inductive vs. deductive reasoning. However, the search for correlation and the search for causation have been equally legitimate heuristic strategies in science (Mazzocchi, 2015). It is true that in many areas of contemporary science that deal with statistically significant phenomena, the application of algorithmic approaches to troves of dynamic data enables researchers to discover novel and surprising correlations, patterns and rules. However, this strategy can lead to scientific discoveries because the analysed data are themselves products of previous theoretical thought embodied, for instance, in sensors, measurement procedures, and so forth. Unfortunately, this factor of utmost importance tends to be overlooked because of the simplistic image of science in popular, media and promotional discourses on BD. By highlighting the role of method in scientific knowing, an ‘epistemological revolution’ discourse may contribute to a belief that the use of BD methods (algorithmic approaches) for the analysis of the social does not need to be justified theoretically.
Perhaps a focus on method was justified at the time of the Copernican revolution, when science had to affirm itself as a distinctive tradition of knowing (reasoning supported by experimentally produced empirical evidence) vis-à-vis other traditions such as philosophy, religion, art, common sense, and so on (Feyerabend, 2010). However, this perspective is far less relevant to the interaction between mature fields of scientific exploration, particularly when their epistemological stances and concepts of data are as different as those of the natural, computational and social sciences.
The ‘epistemological revolution’ discourse on BD introduces and naturalises a set of paradoxically co-existing dichotomies. Specifically, the infrastructures underlying the digital data universe are presented as ubiquitous yet invisible. Tools and algorithms that mediate researchers’ access to the data and that shape its analysis are presented as powerful yet epistemologically neutral. Within the BD world, narrow technical definitions originating in applied areas (information science, computational science, digital economy, data management, business intelligence, game industry, etc.) are presented as universally applicable and heuristically productive in any area of research and practice wherever digital datasets are involved.
However, as Agre (1997a) maintains, digital technologies have been shaped by a specific kind of practice linked to a particular social and ideological project. Specifically, the design of computer systems has been inspired by the work-rationalisation paradigm in industrial engineering, with its focus on form (the document) rather than on meaning. This led to the blurring of the distinction between ‘entities as things in the world and entities as representations of things in the world’ (p.40). Therefore, the developers of technology need to reflect on their assumptions, social values and design practices (Sengers et al., 2005; Shilton, 2012). Having demonstrated the non-neutrality and epistemological bias of computing, the critical-reflexive tradition in the philosophy of technology (Agre, 1997b; Mansell, 2012) provides solid ground for arguing that the development and application of algorithmic approaches for the analysis of digital data as manifestations of the social should be shaped by social scientists. The next step is to understand how this kind of interdisciplinary research project can be realised in practice. Drawing on extant critical reflection on earlier uses of mathematical and computational approaches to develop automated analytic capabilities, the integration of social scientific knowledge into the development of computational analytic tools requires that conceptual models of the analysed phenomena be shaped by relevant social-scientific theories; that social and cultural data be approached as theoretical-methodological constructs (as opposed to the information-cybernetic concept of data as a stream of signals to be captured and processed); and that the heuristic potential and limitations of the analytic methods, as well as their social and practical implications, be assessed (Resnyansky, 2008).
In the BD case, however, this kind of critical-reflexive interdisciplinary collaboration may be hindered by the image of BD as an ‘epistemological revolution’ whose novelty is seen to lie in its being ‘data-driven’ – in other words, in its being essentially a-theoretical.
Classical problems in the new data landscape
Sociologists are well aware that available datasets do not necessarily contain data on all relevant variables. As Merton (1968a) posits, ‘the fundamental limitation of sociological research derives from a circumstance which regularly confronts sociologists seeking to devise measures of theoretical concepts by drawing upon the array of social data which happen to be recorded in the statistical series established by agencies of the society – namely, the circumstance that these data of social bookkeeping which happen to be on hand are not necessarily the data which best measure the concept’ (p.219).
All of these issues – data as products of research, organisational and social-ideological practices; the transformative power of the technologies of data access, presentation and consumption; and subjectivity and bias in individuals’ accounts of events and contexts – continue to be relevant in the BD era. For instance, Sheehan’s (2012) study revealed that the most popular large-
As Merton’s (1968a) quote above reminds us, social researchers have not always been able to control the scope of data available for analysis. In comparison to traditional sources of data, social media and other digital data sources present an additional challenge due to their dynamic and mediated nature. Both the content produced by social media users and the structure of their interaction networks are constantly changing. Most importantly, the underlying infrastructure that mediates researchers’ access to data operates on principles that are hidden from those researchers (Driscoll and Walker, 2014). Social researchers and critical data scientists argue that it is, therefore, necessary to problematise the concept of data as a decontextualised entity and to approach data in context – i.e., in relation to the social, cultural, economic, ideological and technological conditions of their production and consumption (Dalton et al., 2016; Iliadis and Russo, 2016; Leszczynski and Wilson, 2013; Resnyansky, 2015). The next question is how to conceptualise digital data as manifestations of social processes and human activity.
Social media data as manifestations of social processes and human activity: Conceptual frameworks
As Merton (1968b) maintains, up until the twentieth century, the only sources of knowledge about beliefs, worldviews, opinions and attitudes were introspective narratives generated by members of intellectual and power elites. They were available from secondary sources, comprising philosophical treatises, social and historical studies, ideological manifestos, political pamphlets, the memoirs of ‘great’ historical actors and their friends and relatives, and literary and artistic works. The spread of literacy, the emergence of mass media, and the growing interest of commercial and political actors in precise and granular knowledge about the opinions, attitudes and tastes of different segments of the population made it both possible and necessary to obtain massive primary data that could be analysed with the help of standardised instruments (e.g., surveys) and quantitative methods (e.g., content analysis). However, such technical precision has been shown to incur a heavy price: ‘there has been a marked pressure for working with very simple, one-dimensional categories, in order to achieve high reliability’ (p.503).
In comparison to the early years of public opinion research, the possibilities provided by social media look very promising (Ceron et al., 2014). Indeed, the number of ‘ordinary members of public’ who strive to express their opinions and attitudes to a global audience – and to analysts – is vast and growing. The question is whether, and to what degree, this kind of data can be meaningfully analysed with approaches and methods that originated in the analysis of data such as mass media content, organisational records, and individuals’ self-accounts in the form of narratives or answers obtained through carefully crafted research instruments (questionnaires, interviews, etc.). In the online communicative space, people tend to convey their opinions and attitudes through simple performative acts (clicking the ‘like’ button, retweeting, following hashtags, uploading and downloading content, etc.) linked to a more or less generic topic, rather than through statements about an explicitly defined subject (Nothman et al., 2015). Extracting the aspect towards which a given sentiment is directed is, therefore, the key issue in developing automated tools for sentiment analysis (Overbey et al., 2017). The affordances of the multimedia networked communicative space present a real challenge for researchers whose methods and tools are grounded in network/structuralist and semantics/representation-oriented paradigms (Rost et al., 2013). As Marwick and Boyd (2010) argue, mediated social interaction, and self-representation in particular, cannot be sufficiently explained by representation-oriented theories. Rather, social media users’ activity serves utilitarian interests (e.g., expanding one’s network for career or other purposes), and people tend to maintain carefully crafted images of their public selves intended for diverse potential audiences.
Therefore, representation-focused analysis of social media can help establish a repertoire of social masks that are worn in the virtual
Social media can be approached as a social practice that comprises three types of activity: the production of meanings, the transmission of meanings and the consumption of meanings. Traditionally, the production of meanings has been the duty and privilege of groups capable of systematic reflection on social life – for example, philosophers, social thinkers, writers, artists and members of the spiritual and intellectual elites (Merton, 1968b). The transmission of meanings has been the function of institutions responsible for transmitting collective memory and ideologies, such as the mass media (Neiger et al., 2011). In mass communication societies, the media played the key role in determining which meanings entered the realm of public discourse and could, therefore, influence opinions, attitudes, decisions, behaviour and policies. The consumption of meanings is an activity that takes place in individuals’ heads and, until recently, could only occasionally be revealed to an external observer, either in dialogue or in the form of a self-account.
To summarise, in the pre-social media era, researchers dealt with symbolic practices characterised not only by a strict division of labour between institutionalised actors (e.g., intellectual elite – mass media – audience) but also by an (assumed) priority of meaning production and dissemination over meaning consumption. For this reason, it was considered important to know what meanings were being disseminated through specific channels and to monitor the information demand of the intended audience. These tasks could be successfully addressed by semantics-oriented methods such as content and thematic analysis. This paradigm, however, has been considered inadequate for approaching media and communication as social practices (Golding and Murdock, 1978; Nordenstreng, 1968), and approaches have been suggested that aim to capture the interrelationship between production and consumption (see, e.g., Fairclough, 1992). Social media have blurred the boundaries between the three aspects of the symbolic production of the social. The mediated symbolic space presents an amalgam of meaning production, dissemination and consumption at the level of individual behaviour within dynamic, amorphous, fluctuating collective entities (social networks, virtual communities, etc.) (Bruns and Schmidt, 2011). Mediated symbolic practice is characterised by the increasing role of the ‘consumer’ in meaning production and dissemination. Moreover, this space has provided more possibilities for the explicit manifestation of message consumption (Trilling, 2014).
Conceptualisation of social media as a locus of social practice, therefore, needs to be informed by three kinds of theories: general theories of the symbolic production of the social; domain-specific conceptual models of relevant activity (e.g., culturally specific models of ideological indoctrination); and theories of language-in-use that aim to capture its psycho-social, pragmatic and communicative aspects (Mohr et al., 2013; Resnyansky, 2014; Smith, 2014).
For instance, the development of analytic tools for approaching
On the other hand, the development of analytic tools for approaching
Despite the availability of studies that demonstrate the fruitfulness and explanatory potential of social theories of language and communication in the analysis of social media, the development of tools for the automated analysis of social media (e.g., natural language processing and sentiment analysis) tends to draw upon assumptions inherited from early twentieth-century public opinion research and from text subjectivity analysis in computational linguistics (Mäntylä et al., 2018). Of course, integrating social theories and algorithmic approaches is inherently very difficult. This ambitious interdisciplinary project can be further complicated, if not entirely precluded, by a positivist view of BD as an ‘epistemological revolution’ characterised by an a-theoretical stance and a belief in the universal applicability of algorithmic approaches.
BD analytics and social theory: The CVE case
BDA appears to be quite selective about the theoretical underpinnings it is capable of incorporating. Priority seems to be given to theories that focus on the behavioural and structural aspects of social phenomena, rather than on their systemic and historical aspects. For instance, theories explaining psychological phenomena such as child-parent disputes or sexual violence are considered applicable to the analysis of phenomena of a political or socio-technical nature, such as civil unrest, group conflicts, cyber-attacks, and so on. Indeed, family disputes and political violence may be grouped together on the grounds that they all involve one actor attacking another (Johnson et al., 2013). However, it has yet to be theoretically substantiated that a psychological concept of aggression can satisfactorily explain political and socio-technical phenomena.
On the other hand, rather than trying to erase the boundary between psychological and sociological conceptual systems, it would seem more logical to explore social phenomena by drawing upon relevant macro-level theories (i.e., systemic-functional, politico-economic and socio-historical). The problem is that it is very difficult – perhaps impossible – to bring together macro-sociological theories of the social and methods that have emerged from, and for, the analysis of processes manifested at the micro-scale level in the regime of ‘nowcasting’ (Dalton et al., 2016). This does not mean, however, that those theories can be dismissed and forgotten. Similarly, it would be a mistake to ignore the fact that extant concepts of social phenomena, including extremism, terrorism and radicalisation, are context-bound and ideologically loaded (Sedgwick, 2010). The developers of automated analytic capabilities need to pay attention to the social science community’s discussion of issues such as the danger of overgeneralisation and re-contextualisation of empirical findings, particularly in areas such as actor profiling for countering extremism and radicalisation (Monaghan and Molnar, 2016).
As Dalton et al. (2016) maintain, the myths of BD ‘make Big Data seems like the only way to know anything about remote populations, but they also distract from its imperfections and qualitative nature, and the risks it poses as a basis for decision making’ (p.5). In the case of radicalisation, it has been recognised that focusing on the structure of virtual communities and on the content of online messaging is quite limiting, and that truly understanding and countering the radicalisation of individuals and groups requires an approach that relates online behaviour to offline environments, comprising the immediate situation (event) as well as socio-historical conditions and cultural systems (Fishman, 2010; Von Behr et al., 2013). Preoccupation with social media as a source of data on indicators of radicalisation may contribute to a failure to capture the role of factors operating at the societal and cultural-ideological levels. For instance, at the turn of the century, media discourse shaped by the Us – Them dichotomy was identified by social researchers as one of the main factors contributing to radicalisation, along with systemic-structural and cultural-historical conditions (Borum, 2011). At present, the ‘psychological condition’ discourse on radicalisation and terrorism seems to be gaining popularity (Bhui et al., 2014). Focusing on social media data could have contributed to the proliferation of the individual-centred psychological paradigm among analysts, decision makers and the public, thus narrowing the spectrum of strategies and measures that can be undertaken to address extremism and radicalisation at the societal and community levels.
Two approaches to CVE and radicalisation are commonly distinguished: the ‘hard power’ and the ‘soft power’ approaches (Bunnik, 2016). The hard-power approach focuses on the threat of political violence; it is characterised by offensive/coercive and defensive/risk management strategies. The soft-power approach aims to prevent radicalisation and extremism by addressing root causes, grievances and ideologies. Soft-power strategies include political, ideological and social policy strategies that aim to increase opportunities for vulnerable actors’ social and economic engagement and to facilitate community engagement in local counter-extremism practices. BDA is habitually presented as a powerful instrument for obtaining knowledge that is most relevant to operative counter-measures and to the hard-power approach to the provision of security and public safety (Chen et al., 2012). As Bunnik (2016) maintains, it is also necessary to understand the ramifications of BDA from the perspective of the soft-power approach.
When it comes to the prevention of extremism and radicalisation, BDA has yet to demonstrate its potential – or, rather, its rightful place has yet to be understood. For instance, Johnson (2016) maintains that the use of algorithmic approaches to analyse terrorist events revealed that the emergence of such events follows a periodic law similar to the laws observed in the periodic recurrence of some natural phenomena. However, knowing the regularity and frequency of events cannot per se help prevent them; in the meantime, the only logical action may be to mobilise resources in anticipation of an event based on its likelihood. This finding could gain more practical significance if placed within a context comprising sociological knowledge on terrorism and its causes. In this context, the very fact that there seems to be a certain regularity behind the emergence of events that disrupt normal life may be interpreted as an indication that the causes of violent extremism are to be sought at the system level.
Actor profiling may serve as another example that shows the importance of placing BDA within the relevant theoretical and practical contexts. For instance, an analysis of social media data could reveal that people who have been involved in violent extremism or are at risk of becoming radicalised share attributes such as being young, male, educated, a second- or third-generation immigrant, and so on (Vidino and Hughes, 2015). Examined from the perspective of the ‘hard power’ approach, this kind of profiling poses serious problems (Heath-Kelly, 2017). What use could law enforcement agencies make of the fact that the ‘future radicalised’ are predominantly young educated males of immigrant background? What course of action could be undertaken on the basis of this knowledge in a legal democratic state? To be a useful instrument, profiling needs to be shaped by theories that explain certain kinds of behaviour by certain attributes or that establish clear correlations between them. In criminology, for instance, profiling is substantiated by a sound and empirically validated theory of psychological types combined with typologies of socio-economic variables that correlate with deviant behaviour and involvement in certain kinds of criminal activity (Rae, 2012). There is no consensus, however, regarding the theoretical underpinnings of profiling in areas such as terrorism, extremism and radicalisation. As summarised by Kruglanski et al. (2014), a variety of psychological, sociological, economic, political and cultural factors can contribute to radicalisation, but none of them is particularly significant per se.
Radicalisation is a dynamic process that represents the convergence of high-order elements (i.e., elements that cannot be reduced to clusters of standard variables), such as the quest for personal significance, an ideology that justifies violence as the path to significance, and the possibility of acquiring the necessary means (e.g., networking and group dynamics). Against the background of such theories, the attributes of ‘future radicalised’ actors identified by a BD analyst cannot be interpreted as markers of radicalism with explanatory status. Rather, the BDA findings may indicate that society fails to create conditions for engaging the potentially most active and productive segment of the population, and that solutions need to be sought at the societal and cultural levels (Atran et al., 2017).
To summarise, social and cultural BD analysis needs to be assessed vis-à-vis relevant social scientific theories and empirical research. For instance, an assessment of the place of BDA within the counter-extremism and counter-radicalisation realm needs to be conducted in relation to: the problem of using quantitative and mathematical methods in empirical social research; attempts to address that problem in areas that preceded BDA (such as social network analysis, social modelling, computational social science and terrorism informatics); and relevant social scientific research and applied theories. BDA aims to inform counter-extremist and counter-radicalisation practice across the whole spectrum, from the tactical to the broadly strategic end. How can BDA address the diverse concerns of various stakeholders? Will it broaden or narrow the repertoire of countering strategies and policies? It is necessary to assess its impact in terms of the balance between proactive longitudinal measures (which aim to address the causes of a problem) and operative reaction to events and to information about actors. It is also important to identify areas and problems that cannot be informed by BDA.
Conclusion
In the promotional and positivist discourses, BD has been presented as a phenomenon that radically changes every aspect of individual and social life and, indeed, the traditional ways of knowing the world. These changes are usually summarised around three points: that establishing correlations can replace the search for causality; that the heuristic value of data is the outcome of its sheer volume; and that aggregated datasets from heterogeneous sources can be analysed separately from the contexts of their production, dissemination and consumption. The logical conclusion is that the availability of digital data related to various aspects of human life makes it possible to apply computational approaches to the analysis of social and cultural processes and human behaviour. However, whilst such methods are considered extremely productive in some areas of science as well as in the commercial sector, they appear challenging both in social research and in many areas of practice.
Algorithmic approaches have flourished in areas where 'data speaks for itself', as in e-commerce, where the observable digital transactions coincide with the phenomena to be explored (consumers' online behaviour). In terms of Peirce's (1955) typology of signs, this kind of digital data can be classed as iconic signs, whose form contains actual characteristics of their meaning. However, when digital data are meant to be explored as manifestations of social life, cultural processes and individuals' socially significant behaviour, they relate to the represented reality as indexical or symbolic signs, whose meaning is the product of interpretations shaped by multiple contexts. Digital social and cultural data are, in the literal sense, only
In spite of the growing significance of BD for social and cultural analysis, BD remains, for social scientists and practitioners, an external technological condition to which they must accommodate. To change this situation, the development of social BDA requires interaction between computational science and social science as conceptual
Perhaps the major challenge posed by the BD phenomenon comes from the fact that the proclaimed 'epistemological revolution' may turn out to be an illusion that hides the beginning of an 'epistemological decadence'. BD seems to revive some classical problems that social researchers encountered in the distant past. Since then, social researchers have thoroughly reflected on those problems, understood their epistemological, theoretical and methodological aspects, and learned how to address them. That understanding has become part of the foundation of the social scientific disciplines, having been incorporated into theoretical frameworks, research methods and analytic techniques. Over the last decade, a growing number of social researchers have tried to apply them to the analysis of digital data (see, e.g., Goldsmith and Brewer, 2015; Housley et al., 2017a, 2017b; Mohr et al., 2013; Nothman et al., 2015; O'Halloran et al., 2017; Papacharissi, 2015; Smith et al., 2016). Yet integrating social scientific epistemologies into the development of algorithms and analytic tools is still a challenge, owing to the diversity of disciplines and epistemological cultures that need to be brought together (Resnyansky, 2015). In order to apply algorithmic approaches productively (or, indeed, meaningfully) to the exploration of the social, BDA needs to be informed by relevant social scientific knowledge and grounded in social scientific epistemological principles. Answering the BD challenge requires social scientists to direct their efforts towards extending the repertoire of social scientific theories and conceptual frameworks that may inform the analysis of the social in the age of BD. To illustrate this point, this paper has briefly outlined some relevant theories and concepts. Other promising conceptual frameworks could only be briefly mentioned, whilst many are yet to be identified or developed.
