Abstract

This paper aims to contribute to the development of tools to support an analysis of Big Data as manifestations of social processes and human behaviour. Such a task demands both an understanding of the epistemological challenge posed by the Big Data phenomenon and a critical assessment of the offers and promises coming from the area of Big Data analytics. This paper draws upon the critical social and data scientists’ view on Big Data as an epistemological challenge that stems not only from the sheer volume of digital data but, predominantly, from the proliferation of the narrow-technological and the positivist views on data. Adoption of the social-scientific epistemological stance presupposes that digital data be conceptualised as manifestations of the social. In order to answer the epistemological challenge, social scientists need to extend the repertoire of social scientific theories and conceptual frameworks that may inform the analysis of the social in the age of Big Data. However, an ‘epistemological revolution’ discourse on Big Data may hinder the integration of social scientific knowledge into Big Data analytics.

Keywords

Social and cultural Big Data analytics, social science, computational science, epistemological challenge, social media

Introduction

Big Data (BD) is generally defined as a particular kind of dataset – one characterised by heterogeneity, noise and sheer size (Lim, 2015). Within the expert community, BD refers to those quantitative methods that use a new (inductive) kind of statistical approach. Having emerged in some areas of the natural sciences, these methods are being applied in e-commerce, the video game industry and similar ‘drivers of economic growth and innovation’ (Watson, 2014). Researchers point to the potential of BD based on examples of the application of algorithmic approaches within areas as different as public health (Khoury and Ioannidis, 2014), disaster response (Bruns et al., 2014), security and public safety (Chen et al., 2012; Couch and Robins, 2013) and Countering Violent Extremism (CVE) (Johnson, 2016; Rovner, 2013). According to numerous media articles and the websites of software companies and research organisations, social researchers are set to benefit from gaining access to massive amounts of user-generated content, metadata and transaction data (Manovich, 2012).

Indeed, as Burrows and Savage (2014) maintain, the metricisation of social life derivable from the analysis of BD reveals a new picture of social life since digital data ‘permits a different kind of more temporally and spatially specific set of analyses which allow a more granular conception of the social to be delineated’ (p.5). The question remains how useful this knowledge is, for whom, and in what way.

As social and critical data scientists maintain, it is necessary to understand the way in which the new powerful instrument of knowing can transform social research (Aragona, 2017; Iliadis and Russo, 2016; Kitchin, 2014; Schroeder, 2018). Thus, Kitchin (2014) argues that ‘there is an urgent need for wider critical reflection within the academy on the epistemological implications of the unfolding data revolution’ (p.1), and maintains that ‘a potentially fruitful approach would be the development of a situated, reflexive and contextually nuanced epistemology’ (p.1). It is also necessary to consider social and ethical implications that the adoption of epistemological novelties may entail (Boyd and Crawford, 2012; Crawford and Finn, 2015; Shilton, 2012), and to assess their impact on analytical and decision-making practices (Adams and Brückner, 2015; Heath-Kelly, 2017; Wagner-Pacifici et al., 2015).

Some social scientists worry that the positivist concept of data as objective and transparent information representing reality will become increasingly diffused within the realm of social research (Dalton et al., 2016). As Shaw (2015) notes, ‘[s]tatistical data science is applicable to systems that have been designed as scientific instruments, but is likely to lead to confusion when applied to systems that have not’ (p.1). Others believe, on the contrary, in the positive nature of the epistemological change that the BD phenomenon may entail. For instance, Karpf (2012) suggests that social scholars need to embrace the values of transparency and kludginess, as the latter helps to find solutions in such fields as computer science and engineering. Indeed, a certain change of epistemological culture seems to be inevitable, regardless of social scholars’ will or ability to embrace new values. As Burrows and Savage (2014) note, social research is more and more often conducted by statisticians, journalists, computational scientists, data scientists and other non-sociological actors, including those who work for platform providers and other services outside academia. BD may further contribute to the proliferation of sociological research conducted by non-sociologists. Some researchers envision the future as a process of peaceful convergence of the two epistemological cultures because only ‘modest difference’ in orientation can be observed between social scientists and computational scientists, as DiMaggio (2015) maintains.

Attempts to respond to what some social researchers perceived as a crisis of empirical sociology have resulted in significant developments of the digital social science field (Housley et al., 2017a). These included further enrichment of qualitative research methods (Housley et al., 2017b; Robards and Lincoln, 2017), reconceptualisation of social media within social scientific and philosophical frameworks (Brooker et al., 2017), and a greater diversity of epistemological approaches in digital humanities and computational social science (Kitchin, 2017). Social researchers are also aware that some areas of applied research and practice may still be affected by the BD epistemological revolution discourse. For instance, Chandler (2015) maintains that a critical view on the use of BD in governance needs to be adopted. Rieder and Simon (2016) explore the interplay between the BD epistemological claims and the broader socio-political and cultural shift toward mechanical objectivity and data-driven society. Within the recently emerged area of digital humanitarianism, the BD epistemology is believed to obscure ‘many forms of knowledge in crises and emergencies and produces a limited understanding of how a crisis is unfolding’ (Burns, 2015: 477). Within the national security area, as Atran et al. (2017) maintain, attempts to develop a multi-disciplinary analytic capability that enables an effective and efficient integration and analysis of diverse data for the purpose of countering terrorism resulted in ‘theory-agnostic, Big Data-driven exploratory work’ (p.352) that could have contributed to the proliferation of simplistic ‘root-cause’ explanations of terrorism and to the neglect of more complex data that do not fit into those explanations.

As shown above, the social researchers’ views on the BD challenge demonstrate a complete spectrum of attitudes, from overenthusiastic to ultra-conservative. Yet, they tend to be shaped by a technology-focused concept of BD – or, rather, by its image constructed in the popular media and promotional discourses. Indeed, some social researchers express a concern about academia’s declining leadership in the BD discussion: ‘The rapid adoption of BD by industry has leapfrogged the discourse to popular outlets, forcing the academic press to catch up’ (Gandomi and Haider, 2015: 137). The major challenge derives from a possibility that assumptions about the nature and implications of BD, having been ingrained into the technologies and tools that mediate social researchers’ access to and analysis of digital data, can force some researchers of the social to adopt the data-driven ethos.

This paper aims to contribute to the adoption of a social-scientific epistemological stance in relation to BD. The rest of the paper comprises four sections, and a conclusion. Section ‘Epistemological revolution in perspective’ examines assumptions behind the ‘epistemological revolution’ discourse on BD and suggests that this discourse may hinder the integration of the social scientific knowledge into the Big Data analytics (BDA). This argument is further developed in the remaining sections ‘Classical problems in the new data landscape’, ‘Social media data as manifestations of social processes and human activity: Conceptual frameworks’, and ‘Big Data analytics and social theory: The CVE case’. The ‘Conclusion’ section summarises the discussion on the nature of the BD epistemological challenge.

Epistemological revolution in perspective

High volume, high velocity and high variety are presented as distinctive features of the BD landscape (Russom, 2011), whilst algorithmic approaches are celebrated for enabling researchers to obtain valuable insights on social processes and human behaviour owing to the possibility of repurposing data and aggregating data from heterogeneous datasets (Bruns et al., 2014). From a social and cultural analysis perspective, the biggest problem with the BD discourse is that it draws upon and naturalises a technology-centred version of the data landscape (Calic and Resnyansky, 2015). Social media is a good example of this, being habitually defined as content generated by the users of different kinds of platforms (micro-blogging websites, video-sharing websites, blogs and discussion forums) and represented in different formats (text, image, audio, video) (Agarwal and Sureka, 2015). Whilst such descriptive categories may be useful from the machine learning and information retrieval perspectives, they have no particular heuristic value from the perspective of social research and cannot be considered sufficient in many areas of practice. In order to obtain heuristic significance and explanatory value, the technology-centred categorisation of data sources needs to be interpreted within a particular social scientific framework such as ethnographic methodology (Mare, 2017), Conversation Analysis (CA) (Housley et al., 2017b), and so on.

In spite of their neutral appearance, the descriptive definitions of BD carry a strong revolutionary pathos. BD is not simply about huge quantity. It is about the transformation of quantity into quality: due to the size of datasets available for analysis, patterns can be revealed that cannot be detected in smaller datasets, thus providing answers to questions that could not even be asked before. The idea that an analysis of huge dynamic datasets in real time generates additional value is particularly prominent within the digital economy sector, where BDA is used in order to better understand and influence consumers’ behaviours. Indeed, in commercial, computer science, and natural and behavioural science domains, BD is being sold as a revolutionary change in the ways of knowing: the search for correlation instead of a search for causation (Calude and Longo, 2017). ‘Big data is about what, not why. We don't always need to know the cause of a phenomenon; rather, we can let data speak for itself’ (Mayer-Schönberger and Cukier, 2013: 14). The epistemological revolution discourse on BD is elucidated in Andersen’s (2008) article, in which the author maintains that BD changes science as we know it and suggests that, once science adopts a data-driven ethos, it can learn something new and useful from companies specialising in Internet services and products.

The epistemological revolution discourse on BD draws upon dichotomies such as causation vs. correlation, hypothesis-driven vs. data-driven research, and inductive vs. deductive reasoning. However, the search for correlation and the search for causation have been equally legitimate heuristic strategies in science (Mazzocchi, 2015). It is true that in many areas of contemporary scientific endeavour which deal with statistically significant phenomena, application of algorithmic approaches for the analysis of troves of dynamic data enables researchers to discover novel, surprising correlations, patterns and rules. However, this strategy can result in scientific discoveries because the analysed data is the product of previous theoretical thought embodied, for instance, in sensors, measurement procedures, and so forth. Unfortunately, this factor of utmost importance tends to be overlooked due to the simplistic image of science in the popular, media and promotional discourse on BD. By highlighting the role of method in scientific knowing, an ‘epistemological revolution’ discourse may contribute to a belief that the use of BD methods (algorithmic approaches) for the analysis of the social does not need to be justified theoretically.

Perhaps focusing on method was quite justified in the times of the Copernican revolution, when science had to affirm itself as a distinctive tradition of knowing (reasoning supported by experimentally produced empirical evidence) vis-à-vis other traditions such as philosophy, religion, art, common sense, and so on (Feyerabend, 2010). However, this perspective is not so relevant in the case of the interaction between mature fields of scientific exploration, particularly when their epistemological stances and concepts of data are as different as those of the natural, computational and social sciences.

The ‘epistemological revolution’ discourse on BD introduces and naturalises a set of paradoxically co-existing dichotomies. Specifically, infrastructures underlying the digital data universe are presented as ubiquitous yet invisible. Tools and algorithms that mediate researchers’ access to the data and that shape its analysis are presented as powerful yet epistemologically neutral. Within the BD world, narrow-technical definitions originating in applied areas (information science, computational science, digital economy, data management, business intelligence, game industry, etc.) are presented as universally applicable and heuristically productive within any area of research and practice as long as digital datasets are involved.

However, as Agre (1997a) maintains, digital technologies have been shaped by a specific kind of practice linked to a particular social and ideological project. Specifically, the design of computer systems has been inspired by the work-rationalization paradigm in industrial engineering, with its focus on the form (document) rather than on the meaning. This led to the blurring of the distinction between ‘entities as things in the world and entities as representations of things in the world’ (p.40). Therefore, the developers of technology need to reflect on their assumptions, social values and design practices (Sengers et al., 2005; Shilton, 2012). Having demonstrated the non-neutrality and epistemological bias of computing, the critical-reflexive tradition in the philosophy of technology (Agre, 1997b; Mansell, 2012) provides a solid ground for arguing that the development and application of algorithmic approaches for the analysis of digital data as manifestations of the social should be shaped by social scientists. The next step is to understand how this kind of interdisciplinary research project can be realised in practice. Based on the extant critical reflection on the previous experience of the use of mathematical and computational approaches for the development of automated analytic capabilities, integration of social scientific knowledge into the development of computational analytic tools requires that: conceptual models of the analysed phenomena be shaped by relevant social-scientific theories; social and cultural data be approached as theoretical-methodological constructs (as opposed to the information-cybernetic concept of data as a stream of signals to be captured and processed); and the analytic methods’ heuristic potential and limitations, as well as their social and practical implications, be assessed (Resnyansky, 2008). In the BD case, however, this kind of critical-reflexive interdisciplinary collaboration may be hindered by the image of BD as an ‘epistemological revolution’ whose novelty is seen to lie in its being ‘data-driven’ – in other words, in being essentially a-theoretical.

Classical problems in the new data landscape

Sociologists are well aware of the fact that available datasets do not necessarily contain data on all relevant variables. As Merton (1968a) posits, the fundamental limitation of sociological research

derives from a circumstance which regularly confronts sociologists seeking to devise measures of theoretical concepts by drawing upon the array of social data which happen to be recorded in the statistical series established by agencies of the society – namely, the circumstance that these data of social bookkeeping which happen to be on hand are not necessarily the data which best measure the concept. (p.219)

This classical sociological problem has persisted despite the emergence of a new data landscape and data gathering technologies. In their paper on agent-based modelling for cultural intelligence, Silverman et al. (2008) discuss the data source problem in relation to three types of data sources: survey and event databases; news feeds; and subject matter experts. In the case of survey and event databases, data is shaped by different methodologies of data collection and formats of presentation. Thus, the modeller needs to take into account specific databases’ strengths, weaknesses, terminology and idiosyncrasies. The methodological bias is further amplified by the fact that databases are products of organisational and social-ideological practices. They are assembled by social scientists, government agencies, or commercial organisations governed by their own needs and requirements – hence, they are inevitably selective in terms of the aspects of life that they capture through their datasets. When news feeds are used as a data source, a new layer of complicating factors is added. Since data needs to be extracted with the help of automated content analysis tools, it is necessary to consider the impact of data collection algorithms. In addition to this, media sources may be biased and selective. Finally, subject matter experts do not possess comprehensive domain knowledge and their judgement tends to correspond to an ideological rather than scientific model of thinking (Tetlock, 2005). Data provided by ‘cultural insiders’ is shaped by subjective experiences mixed with inherited grievances and memories (Schuman and Scott, 1989). Therefore, meaningful analysis of social and cultural data is possible only if bias and subjectivity are incorporated into the analytic methods and tools at the conceptual level (Park et al., 2010). Such an approach yields a fragmented picture of incomplete, partial and conflicting interpretations of reality – but, unlike a glossy picture obtained from the analysis of aggregated decontextualised data, it can provide insights on how reality is perceived by social actors embedded into concrete socio-historical and situational contexts.

All of these issues – data as products of research, organisational and social-ideological practices; the transformative power of the technologies of data access, presentation and consumption; and subjectivity and bias in individuals’ accounts of events and contexts – continue to be relevant in the BD era. For instance, Sheehan’s (2012) study revealed that the most popular large-n terrorism datasets display a striking inconsistency in regard to the working definitions of terrorism and operational inclusion rules, thus posing problems for those who want to generalise from those datasets. The impact of the technologies that produce and mediate access to digital data is unprecedented yet less visible due to the ubiquity of such technologies and their deeper impact on society and individuals (Driscoll and Walker, 2014). Increasingly powerful tools for collecting media data cannot help in overcoming the fundamental problem of selectiveness and partiality, nor can they change the fact that media data is literally data that ‘happened to be reported’ by media professionals motivated by their organisational conditions, ideological preferences, group alliances, and personal needs and tastes. This problem is unlikely to be solved by the availability of primary data generated by social media users. On the contrary, issues related to data sources’ subjectivity and partiality are becoming increasingly prominent (see, e.g., Burns, 2015; Gehl and Bakarjieva, 2017; Marwick and Boyd, 2010; Papacharissi, 2015).

As Merton’s (1968a) quote above reminds us, social researchers have not always been able to control the scope of data available for analysis. In comparison to traditional sources of data, social media and other digital data sources present an additional challenge due to their dynamic and mediated nature. The content produced by social media users is constantly changing, as is the structure of the interaction networks. Most importantly, the underlying infrastructure that mediates researchers’ access to data operates on principles that are hidden from those researchers (Driscoll and Walker, 2014). Social researchers and critical data scientists argue that it is, therefore, necessary to problematise the concept of data as a decontextualised entity and to approach data in context – i.e., in relation to the social, cultural, economic, ideological and technological conditions of their production and consumption (Dalton et al., 2016; Iliadis and Russo, 2016; Leszczynski and Wilson, 2013; Resnyansky, 2015). The next question to answer is how to conceptualise digital data as manifestations of social processes and human activity.

Social media data as manifestations of social processes and human activity: Conceptual frameworks

As Merton (1968b) maintains, up until the twentieth century, the only sources of knowledge about beliefs, worldviews, opinions and attitudes were introspective narratives that were generated by members of intellectual and power elites. They were available from secondary sources, comprising philosophical treatises, social and historical studies, ideological manifestos, political pamphlets, the memoirs of ‘great’ historical actors and their friends and relatives, and literary and artistic works. The spread of literacy, the emergence of mass media, and the growing interest of commercial and political actors in getting precise and granular knowledge about opinions, attitudes and tastes of different segments of the population made it both possible and necessary to obtain massive primary data that could be analysed with the help of standardised instruments (e.g., surveys) and quantitative methods (e.g., content analysis). However, such technical precision has been shown to incur a heavy price: ‘there has been a marked pressure for working with very simple, one-dimensional categories, in order to achieve high reliability’ (p.503).

In comparison to the early years of public opinion research, the possibilities provided by social media look very promising (Ceron et al., 2014). Indeed, the number of ‘ordinary members of public’ who strive to express their opinions and attitudes to the global audience – and analysts – is vast and growing. The question is whether, and to what degree, this kind of data can be meaningfully analysed with the help of approaches and methods originating from the analysis of such data as mass media content, organisational records, and individuals’ self-accounts in the form of narratives, or answers obtained with the help of carefully crafted research instruments (questionnaires, interviews, etc.). In the online communicative space, people tend to convey their opinions and attitudes in the form of simple performative acts (clicking the ‘like’ button, retweeting, following hashtags, uploading and downloading content, etc.) linked to a more or less generic topic, rather than in the form of statements about an explicitly defined subject (Nothman et al., 2015). Extraction of the aspect towards which a given sentiment is directed is, therefore, the key issue in the development of automated tools for sentiment analysis (Overbey et al., 2017). The affordances of the multimedia networked communicative space present a real challenge for researchers whose methods and tools are grounded within network/structuralist and semantics/representation-oriented paradigms (Rost et al., 2013). As Marwick and Boyd (2010) argue, mediated social interaction, and self-representation in particular, cannot be sufficiently explained by representation-oriented theories. Rather, social media users’ activity aims to serve utilitarian interests (e.g., expanding one’s network for career or other purposes), and people tend to maintain carefully crafted images of their public selves intended for diverse potential audiences. Therefore, representation-focused analysis of social media can help establish a repertoire of social masks that are worn in the virtual commedia dell’arte – and not necessarily by human beings (e.g., Gehl and Bakarjieva, 2017). The question of the degree to which this knowledge reflects what the online and offline populations really think, feel and desire is still as relevant as ever.

Social media can be approached as a social practice that comprises three types of activities: production of meanings, transmission of meanings and consumption of meanings. Traditionally, production of meanings has been a duty and a privilege of groups that are capable of systematic reflection on social life – for example, philosophers, social thinkers, writers, artists and the members of the spiritual and intellectual elites (Merton, 1968b). Transmission of meanings has been the function of institutions responsible for the transmission of collective memory and ideologies, such as mass media (Neiger et al., 2011). In mass communication society, media played the key role in determining which meanings enter the realm of public discourse and, therefore, can influence opinions, attitudes, decisions, behaviour and policies. Consumption of meanings is an activity that takes place in individuals’ heads and, until recently, could only occasionally be revealed to the external observer either in a dialogue or in the form of self-account.

To summarise, in the pre-social media era, researchers dealt with symbolic practices characterised not only by a strict division of labour between institutionalised actors (e.g., intellectual elite – mass media – audience) but also by an (assumed) priority of meaning production and dissemination over meaning consumption. For this reason, it was considered important to know what meanings are being disseminated through specific channels and to monitor the information demand of the intended audience. These tasks could be successfully accomplished by semantics-oriented methods represented by content and thematic analysis. This paradigm, however, has been considered inadequate for approaching media and communication as social practices (Golding and Murdock, 1978; Nordenstreng, 1968), and approaches have been suggested that aimed to capture the production–consumption interrelationship (see, e.g., Fairclough, 1992). Social media have blurred the boundaries between the three aspects of symbolic production of the social. The mediated symbolic space presents an amalgam of meaning production, dissemination and consumption at the level of individual behaviour within dynamic, amorphous, fluctuating collective entities (social networks, virtual communities, etc.) (Bruns and Schmidt, 2011). Mediated symbolic practice is characterised by the increasing role of the ‘consumer’ in meaning production and dissemination. Moreover, this space has provided more possibilities for an explicit manifestation of the activity of message consumption (Trilling, 2014). Conceptualisation of social media as a locus of social practice, therefore, needs to be informed by three kinds of theories: general theories of the symbolic production of the social; domain-specific conceptual models of relevant activity (e.g., culturally specific models of ideological indoctrination); and theories of language-in-use that aim to capture its psycho-social, pragmatic and communicative aspects (Mohr et al., 2013; Resnyansky, 2014; Smith, 2014).

For instance, the development of analytic tools for approaching social media as a locus of the expression of sentiments and attitudes can be informed by Vygotsky’s (1986) concepts of inner speech, dialogue, monologue and written speech as representing essentially different relationships between language and thought. The difference is the result of two factors – namely, the speed of communication and the degree to which the participants share knowledge of the subject of speech and situation. The distinctive feature of inner speech is the tendency toward abbreviation and predication; the latter is possible because ‘[w]e always know what we are thinking about’ (p.243). According to Vygotsky, the more economical – predicative – forms of speech are adopted when the participants share their point of reference. The online multimodal communicative environment encourages forms of speech that display the features of spontaneous dialogical communication on a shared topic, although often its referential meaning is neither clearly defined nor negotiated and agreed upon. At the same time, social media provide endless possibilities for re-contextualisation of symbolic entities both within and across cultural-linguistic boundaries (O’Halloran et al., 2017; Resnyansky, 2015), which makes it impossible to guarantee that the users talk about the same subject. In ideological domains, the illusory sense of sharing the subject of a discussion can be purposefully imposed upon the participants to promote ideas that would look less attractive if their original sources and final beneficiaries were revealed. In order to obtain valid data on attitudes and sentiments, an analytic tool is needed that enables analysts to take account of the trend toward predication in social media.

On the other hand, the development of analytic tools for approaching social media as a locus of social practice can be informed by relevant – social and pragmatic – theories of language, media and communication. For instance, scholars working within the tradition of CA have demonstrated that this approach enables an analysis of computer-mediated communication as social interaction embedded into the two levels of context – the online communication network as well as the physically proximal context (Warner and Chen, 2017). Recently, studies have emerged that demonstrate the potential of CA for the development of tools enabling researchers to approach large datasets through the ‘talk-in-interaction’ conceptual framework, and to analyse them as structured social action (Rühlemann, 2017). For instance, in their analysis of Twitter conversations, Housley et al. (2017b) have demonstrated the fruitfulness of the ideas of Harvey Sacks, Harold Garfinkel and Erving Goffman for digital social research as well as for the development of analytic tools. Specifically, the ethnomethodology-oriented CA proved to be useful for the development of coding frames for the annotation of social media content. Having suggested that such frames may, in turn, inform the development of automated analytic tools, the authors note that there are ‘serious methodological and epistemological limitations in translating situated analyses of social action into quantitative and automated procedures’ (p.631). Another alternative is represented by attempts to enhance content analysis by critical discourse analysis (Albert and Salam, 2013; Bouvier, 2015; Rightler-McDaniels and Hendrickson, 2014; Zappavigna, 2015). The development of tools for the analysis of social media as a locus of ideological interaction in which ideas are legitimised and naturalised, or problematised and critically deconstructed, would further benefit from the adoption of concepts such as, for instance, Goffman’s (1981) response, or Bakhtin’s (1981) and Voloshinov’s (1986) concepts of reported speech, dialogue, monologue and speech genres as manifestations of the higher-order communicative phenomena related to social interaction and critical consumption of meanings.

Despite the availability of studies that demonstrate the fruitfulness and explanatory potential of the social theories of language and communication in the analysis of social media, the development of tools enabling an automated analysis of social media (e.g., natural language processing and sentiment analysis) tends to draw upon assumptions inherited from studies on public opinion analysis at the beginning of the twentieth century and from text subjectivity analysis in computational linguistics (Mäntylä et al., 2018). Of course, integrating social theories and algorithmic approaches is tremendously difficult in itself. This ambitious interdisciplinary collaborative project can be further complicated, if not entirely precluded, by a positivist view of BD as an ‘epistemological revolution’ characterised by an a-theoretical stance and a belief in the universal applicability of algorithmic approaches.

BD analytics and social theory: The CVE case

BDA appears to be quite selective of the theoretical underpinnings that it is capable of incorporating. Priority seems to be given to theories that focus on behavioural and structural aspects of social phenomena, rather than on the systemic and historical aspects. For instance, theories explaining psychological phenomena such as child-parent disputes or sexual violence are considered to be quite applicable for the analysis of the phenomena of a political or socio-technical nature, such as civil unrest, group conflicts, cyber-attacks, and so on. Indeed, family disputes and political violence may be grouped together on the grounds that they all involve one actor attacking another (Johnson et al., 2013). However, it is still to be theoretically substantiated that a psychological concept of aggression can satisfactorily explain political and socio-technical phenomena.

On the other hand, rather than trying to erase the boundary between psychological and sociological conceptual systems, it would seem more logical, in order to explore social phenomena, to draw upon relevant macro-level theories (i.e., systemic-functional, politico-economic and socio-historical). The problem is that it is very difficult – perhaps, impossible – to bring together macro-sociological theories of the social and methods that have emerged from and for the analysis of processes that are manifested at the micro-scale level in the regime of ‘nowcasting’ (Dalton et al., 2016). This does not mean, however, that those theories can be dismissed and forgotten. Similarly, it would be a mistake to ignore the fact that the extant concepts of social phenomena, including extremism, terrorism and radicalisation, are of a contextualised and ideologically loaded nature (Sedgwick, 2010). The developers of automated analytic capabilities need to pay attention to the social science community’s discussion on issues such as the danger of overgeneralisation and re-contextualisation of empirical findings, particularly in areas such as actor profiling for countering extremism and radicalisation (Monaghan and Molnar, 2016).

As Dalton et al. (2016) maintain, the myths of BD ‘make Big Data seems like the only way to know anything about remote populations, but they also distract from its imperfections and qualitative nature, and the risks it poses as a basis for decision making’ (p.5). In the case of radicalisation, it has been recognised that focusing on the structure of virtual communities and on the content of online messaging is quite limiting, and that truly understanding and countering the radicalisation of individuals and groups requires an approach that relates online behaviour to offline environments comprising the immediate situation (event) as well as socio-historical conditions and cultural systems (Fishman, 2010; Von Behr et al., 2013). Preoccupation with social media as a source of data on indicators of radicalisation may contribute to the failure to capture the role of factors operating on the societal and cultural-ideological levels. For instance, at the turn of the new century, the media discourse shaped by the Us – Them dichotomy was identified by social researchers as one of the main factors contributing to the process of radicalisation, along with systemic-structural and cultural-historical conditions (Borum, 2011). At present, the ‘psychological condition’ discourse on radicalisation and terrorism seems to be becoming more popular (Bhui et al., 2014). Focusing on social media data may have contributed to the proliferation of the individual-centred psychological paradigm among analysts, decision makers and the public, thus narrowing down the spectrum of strategies and measures that can be undertaken in order to address the problem of extremism and radicalisation at the societal and community levels.

There are two approaches to CVE and radicalisation, known as the ‘hard power’ and the ‘soft power’ approaches (Bunnik, 2016). The hard-power approach focuses on the threat of political violence; it is characterised by offensive/coercive and defensive/risk management strategies. The soft-power approach aims to prevent radicalisation and extremism by addressing root causes, grievances and ideologies. Soft-power strategies include political, ideological and social policy strategies that aim to increase opportunities for vulnerable actors’ social and economic engagement and to facilitate community engagement in local counter-extremism practices. BDA is habitually presented as a powerful instrument for obtaining knowledge that is most relevant for operative counter-measures and the hard-power approach to the provision of security and public safety (Chen et al., 2012). As Bunnik (2016) maintains, it is also necessary to understand the BDA ramifications from the perspective of the soft-power approach.

When it comes to the prevention of extremism and radicalisation, BDA is yet to demonstrate its potential – or, rather, its rightful place is yet to be understood. For instance, Johnson (2016) maintains that the use of algorithmic approaches for the analysis of terrorist events revealed that the emergence of terrorist events is characterised by a periodic law similar to the natural laws observed in the periodic recurrence of some natural phenomena. However, knowing the regularity and frequency per se cannot aid in preventing the event; the only logical action in the meantime may be to mobilise resources in anticipation of an event based on its likelihood. This finding could gain more practical significance if placed within a context comprising sociological knowledge on terrorism and its causes. In this context, the very fact that there seems to be a certain regularity behind the emergence of events that disrupt normal life may be interpreted as an indication that the causes of violent extremism are to be sought at the system level.

Actor profiling may serve as another example that shows the importance of placing BDA into the relevant theoretical and practical contexts. For instance, an analysis of social media data could have revealed that people who have been involved in violent extremism or are at risk of becoming radicalised are characterised by such attributes as: young, male, educated, 2nd or 3rd generation immigrant, and so on (Vidino and Hughes, 2015). Examined from the perspective of the ‘hard power’ approach, this kind of profiling poses serious problems (Heath-Kelly, 2017). What use could law enforcement agencies make of the fact that those ‘future radicalised’ are predominantly young educated males of immigrant background? What course of action may be undertaken on the basis of this knowledge in a legal democratic state? To be a useful instrument, profiling needs to be shaped by theories that explain certain kinds of behaviour by certain attributes or establish a clear correlation between them. In criminology, for instance, profiling is substantiated by a sound and empirically validated theory of psychological types combined with typologies of socio-economic variables that correlate with deviant behaviour and involvement in certain kinds of criminal activities (Rae, 2012). There is no unity, however, regarding theoretical underpinnings of profiling in areas such as terrorism, extremism and radicalisation. As summarised by Kruglanski et al. (2014), a variety of psychological, sociological, economic, political and cultural factors can contribute to radicalisation, but none of them is particularly significant per se.
Radicalisation is a dynamic process that represents the convergence of high-order elements (i.e., elements that cannot be reduced to clusters of standard variables), such as the quest for personal significance, an ideology that justifies violence as the path to significance, and the possibility of adopting the necessary means (e.g., networking and group dynamics). Against the background of such theories, the attributes of the ‘future radicalised’ actors identified by a BD analyst cannot be interpreted as markers of radicalism that possess an explanatory status. Rather, the BDA findings may indicate that society fails to create conditions for utilising the potentially most active and productive segment of the population, and that solutions need to be sought at the societal and cultural levels (Atran et al., 2017).

To summarise, social and cultural BD analysis needs to be assessed vis-à-vis relevant social scientific theories and empirical research. For instance, assessment of the place of BDA within the counter-extremism and radicalisation realm needs to be conducted in relation to: the problem of using quantitative and mathematical methods in empirical social research; attempts to address that problem within areas that preceded BDA (such as social network analysis, social modelling, computational social science, terrorism informatics); and relevant social scientific research and applied theories. BDA aims to inform counter-extremist and counter-radicalisation practice across the whole spectrum, from the tactical to the broadly strategic end. How can BDA address the diverse concerns of various stakeholders? Will it broaden or narrow down the repertoire of countering strategies and policies? It is necessary to assess its impact in terms of the balance between proactive longitudinal measures (that aim to address the causes of a problem) and operative reaction to events and to information about actors. It is also important to identify areas and problems that cannot be informed by BDA.

Conclusion

In the promotional and positivist discourses, BD has been presented as a phenomenon that radically changes every aspect of individual and social life and, indeed, the traditional ways of knowing the world. These changes are usually summarised around three points: that establishing correlations can replace the search for causality; that the heuristic value of data is the outcome of its sheer volume; and that aggregated datasets from heterogeneous sources can be analysed separately from the contexts of their production, dissemination and consumption. The logical conclusion is that the availability of digital data related to various aspects of human life makes it possible to apply computational approaches to the analysis of social and cultural processes and human behaviour. However, whilst such methods are considered extremely productive in some areas of science as well as in the commercial sector, they appear challenging both in social research and in many areas of practice.

Algorithmic approaches have flourished in areas where ‘data speaks for itself’, as in e-commerce, where the observable digital transactions equal the phenomena needing to be explored (consumers’ online behaviour). Using Peirce’s (1955) typology of signs, this kind of digital data can be qualified as iconic signs whose form contains actual characteristics of their meaning. However, when digital data is meant to be explored as manifestations of social life, cultural processes and individuals’ socially significant behaviour, it relates to the represented reality as indexical or symbolic signs whose meaning is the product of interpretations shaped by multiple contexts. Digital social and cultural data are, in the literal sense, only traces of social life, traces yet to be ‘decoded’. Therefore, the major challenge to be addressed stems not so much from the sheer volume and ‘noise’ as from the unclearly defined epistemological status of digital data as the manifestation of social and cultural reality (Shaw, 2015). Social researchers are yet to identify the sort of research questions that digital data might help appropriately address (Brooker et al., 2016).

In spite of the growing significance of BD for social and cultural analysis, for social scientists and practitioners BD remains an external technological condition to be accommodated. To change the situation, the development of social BDA requires an interaction between computational science and social science as conceptual systems (disciplines). It may be useful to re-examine lessons learned from previous experiences of using quantitative and computational methods for the analysis of social and cultural phenomena. To help obtain meaningful and useful knowledge about social and cultural phenomena, algorithmic approaches and analytic tools need to be underpinned by relevant conceptual models of the analysed phenomena. Unfortunately, the most relevant social scientific theoretical frameworks may appear to be incompatible with the BD epistemological stance. Indeed, one of the major problems with social and cultural BDA is the algorithm developers’ disregard of knowledge developed in the social sciences and humanities. This knowledge may be difficult or impossible to formalise – but it certainly should serve as a baseline – or, rather, the ideal – for the assessment of the heuristic significance and the practical implications of BDA.

Perhaps the major challenge posed by the BD phenomenon comes from the fact that the proclaimed ‘epistemological revolution’ may turn out to be an illusion that hides the beginning of an ‘epistemological decadence’. BD seems to revive some classical problems that social researchers have encountered in the distant past. Since then, social researchers have thoroughly reflected on those problems, understood their epistemological, theoretical and methodological aspects, and have learned how to address them. That understanding has entered the basis of social scientific disciplines, having been incorporated into theoretical frameworks, research methods and analytic techniques. Over the last decade, a growing number of social researchers have tried to apply them to the analysis of digital data (see, e.g., Goldsmith and Brewer, 2015; Housley et al., 2017a, 2017b; Mohr et al., 2013; Nothman et al., 2015; O’Halloran et al., 2017; Papacharissi, 2015; Smith et al., 2016). Yet integration of the social scientific epistemologies into the development of algorithms and analytic tools is still a challenge due to the diversity of disciplines and epistemological cultures that need to be brought together (Resnyansky, 2015). In order to productively – or, indeed, meaningfully – apply algorithmic approaches for the exploration of the social, BDA needs to be informed by relevant social scientific knowledge and grounded within social-scientific epistemological principles. Answering the BD challenge requires that social scientists’ efforts are directed towards extending the repertoire of social scientific theories and conceptual frameworks that may inform the analysis of the social in the age of BD. In illustrating this point, this paper briefly outlined some relevant theories and concepts. Other promising conceptual frameworks could be only briefly mentioned, whilst many are yet to be identified or developed.

Footnotes

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

References

Adams and Brückner (2015) Wikipedia, sociology, and the promise and pitfalls of Big Data. Big Data & Society. July–December: 1–5.

Agarwal S and Sureka A (2015) Applying social media intelligence for predicting and identifying on-line radicalization and civil unrest oriented threats. arXiv:1511.06858. Available at: https://arxiv.org/pdf/1511.06858.pdf (accessed 17 May 2017).

Agre (1997a) Beyond the mirror world: Privacy and the representational practices of computing. In: Agre and Rotenberg (eds) Technology and Privacy: The New Landscape, Cambridge, MA: MIT Press, pp. 29–61.

Agre (1997b) Toward a critical technical practice: Lessons learned in trying to reform AI. In: Bowker, Gasser, Star, et al. (eds) Social Science, Technical Systems, and Cooperative Work: Beyond the Great Divide, Mahwah, NJ: Erlbaum, pp. 131–157.

Albert CS and Salam AF (2013) Critical discourse analysis: Toward theories in social media. In: Proceedings of the nineteenth Americas conference on information systems, Chicago, IL, 15–17 August 2013, pp.1–8.

Andersen (2008) The end of theory: The data deluge makes the scientific method obsolete. Wired. 23 June.

Aragona (2017) New data science: The sociological point of view. In: Lauro, Amaturo, Grassia, et al. (eds) Data Science and Social Research: Epistemology, Methods, Technology and Applications, Cham, Switzerland: Springer International Publishing AG, pp. 17–24.

Atran, Axelrod, Davis, et al. (2017) Challenges in researching terrorism from the field. Science 355(6323): 352–354.

Bakhtin (1981) The Dialogic Imagination, Austin: University of Texas Press.

Bhui, Warfa and Jones (2014) Is violent radicalisation associated with poverty, migration, poor self-reported health and common mental disorders? PLoS One 9(3): 1–10.

Borum (2011) Radicalization into violent extremism I: A review of social science theories. Journal of Strategic Security 4(4): 7–36.

Bouvier (2015) What is a discourse approach to Twitter, Facebook, YouTube and other social media: Connecting with other academic fields? Journal of Multicultural Discourses 10(2): 149–162.

Boyd and Crawford (2012) Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society 15(5): 662–679.

Brooker, Barnett and Cribbin (2016) Doing social media analytics. Big Data & Society 3(2): 1–12.

Brooker, Dutton and Greiffenhagen (2017) What would Wittgenstein say about social media? Qualitative Research 17(6): 610–626.

Bruns, Burgess and Highfield (2014) A “big data” approach to mapping the Australian Twittersphere. In: Arthur and Bode (eds) Advancing Digital Humanities: Research, Methods, Theories, Basingstoke, United Kingdom: Palgrave Macmillan, pp. 113–129.

Bruns and Schmidt J-H (2011) Produsage: A closer look at continuing developments. New Review of Hypermedia and Multimedia 17(1): 3–7.

Bunnik (2016) Countering and understanding terrorism, extremism, and radicalisation in a big data age. In: Bunnik, Cawley, Mulqueen, et al. (eds) Big Data Challenges: Society, Security, Innovation and Ethics, London: Palgrave Macmillan, pp. 85–96.

Burns (2015) Rethinking big data in digital humanitarianism: Practices, epistemologies, and social relations. GeoJournal 80(4): 477–490.

Burrows and Savage (2014) After the crisis? Big data and the methodological challenges of empirical sociology. Big Data & Society 1(1): 1–6.

Calic D and Resnyansky L (2015) Twitter in crises ‘data’: A framework for critical reflection on the multidisciplinary research field. In: 2nd European conference on social media ECSM 2015 (eds A Mesquite and P Peres), Porto, Portugal, 9–10 July 2015, pp.52–58.

Calude and Longo (2017) The deluge of spurious correlations in big data. Foundations of Science 22(3): 595–612.

Ceron, Curini and Iacus (2014) Every tweet counts? How sentiment analysis of social media can improve our knowledge of citizens’ political preferences with an application to Italy and France. New Media & Society 16(2): 340–358.

Chandler (2015) A world without causation: Big data and the coming of age of posthumanism. Millennium: Journal of International Studies 43(3): 833–851.

Chen, Chiang RHL and Storey (2012) Business intelligence and analytics: From big data to big impact. MIS Quarterly 36(4): 1165–1188.

Couch N and Robins B (2013) Big data for defence and security. The Royal United Services Institute (RUSI), Occasional Paper, September 2013. Available at: https://uk.emc.com/campaign/bigdata/rusi/big-data-for-defence-and-security-report-final.pdf (accessed 15 June 2017).

Crawford and Finn (2015) The limits of crisis data: Analytical and ethical challenges of using social and mobile data to understand disasters. GeoJournal 80(4): 491–502.

Dalton, Taylor and Thatcher (2016) Critical data studies: A dialog on data and space. Big Data & Society 3(1): 1–9.

DiMaggio (2015) Adapting computational text analysis to social science (and vice versa). Big Data & Society 2(2): 1–5.

Driscoll and Walker (2014) Working within a black box: Transparency in the collection and production of big twitter data. International Journal of Communication 8: 1745–1764.

Fairclough (1992) Discourse and Social Change, London: Polity Press.

Feyerabend (2010) Against Method, London: Verso.

Fishman (2010) Community-level indicators of radicalization: A data and methods task force. Report to Human Factors/Behavioral Sciences Division, Science and Technology Directorate, U.S. Department of Homeland Security, College Park, MD: START.

Gandomi and Haider (2015) Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management 35(2): 137–144.

Gehl RW and Bakarjieva M (eds) (2017) Socialbots and Their Friends: Digital Media and the Automation of Sociality. New York, NY: Routledge.

Goffman (1981) Forms of Talk, Philadelphia: University of Pennsylvania Press.

Golding and Murdock (1978) Theories of communication and theories of society. Communication Research 5(2): 339–356.

Goldsmith and Brewer (2015) Digital drift and the criminal interaction order. Theoretical Criminology 19(1): 112–130.

Heath-Kelly (2017) The geography of pre-criminal space: Epidemiological imaginations of radicalisation risk in the UK Prevent Strategy, 2007–2017. Critical Studies of Terrorism 10(2): 297–319.

Housley, Dicks, Henwood, et al. (2017a) Qualitative methods and data in digital societies. Qualitative Research 17(6): 607–609.

Housley, Webb, Edwards, et al. (2017b) Digitizing sacks? Approaching social media as data. Qualitative Research 17(6): 627–644.

Iliadis and Russo (2016) Critical data studies: An introduction. Big Data & Society 3(2): 1–7.

Johnson (2016) New terrorism reveals new physics. APS News 25(10): 8.

Johnson, Medina, Zhao, et al. (2013) Simple mathematical law benchmarks human confrontations. Scientific Reports, 3, Article number 3463.

Karpf (2012) Social science research methods in Internet time. Information, Communication & Society 15(5): 639–661.

Khoury and Ioannidis JPA (2014) Big data meets public health. Science 346(6213): 1054–1055.

Kitchin (2014) Big data, new epistemologies and paradigm shifts. Big Data & Society 1(1): 1–12.

Kitchin (2017) Big data – Hype or revolution? In: Sloan and Quan-Haase (eds) The SAGE Handbook of Social Media Research Methods, Los Angeles, CA: SAGE, pp. 27–39.

Kruglanski, Gelfand, Bélanger, et al. (2014) The psychology of radicalization and deradicalisation: How significance quest impacts violent extremism. Advances in Political Psychology 35(suppl. 1): 69–93.

Leszczynski and Wilson (2013) Guest editorial: Theorizing the geoweb. GeoJournal 78: 915–919.

Lim (2015) Big data and strategic intelligence. Intelligence and National Security 31(4): 619–635.

Manovich (2012) Trending: The promises and the challenges of big social data. In: Gold (ed) Debates in the Digital Humanities, Minneapolis: University of Minnesota Press, pp. 460–475.

Mansell (2012) Imagining the Internet: Communication, Innovation, and Governance, Oxford: Oxford University Press.

Mäntylä, Graziotin and Kuutila (2018) The evolution of sentiment analysis – A review of research topics, venues, and top cited papers. Computer Science Review 27: 16–32.

Mare (2017) Tracing and archiving ‘constructed’ data on Facebook pages and groups: Reflections on fieldwork among young activists in Zimbabwe and South Africa. Qualitative Research 17(6): 645–663.

Marwick and Boyd (2010) I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society 20(1): 1–20.

Mayer-Schönberger and Cukier (2013) Big Data: A Revolution That Will Transform How We Live, Work, and Think, London: John Murray.

Mazzocchi (2015) Could big data be the end of theory in science? A few remarks on the epistemology of data-driven science. EMBO Reports 16(10): 1250–1255.

Merton RK (1968a) Continuities in the theory of social structure and anomie. In: Merton RK (1968) Social Theory and Social Structure. New York, NY: Free Press, pp.215–248.

Merton (1968b) Wissenssoziologie and mass communications research. In: Merton (ed) Social Theory and Social Structure, New York, NY: Free Press, pp. 493–509.

Mohr, Wagner-Pacifici, Breiger, et al. (2013) Graphing the grammar of motives in National Security Strategies: Cultural interpretation, automated text analysis and the drama of global politics. Poetics 41(6): 670–700.

Monaghan and Molnar (2016) Radicalisation theories, policing practices, and “the future of terrorism?”. Critical Studies on Terrorism 9(3): 393–413.

Neiger M, Meyers O and Zandberg E (eds) (2011) On Media Memory: Collective Memory in a New Media Age. New York: Palgrave Macmillan.

Nordenstreng (1968) Communication research in the United States: A critical perspective. Gazette XIV(3): 207–216.

Nothman J, Ahmad A, Breidbach C, et al. (2015) Understanding engagement with insurgents through retweet rhetoric. In: Sze-Meng J and Haffari G (eds) Proceedings of Australasian Language Technology Association Workshop, Parramatta, Australia, 8–9 December, pp.122–127. Brisbane, Australia: QUT.

O’Halloran, Tan, Wignell, et al. (2017) Multimodal recontextualisations of images in violent extremist discourse. In: Zhao, Djonov, Björkvall, et al. (eds) Advancing Multimodal and Critical Discourse Studies: Interdisciplinary Research Inspired by Theo Van Leeuwen’s Social Semiotics, New York, NY: Routledge.

Overbey LA, Batson SC, Lyle J, et al. (2017) Linking Twitter sentiment and event data to monitor public opinion of geopolitical developments and trends. In: Social, Cultural, and Behavioural Modelling: 10th International Conference, SBP-BRiMS 2017 (eds D Lee, Y-R Lin, N Osgood, et al.), Washington, DC, USA, 5–8 July 2017, pp.223–229. Washington, DC: Springer.

Papacharissi (2015) The unbearable lightness of information and the impossible gravitas of knowledge: Big Data and the makings of a digital orality. Media, Culture & Society 37(7): 1095–1100.

Park, Fables, Parker, et al. (2010) The role of culture in business intelligence. International Journal of Business Intelligence Research 1(3): 1–14.

Peirce (1955) Logic as semiotic: The theory of signs. In: Buchler (ed) Philosophical Writings of Peirce, New York, NY: Dover Publications, pp. 98–129.

Rae (2012) Will it ever be possible to profile the terrorist? Journal of Terrorism Research 3(2): 64–74.

Resnyansky (2008) Social modelling as an interdisciplinary research practice. IEEE Intelligent Systems 23(4): 20–27.

Resnyansky (2014) Social media, disaster studies, and human communication. Technology and Society Magazine 33(1): 54–65.

Resnyansky (2015) Social media data in the disaster context. Prometheus 33(2): 187–212.

Rieder and Simon (2016) Datatrust: Or, the political quest for numerical evidence and the epistemologies of Big Data. Big Data & Society 3(1): 1–6.

Rightler-McDaniels and Hendrickson (2014) Hoes and hashtags: Constructions of gender and race in trending topics. Social Semiotics 24(2): 175–190.

Robards and Lincoln (2017) Uncovering longitudinal life narratives: scrolling back on Facebook. Qualitative Research 17(6): 715–730.

Rost M, Barkhuus L, Cramer H, et al. (2013) Representation and communication: Challenges in interpreting large social media datasets. In: CSCW’13 proceedings of the 2013 conference on computer supported cooperative work, San Antonio, TX, USA, 23–27 February 2013. New York, NY: ACM, pp.357–362.

Rovner (2013) Intelligence in the twitter age. International Journal of Intelligence and Counterintelligence 26: 260–271.

Rühlemann (2017) Integrating corpus-linguistic and conversation-analytic transcription in XML: The case of backchannels and overlap in storytelling interaction. Corpus Pragmatics 1(3): 201–232.

Russom (2011) Big data analytics. TDWI Best Practices Report, Seattle, WA: The Data Warehousing Institute.

Schroeder (2018) Social Theory after the Internet: Media, technology, and globalization, London: UCL Press.

Schuman and Scott (1989) Generations and collective memories. American Sociological Review 54: 359–381.

Sedgwick (2010) The concept of radicalization as a source of confusion. Terrorism and Political Violence 22(4): 479–494.

Sengers P, Boehner K, David S, et al. (2005) Reflective design. In: Proceedings of the 4th decennial conference on critical computing: Between sense and sensibility, Aarhus, Denmark, 20–24 August. New York, NY: ACM, pp.49–58.

Shaw (2015) Big data and reality. Big Data & Society 1(4): 1–4.

Sheehan (2012) Assessing and comparing data sources for terrorism research. In: Lum and Kennedy (eds) Evidence-based Counterterrorism Policy, London: Springer Science+Business Media, pp. 13–40.

Shilton (2012) Values levers: Building ethics into design. Science, Technology, & Human Values 38(3): 374–397.

Silverman, Bharathy and Kim (2008) Challenges of country modelling with databases, newsfeeds, and expert surveys. In: Uhrmacher and Weyns (eds) Multi-Agent Systems: Simulation and Applications, Boca Raton, FL: CRC Press, pp. 1–30.

Smith, Burke, de Leiuen, et al. (2016) The Islamic State’s symbolic war: Socially mediated terrorism as a threat to cultural heritage. Journal of Social Archaeology 16(2): 164–188.

Smith (2014) Missed miracles and mystical connections: Qualitative research, digital social science and big data. In: Hand and Hillyard (eds) Big Data? Qualitative Approaches to Digital Research (Studies in Qualitative Methodology, Volume 13), Bingley, United Kingdom: Emerald Group Publishing Limited, pp. 181–204.

Tetlock (2005) Expert Political Judgement: How Good Is It? How Can We Know?, Princeton, NJ: Princeton University Press.

Trilling (2014) Two different debates? Investigating the relationship between a political debate on TV and simultaneous comments on Twitter. Social Science Computer Review 33(3): 259–276.

Vidino and Hughes (2015) ISIS in America: From Retweets to Raqqa, Washington, DC: The George Washington University.

Voloshinov (1986) Marxism and the Philosophy of Language, Cambridge, MA: Harvard University Press.

Von Behr, Reding, Edwards, et al. (2013) Radicalisation in the digital era: The use of the internet in 15 cases of terrorism and extremism, RAND Report. Santa Monica, CA: RAND Corporation.

Vygotsky (1986) Thought and Language, Cambridge, MA: The MIT Press.

Wagner-Pacifici, Mohr and Breiger (2015) Ontologies, methodologies, and new uses of Big Data in the social and cultural sciences. Big Data & Society. July–December: 1–11.

Warner and Chen H-I (2017) Designing talk in social networks: What Facebook teaches about conversation. Language Learning & Technology 21(2): 121–137.

Watson (2014) Tutorial: Big data analytics: Concepts, technologies, and applications. Communications of the Association of Information Systems 34(1): article 65.

Zappavigna (2015) Searchable talk: the linguistic functions of hashtags. Social Semiotics 25(3): 274–291.