
Abstract

Recent works have suggested an analytical complementarity in mixing big and thick data sources. These works have, however, remained programmatic suggestions, leaving us with limited methodological inputs on how to achieve such complementary integration. This article responds to this limitation by proposing a method for ‘blending’ big and thick analytical insights. The paper first develops a methodological framework based on the cognitivist linguistics terminology of ‘blending’. Two cases are then explored in which blended spaces are crafted from engaging big and thick analytical insights with each other. Through these examples, we learn how blending processes should be conducted as a rapid, iterative and collaborative effort with respect for individual expertise. Further, we demonstrate how the unique, but often overlooked, granularity of big data plays a key role in affording the blending with thick data. We conclude by suggesting four commonly appearing blending strategies that can be applied when relying upon big and thick data sources.

Keywords

Big data, blending, ethnomethodology, ethnography, granularity, multimethodology, thick data

Introduction

Today, the omnipresence of sensors tracking our social whereabouts has led to the production of digital traces with high speed and volume commonly referred to as ‘big data’. While big data is celebrated for its potentialities, scholars have also asserted that this growing body of digital traces, due to their common origin as by-products of already existing processes, is often used ‘out of context’, which decreases its ‘meaning and value’ (Boyd and Crawford, 2012: 670). This lack of context – a term we will define later in the article – is not an abstract or obscure problem apparent to a few philosophers focused on cybernetics. Rather, one need only eyeball a dataset built from digital traces to be reminded how little decontextualized numbers by themselves are able to tell us about social life.

Figure 1 presents a simple example of such decontextualization from a big dataset containing spatial path points of persons moving within a store as recorded by video sensors. To calculate the movements, the video footage of the sensors has been reduced to decontextualized numeric traces of the interaction that took place, thereby enabling insights that someone ‘moved’ somewhere. Context – what was accomplished, by whom and where, how and why – needs to be re-created for ‘the data to carry meaning’. As Blank (2008: 540) commented: ‘With many interesting variables unavailable, people are, at best, thinly described. Because of these problems many forms of electronic record are very difficult for researchers to use.’ While big datasets are extensive in volume and granularity, these qualities often only extend along one dimension whereby they appear as ‘thin’ to the analysts and researchers working with them.

Figure 1.

Data within red square represents person moving from A to B as captured by video analytics. While a person’s path is highly detailed, data traces offer no information describing the context of the path.
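To make this thinness concrete, the following minimal Python sketch shows what a single tracked path might look like once the footage has been reduced to numbers. The record layout, the field names and the sample coordinates are illustrative assumptions and are not taken from the dataset behind Figure 1.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class PathPoint:
    """One tracked position. All fields are hypothetical for illustration."""
    track_id: int  # anonymous identifier assigned by the tracker
    t: float       # seconds since the start of the recording
    x: float       # floor coordinate in metres
    y: float       # floor coordinate in metres


def path_length(points):
    """Total distance travelled along a sequence of points, ordered by time."""
    ordered = sorted(points, key=lambda p: p.t)
    return sum(hypot(b.x - a.x, b.y - a.y) for a, b in zip(ordered, ordered[1:]))


track = [
    PathPoint(7, 0.0, 0.0, 0.0),
    PathPoint(7, 1.0, 3.0, 4.0),
    PathPoint(7, 2.0, 3.0, 10.0),
]
print(path_length(track))  # prints 11.0
```

The trace supports a precise statement that track 7 moved 11 metres, but nothing in the record says who moved, towards what, or why.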

Figure 2 combines these two distinctions: thin/thick–extensive/small into a matrix. The two red areas define the main data sources the blending methodology engages with.

Figure 2.

Splitting the data universe by the two distinctions, thin/thick–extensive/small. Four common methods for collecting data have been added to the figure to illustrate how it is possible to think about highly different data sources along these lines. The two red areas in, respectively, the extensive–thin and thick–small define the data sources that the blending methodology has been developed from and where the complementarity is strongest. Authors’ model.

In the top left corner, we find the big-thin data sources that Blank talked about. These data sources are extensive in numbers but also overly thin with very little context linked to them. Most big datasets built from sensors, such as location data collected from a GPS sensor, belong to this group of data. While less talked about than, e.g. social media data due to its less clear application, this group of data sources is by far the fastest growing with a steady appearance of new sensors tracking in our everyday life. To address the challenge of thin data, ‘data scholars’ have suggested complementing big data with sources of highly contextualized thick data (Blok and Pedersen, 2014; Boellstorff, 2013; Curran, 2013; Ford, 2014; Stoller, 2013; Wang, 2013).

We use the term ‘thick data’ for the group of data sources located at the opposite end of the coordinate system. ‘Thick data’ is synonymous with ethnographically collected and analysed observational data in the tradition of Clifford Geertz (1977), who described how thick descriptions of human behaviour include detailed data collection and analysis of the context in which a behaviour occurs. Thick data is defined by its contextual complexity which enables the researcher to reflect upon how and why people do what they do. Small data is opposed to big data by being a low number of instances. It is of course possible to have a small number of thin data, though this would in most cases be useless. Thus, thick and small data do not share the same epistemological status. However, it is often the case that the collection and analysis of thick data produced as human actions and interactions provides a relatively small collection of thick phenomena, as opposed to big data being millions of thin nodes. Context is, however, not a simple matter (Duranti and Goodwin, 1992). One particularly strong analytical perspective for analysing thick ethnographically collected data combines ethnomethodology (EM) (Garfinkel, 1967), conversation analysis (CA) (Sacks et al., 1974) and multimodal interaction analysis (Streeck et al., 2011), and rests on the collection of naturally occurring data through video recordings. In this tradition, some argue that the only relevant context in social interaction is the utterance before a new utterance is produced in a sequential environment (Schegloff, 1987, 1997). However, others in the ethnographic version of the EM/CA tradition (Arminen, 2005; Atkinson et al., 2001; Heath et al., 2010; Moerman, 1988) use a broader definition and put emphasis on the situated encounters in the tradition of Goffman (1964). The collection of thick data by ethnographic methodology is primarily based on (video/photo-) observations, field records and interviews.
All sorts of contextual knowledge can potentially be relevant concerning these situated encounters, but in the tradition of EM/CA, we emphasize that it is primarily the issues that participants themselves somehow orient to that are the most relevant context for the analysis (Heritage, 1984). Thus, some qualitative techniques might be used in this process, but thick data is not, generally speaking, provided by all sorts of qualitative methods. By Big–Thick Blending we specifically intend to focus on the blending of ethnographically collected thick observational data.

The blending methodology rests on the complementarity between these highly heterogeneous data sources. While this is the focus in the paper, it should be noted that the methodology is not limited to these extremes and it can surely be productive to blend both ‘less thin’ big data sources, e.g. social media data, or ‘less thick’ thick-data, e.g. interview data.

We are far from the first to argue for important positive complementarities to arise from mixing these two types of data. It has, e.g. been argued that the mixing of big decontextualized data with highly contextualized thick data can help ‘uncover the meaning behind Big Data visualization and analysis’ (Wang, 2013). Others have hypothesized how ‘entirely new interferences and polyphonies’ can arise […] given that these two types of data are ‘mixed with care’ (Blok and Pedersen, 2014: 1).

There are also empirical experiments: Researchers from Berkeley have, e.g. studied space usage within homes by mixing big data from behavioural tracking sensors with ethnographic observations (Anderson et al., 2009). Similarly, tracking using mobile-embedded Bluetooth sensors was conducted by Girardin (2013) to study congestion and space usage at the Louvre museum, using observations provided by the massive security staff guarding the valuables of the museum to qualify and extend the thin data traces. A similar ‘qualifying role’ was reached by Hsu (2014) who made use of GPS data from Myspace to ‘navigate’ her ethnographic mapping of online music communities. In contrast to the Louvre study, where thick descriptions were used to qualify analytical results built from thin big data, big geographical knowledge on the location of individuals was in Hsu’s study used to qualify and contextualize her local ethnographic work. Finally, Blok et al. (2017) have recently used the setting of a party to study possible complementary effects (and the absence thereof) when combining big and thick observations.

However, none of these engage directly with the goal of describing a practical method for integrating big and thick data (cf. Girardin, 2013). The development of applicable methodologies has thus been largely absent, leaving scholars like ourselves in the dark as to how, in practice, billions of thin digital traces should be mixed together with ethnographic accounts (the only notable exception is the method of ethno-mining that we will discuss further below). In this paper, we report on how we, a team of ethnographers and big data analysts, during the last three years have developed methodological conventions on how to blend big and thick analytical results.

The remainder of the paper falls into four parts. First, we position the blending methodology within the multimethodology framework. Second, we develop the methodological concept of ‘blending’ as a technique for bringing big and thick analytical insights into shared analytical spaces. Third, we explore this method in relation to two analytical examples, extracting important insights on how best to integrate big and thick data. We end by discussing how the behavioural and temporal granularity intrinsic to most big datasets plays a crucial role in affording the blending of insights built from big and thick data.

Establishing a framework for blending

The blending terminology is borrowed from Fauconnier’s (1997, 2001) and Fauconnier and Turner’s (1998, 2002) research in the field of cognitivism and linguistics. In their terminology, blending is a cognitivistic process assumed to be ubiquitous to everyday thought that people apply to combine elements from diverse scenarios into new elements – so-called blended spaces. The theory thus provides a terminology for the cognitive process of developing new concepts (cf. Koestler, 1964). In the Big–Thick Blending methodology, we draw on this terminology, but in a slightly different manner as we extend the concept of blending from being primarily a cognitive process into one that also covers intentional and strategic processes such as research. We use blending to describe the analytical process in which insights based on big and thick data are brought together into new conceptualizations through deliberate actions performed by researchers. We argue that this move is theoretically appropriate and within the conceptual nature of the terminology (see also other uses, such as in Hougaard (2005) and Hutchins (2005)). Figure 3 shows Fauconnier and Turner’s terminology with a simple example showing the construction of a ‘lamp–chair’.

Figure 3.

Blending of two elements into a new third one (see Fauconnier and Turner, 1998; Authors’ model, Due, 2014).

The figure shows how a common generic space exists: a schematic frame of shared elements. In this case, the common generic space is (at least) the category: ‘furniture’, the shared colour and the wooden material. The blending process then consists in partially matching the two inputs ‘lamp’ and ‘chair’ and projecting selectively from these two input spaces into a new space, the blended space. In the blended space, we have a new type of furniture. This construal is emergent in the blend but it also remains connected to the original inputs by specific affordances: the lamp–chair is a new emergent construct, but the specific affordances of, for instance, the light bulb and the chair legs remain the same.

The example is a simple case of blending. Two inputs share properties that might be blended. They are linked by a cross-space mapping and elements are projected selectively to a blended space. The projection of these specific elements allows an emergent structure to develop. Thus, the blending process can derive concepts from the input spaces to provide relations that do not exist in the separate inputs (Fauconnier and Turner, 2003).

The input spaces that are blended in the Big–Thick Blending methodology consist of analytical insights, which are built on data materiality with different affordances. Rather than mixing different methods with different disciplinary constraints, the Big–Thick methodology focuses upon the blending of insights. The actual blending can thus be described as an interpretative, distributed cognitive and embodied process conducted by the researchers. Consequently, the blending must happen iteratively and at a rapid pace to counter how analytical insights tend to stabilize over time. The blending thus needs to take place before the analysis in each input space is finished to secure the full potential of the blending process.

Positioning Big–Thick Blending within a multimethodology framework

As a method, the blending methodology carries obvious similarities to the idea of ‘mixed methods’. At its core, mixed methods (or multimethodology) is about reaching a more comprehensive or ‘true’ view on reality by linking different bits and pieces of data – often generated by very different methods (Brewer and Hunter, 1989). Within this broad definition many distinct approaches exist, varying across methodological choices as to what is mixed (data, methods, analytical results, etc.), why they are mixed (triangulation, complementarity, expansion, etc.) and when they are mixed (initiation, conclusion, continuously, etc.). Despite this variety, most methods are interested in bridging the two main domains within social sciences: the qualitative and quantitative methods. Big–Thick Blending differs from these approaches by focusing on a very particular and narrow data domain of blending analytical results built upon big and thick data sources carrying compatibility and complementarity.

The previous theoretical and empirical engagement in mixing big and thick data has neglected the establishment of methodological recommendations for how to carry out such mixing. In our search for mixed method approaches with an attentiveness to the unique affordances of big and thick data, we managed to identify only one method, termed ‘ethno-mining’. First suggested by Aipperspach et al. (2006) and later further developed in Anderson et al. (2009), ethno-mining sets out to combine thin big data collected from different sensors (data mining) with ethnographic descriptions of the same settings (ethnography). As in Big–Thick Blending, ethno-mining takes seriously the possibilities of harvesting the complementarity between the heterogeneous data worlds and works to craft hybrids in which ‘traces of each of the ingredients can still be seen’ but the different inputs ‘cannot be separated out’ (Anderson et al., 2009: 125). Ethno-mining also describes the iterative and rapid loops necessary in the process.

However, the two approaches differ in three respects. First, where Big–Thick Blending attempts to integrate analytical results, but never the method or data itself, the ethno-mining approach attempts to engage both the data and the process (Aipperspach et al., 2006). While this might be possible in some special cases with an abundance of available resources, we fear that this focus runs the risk of underestimating the importance of ‘expertise’ needed to fully master both big and thick data. Second, where the main reason for mixing data in the ethno-mining perspectives is ‘exposing the biases inherent in either type of data alone’, a type of triangulation, the Big–Thick Blending focuses on the crafting of entirely new analytical results (blended spaces) through, e.g. complementarity, extension and calibration. Third, ethno-mining only vaguely offers a terminology as to how big and thick data should be integrated. The blending terminology presented here thus fills the critical void of suggestions as to how one could approach mixing big and thick data.

The goal of the following two cases is to develop this terminology further as well as show it in action. Here we will demonstrate how blending occurs as (1) a departure from a generic space of interest and data complementarity, (2) different input spaces with findings from, respectively, big and thick data analysis and (3) a selective projection of some of these findings into a blended space with emergent properties. We focus on the structural elements of the blending process and focus a bit rigidly only on the input spaces and the blend with emergent properties. There are many other small steps in the iterative progression that are important, but impossible to discuss within the limited scope of this article.

Case 1: Blending big and thick insights from video recordings

This case was developed in close collaboration with a Danish optometrist chain that wanted to improve the in-store experience of their customers when visiting one of their 100 brick-and-mortar shops. Blending big and thick data became especially crucial as we wanted to both quantify and qualify customers’ physical interactional paths and in-store actions to identify crucial points for enhanced customer interactions.

We collected thick data from 11 stores using observations, shadowing, contextual inquiries, interviews, video recordings and mystery shopping (acting like a normal shopper while observing and taking notes). All employees had signed informed consent forms and customers were informed through visible signs and verbal consent. We collected more than 1000 hours of video footage from the stores through mounted and hand-held devices. In a single optometry shop video cameras were also used to quantify the in-store movement through applying novel video analytics and face recognition converting selected recordings into measures of physical in-store movement. The camera remained in the shop for three months, covering most of the store space during business hours. By tracking movement in the recorded video footage, video analytics transformed the movement into spatial coordinates which were combined to depict the totality of movement in the store, in effect quantifying the physical customer paths and turning them into routes fit for statistical manipulation. While the technology has been used for several years in security (Regazzoni et al., 2010), warfare (Bowman et al., 2017) and certain retail applications (Battiato et al., 2016; Huang et al., 2017; Musalem et al., 2015), it has only recently been adapted to mid-range camera technology making it viable for use as more than a niche product. From this project, we show two examples in which blending came to play a key role.

Example 1a: Identifying the importance of tables and charts

Through ethnographic observations in the shop we identified the tables as important interactional touchpoints and wanted to look more into them.

Input space 1: Analysis of thick interactional data of customer interaction

Using video recordings of the interactions around the tables we did a fine-grained multimodal interaction analysis (Mondada, 2014; Streeck et al., 2011). Figure 4 shows one of our initial (thick) micro-analyses of the interaction between a customer and a sales person.

Figure 4.

A detailed ‘Jeffersonian’ transcription (Jefferson, 2004) of the interactions occurring at the interview tables in the store. The transcription reveals how the diagram creates a misalignment between the employee and the customer.

Through the analysis of the thick data we identified the diagram as a focal point at the tables. After proposing what the optician would recommend, the conversation ends by the optician asking the customer what she thinks about ‘that’ (line 62). There is a very long pause of 2.6 sec before the customer initiates an entirely new topic. The long pause and the unrest in the customer’s embodied actions display how she probably does not understand what ‘that’ is. She demonstratively does not respond to the question put forth by the employee, although she orients through body posture and gaze to the diagram thereby making it relevant for the interaction. However, the diagram is not used as a helpful resource in situ. Instead, the different symbols on the diagram seem to be part of the misalignment between the employee and the customer, as the customer silently stares at the symbols without making any verbal account. Hence, this example displays problems in the interaction relating to explaining products using the chart.

Input space 2: Analysis of customer’s big data behaviour

While departing from the same overall generic space, the analysis of the video analytics big data was simultaneously able to pinpoint the specific behavioural patterns concerned with the tables. During this analysis, we found that human activity centred around the checkout counter and the interview tables, the latter a rather obscure area of the store with no products (light red areas of Figure 5).

Figure 5.

The big data video analytics mappings are shown with a focus on the tables.
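A mapping of this kind can be thought of as a count of tracked positions per floor cell, with activity hubs showing up as the cells with the highest counts. The Python sketch below illustrates that idea only; the cell size, the sample coordinates and the function name `occupancy_grid` are assumptions for illustration and do not describe the video analytics software used in the study.

```python
from collections import Counter


def occupancy_grid(points, cell_size=1.0):
    """Count tracked positions per square floor cell.

    points: iterable of (x, y) floor coordinates in metres (hypothetical units).
    Returns a Counter keyed by (column, row) cell index.
    """
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Made-up positions: two near the entrance, three around a table further in.
points = [(0.2, 0.3), (0.8, 0.1), (2.5, 1.5), (2.1, 1.9), (2.9, 1.2)]
grid = occupancy_grid(points)
hotspot, hits = grid.most_common(1)[0]
print(hotspot, hits)  # prints (2, 1) 3
```

The count identifies where activity concentrates, but whether that cell contains a table, a queue or a staff shortcut still has to come from the thick input space.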

Blending findings from big and thick analysis

Figure 6 shows the ingredients of the blending. Input space 1 consists of the multimodal interaction analysis identifying the diagram as a crucial actor in the table activity, while input space 2 consists of the tracked movement paths in the store, identifying the store tables as an interactional hub. This supplied us with empirical grounding for zooming further in on the interactions at the tables, leading to the subsequent identification of similar examples where participants displayed perplexity towards the diagram central to the activity at the tables. From these two input spaces, a blended space of the activity at the tables qualified as ‘relevant’ was thus created.

Figure 6.

Blending big patterns of in-store behaviour with thick descriptions of activity surrounding the store tables.

The shift between data modes allowed us to identify and innovate on an overlooked diagram crucial to the customer’s interactional trajectories. However, while our fine-grained analysis exposed the workings of the diagram in social interaction and selling practices, the tables only became a ‘relevant’ area of interest through the work of the big data camera patterns. By using the granularity of data and the blending of big and thick analytical results, a thick result was thus qualified as central to the store flow through the frequency count of customers in the area, separating ‘real’ issues from non-real. There were many findings not shown here due to space limits, but as shown: specific findings from the input spaces were selectively projected into the blended space which then dynamically developed an emergent structure, in the example providing managers a solid ground to do something about the use of charts.

Example 1b: Identifying the most relevant activity at glass walls

In the second example from the same case, the generic space and analytical aim was to explore how customers interacted with the shop’s glass walls that exhibited diverse product categories such as ‘contact lenses’, ‘trendy male glasses’, and so on. While the company management was aware that wall content and design were important and attracted different customers, they relied solely upon gut feelings to direct the interior design of their 100 shops.

Input space 1: Big data analysis of in-store customer paths

Using video analytics, we produced several compelling heat maps covering store activity. Analysis of the data revealed great differences concerning customer path behaviour and time spent in front of glass walls. But skewness in data, common to most datasets of digital traces, made it difficult to conduct any nuanced comparison of diverse map areas. Additionally, the measured activities in several zones deviated greatly from our expectations, with zones at the periphery of the shop showing up as intense interaction zones, while zones near the entrance were nearly empty (see Figure 7).

Figure 7.

Diagram shows video analytics exploring in-store behaviour in a specific optical store before (left) and after (right) blending, thus leading to more relevant and precise numbers.

Input space 2: Thick data analysis

Through analysis of the ethnographic material (video recordings and field notes), we divided the shop into analytically relevant zones. Figure 6 shows some of the different materials that we employed in the subsequent blending process. One of the central findings from the analysis was the way the customers orient to the glass wall by primarily looking at main height where, e.g. signs with glass-category descriptions hung. The analytical process also resulted in many findings about the type of interaction occurring while trying out glasses at the walls, e.g. the focus on how glasses are passed between customer and employees (Due, 2018a, 2018b).

The blending strategies of calibration and contextualization

Very different input spaces generated findings from, respectively, big and thick data analysis. Through the process, ethnographic findings about customer behaviour in the shop became standards that the big data results could be evaluated and negotiated against and vice versa. This process thus resembled the scientific process of ‘calibration’, in which the measurements of an instrument, in the current example the video analytics of the customer paths, are stabilized by alternately comparing the results and adjusting the instrument (Franklin, 1997; cf. Bateson, 1978). From the thick input space specific findings were projected into the blended space: The ethnographic analysis revealed how the extremely high readings for the sunglasses product category, despite its position at the shop’s periphery, were expected during the summer months, with sunglasses being the only product category able to attract the attention of passing customers. The ethnographic analysis also pointed out how the corridor to the eye-testing area was heavily trafficked by staff members, and the consequently high numbers in nearby zones were a misrepresentation of customer activity levels if not properly adjusted when defining/drawing zones in the shop. Through such calibration, based on selectively projected inputs from the different data worlds and types of analysis, blending transformed the untested digital trace into a somewhat reliable measure of behavioural activity, thereby constructing the blended space with emergent properties as a novel result.
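The calibration step can be illustrated with a small Python sketch in which the zone definitions are redrawn after the ethnographic input and the same traces are counted again. The zone names, rectangles and track points are hypothetical; the only idea carried over from the case is that a staff corridor is separated from the neighbouring customer zone before visitors are counted.

```python
def zone_of(x, y, zones):
    """Return the name of the first rectangular zone containing (x, y), else None."""
    for name, (x0, y0, x1, y1) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None


def visitors_per_zone(points, zones):
    """Count distinct track ids per zone. points: iterable of (track_id, x, y)."""
    seen = {name: set() for name in zones}
    for track_id, x, y in points:
        name = zone_of(x, y, zones)
        if name is not None:
            seen[name].add(track_id)
    return {name: len(ids) for name, ids in seen.items()}


# Hypothetical layout before calibration: the 'frames' zone includes the corridor.
raw_zones = {"sunglasses": (0, 0, 2, 3), "frames": (2, 0, 5, 3)}
# After the ethnographic input, the corridor strip becomes its own zone.
calibrated_zones = {
    "sunglasses": (0, 0, 2, 3),
    "frames": (2, 0, 4, 3),
    "staff_corridor": (4, 0, 5, 3),
}

points = [(1, 0.5, 1.0), (2, 1.5, 2.0), (3, 3.0, 1.0),
          (4, 4.5, 1.0), (5, 4.2, 2.0), (4, 4.8, 2.5)]

print(visitors_per_zone(points, raw_zones))
# prints {'sunglasses': 2, 'frames': 3}
print(visitors_per_zone(points, calibrated_zones))
# prints {'sunglasses': 2, 'frames': 1, 'staff_corridor': 2}
```

In this toy layout the reading for the customer zone drops from three to one once corridor traffic is drawn out, while the traces themselves are untouched: the instrument is adjusted, not the data.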

Figure 7 also illustrates the thinness of big data as contextless numbers. We can start by asking whether 1405 persons in front of the male designer glass wall is above or below expectations (see Figure 7, right). The question is rhetorical because the number ‘1405’ without further information is without meaning in itself. Contextualization is strongly needed. From a different input space the ethnographic analysis revealed that, e.g. the more expensive and trendy glasses (e.g., male designer glasses) often were considered mere attention attractors. Beautiful, but with a price well above what most customers could afford, the ethnographic analysis uncovered why many of the optometrists accepted low sales numbers from this category. Viewed within this blended space, the low numbers of customers found in the trendy product zone were in fact surprising and thus highly relevant to the manager.

Through the blending processes, the ethnographer’s analysis of the shop’s layout thus re-contextualized the otherwise arbitrary numbers of customers in front of a glass wall. This knowledge was far more than an appealing supplement but rather an indispensable complementary ‘thickness’ that entered into the final visualization alongside the quantitative (see Figure 8, ‘blended space’). Such challenges of big datasets are extremely common when working with thin big data (cf. Porway, 2013; Blok et al., 2017).

Figure 8.

Model summarizes the blending process. By blending the data visualization of movement in the different zones of the store (input space 1) with thick analytical findings (input space 2), the blended space of Figure 8 is formulated.

Case 2: Blending insights from big sensor data with thick ethnographic data

To show how the blending process is not only applicable using video analytics, we briefly present a second case. This case concerns an evaluation study of bike signs initiated by the municipality of Copenhagen. To ease cyclists’ navigation in the city the municipality put up hundreds of bike signs showing the direction and travel distance to key places around the city. The municipality wanted to evaluate the effect of the signs and how the cyclists made use of them. Several conventional methods, including ethnographic observation and shadowing of cyclists, interviews and an online survey, were applied to evaluate the usage of the signs. On top of this, the team also used location data from 371 individual cyclists who installed a specially designed app on their smartphone that collects and transmits GPS data on their journeys through the city. Thus, this project originated in a generic space of shared interest and data complementarity: the object of the study, the cyclists’ naturally occurring paths and usage of signs were shared across both data sources but the methods for collecting insights originate from very different methodologies and data worlds.

Input space 1: Analysing big data from GPS trackers

The analysts drew a heat map of the average length (km) of the routes the participants followed (Figure 9). The map revealed how morning and afternoon commuting follow routes of vastly different length. This pattern seemed to persist when zooming in on the paths of some of the individual cyclists, using the granularity of the dataset. Looking at individual GPS-identified paths thus revealed that the two trips often appeared to follow entirely different routes (see e.g. Figure 10).
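The comparison of morning and afternoon route lengths can be sketched as follows. This is a minimal Python illustration under stated assumptions: routes are taken to be lists of (latitude, longitude) pairs, length is approximated with the haversine great-circle formula, trips starting before noon are labelled as morning, and the sample coordinates are made up. None of these choices is documented for the study itself.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def route_length_km(route):
    """Sum of the distances between consecutive GPS points of a route."""
    return sum(haversine_km(a, b) for a, b in zip(route, route[1:]))


def mean_length_by_period(trips, noon_hour=12):
    """trips: iterable of (start_hour, route). Returns mean km per period."""
    lengths = defaultdict(list)
    for start_hour, route in trips:
        period = "morning" if start_hour < noon_hour else "afternoon"
        lengths[period].append(route_length_km(route))
    return {period: sum(v) / len(v) for period, v in lengths.items()}


# Made-up trips: a morning route with a detour and a direct afternoon return.
trips = [
    (8, [(55.676, 12.568), (55.690, 12.560), (55.700, 12.580)]),
    (17, [(55.700, 12.580), (55.676, 12.568)]),
]
means = mean_length_by_period(trips)
print(means["morning"] > means["afternoon"])  # prints True
```

Because each route keeps its individual points, the same data can be averaged across all cyclists or inspected one trip at a time, which is the granularity the analysis relies on.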

Figure 9.

Average length (km) of bike routes. Morning commuting appears to follow longer routes than in the afternoon.

Figure 10.

Visualization of an individual cyclist’s trip through Copenhagen. The visualization reveals that the specific cyclist, like many others, follows different paths to and from work.

While the trackers clearly indicated that biking to work differed greatly from the act of biking from work, the analysts were puzzled by the fact that the busy morning commutes should be the long routes, rather than the afternoon. On this background, the analysts developed an alternative explanation in which the shortness of the afternoon routes was a result of stops along the route. This would split the route into multiple routes that individually were shorter than the morning routes but combined would be longer.
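This alternative explanation can be checked against the traces: if afternoon trips are interrupted by stops, merging segments separated by short pauses should yield journeys longer than the morning commute. The Python sketch below shows one way such a check might look. The segment format and the 45-minute stop threshold are hypothetical choices, not parameters reported for the study.

```python
def stitch_segments(segments, max_stop_min=45):
    """Merge a rider's consecutive segments that are separated by short stops.

    segments: list of (rider_id, start_min, end_min, length_km), with times in
    minutes since midnight. Returns a list of [rider_id, start_min, end_min,
    length_km] journeys, ordered by rider and start time.
    """
    journeys = []
    for rider, start, end, km in sorted(segments):
        last = journeys[-1] if journeys else None
        if last and last[0] == rider and start - last[2] <= max_stop_min:
            last[2] = end    # extend the current journey to the new segment's end
            last[3] += km    # and add the segment's length
        else:
            journeys.append([rider, start, end, km])
    return journeys


# Made-up rider: one morning trip and an afternoon trip with two stops.
segments = [
    ("a", 480, 505, 6.0),
    ("a", 960, 972, 3.0),
    ("a", 990, 1000, 2.5),
    ("a", 1015, 1030, 3.5),
]
journeys = stitch_segments(segments)
print(journeys)
# prints [['a', 480, 505, 6.0], ['a', 960, 1030, 9.0]]
```

In this toy case three short afternoon segments combine into a 9 km journey that exceeds the 6 km morning commute. The threshold is the sensitive choice: what counts as a stop rather than the end of a trip is a question the traces alone cannot answer.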

Input space 2: Analysing thick data

That cyclists followed different routes according to the hour of the day was also found by the ethnographers by following the cyclists as they navigated the city. Through this shadowing and contextual inquiry, analysis showed that many citizens developed multiple routes to and from the same destination. Thick data consisted not just of interviews in survey form, but also contextual inquiries accomplished during the field work, where the ethnographers would ask the cyclists why they chose the paths they cycled. As a 30-year-old local woman explained on her way to work: ‘I always bike the same route to work because it is the fastest. […] I need to cross the river, so I always bike across ‘cykelslangen’ [ed. bridge in Copenhagen]. That is clearly the fastest’ (see Figure 11). While speed is thus the primary factor for this cyclist’s choice of route in the morning, this and encounters with other cyclists revealed how speed is a much less important factor in the afternoon with cyclists developing secondary routes based upon feelings of safety, shopping possibilities along the way and green surroundings. As the 30-year-old local woman describes her secondary route: ‘If the sun shines, then I like to take some time instead of biking home directly, and then I will go by another way, gaze a bit and listen to music.’

Figure 11.

A woman shadowed and interviewed about her route preferences.

Figure 12.

The blended space of mixing thick and big descriptions of cyclists led to a more pluralistic understanding of cyclists’ choice of routes, adjusting the understanding of what counts as an effective sign.

Blended space: Emergent results from big and thick data analysis

The different input spaces resulted in different and yet at the same time very complementary findings because of the shared generic space, which was then selectively projected into a blended space with emergent properties. Through the blending of the two input spaces, a deeper understanding of the city’s cyclists emerged. The blending thus revealed that while the increased speed and more direct routes provided by the signs might be useful in most situations, path choices are often more complicated with multiple factors informing the final choice of the route in the afternoon. To improve the signs would thus not only mean optimizing their ‘effectiveness’, but would also necessitate a consideration of how cyclists more interested in green surroundings and traffic might best be assisted. Figure 12 summarizes the blending.

This case represents an example in which reciprocal effects are produced through the blending. By blending big and thick findings the big data patterns of the bike journeys are enriched with an explanation through the thick observation of many cyclists’ reliance upon multiple routes to and from their home. In this sense, the thick observations work on the big by adding a ‘why’ to the big ‘what’, a relationship that has also been brought forward in prior big data studies (e.g., Kitchin, 2014; Porway, 2013). The relationship is, however, also reciprocal, since the same blending process also extends the thick observations with knowledge on the generalizability of the behaviour of using multiple routes to and from one’s home.

The blending thus exploits the unique granularity of most big datasets (Ruppert et al., 2013), which allows us to re-identify, within the big datasets, selected behavioural traits of the population built from thick descriptions. In this specific case, both the thick and the big observations show cyclists having multiple routes. However, in contrast to conventional quantitative data sources, the extreme granularity of big data allows us to aggregate individual observations into basic statistics for the behaviour without having to disconnect from the individual behaviour, as illustrated in Figure 13. After identifying a specific behaviour within the data, i.e. having multiple bike routes, we thus count the number of people who, according to the data, make use of multiple routes in order to evaluate the extent of the phenomenon.

Figure 13.

Illustration of linking big-and-thick insights through shared behavioural traits.
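
The counting step described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used in the study: the trip records, the rider identifiers and the route labels are hypothetical, and the sketch assumes that each recorded trip has already been assigned to a route by some earlier step.

```python
from collections import defaultdict


def riders_with_multiple_routes(trips, min_routes=2):
    """Count riders who use several distinct routes.

    trips: iterable of (rider_id, route_id) pairs, one per recorded trip.
    Returns the matching rider ids, the total number of riders and the
    share of riders showing the behaviour. Keeping the rider ids means
    the aggregate can still be traced back to individual behaviour.
    """
    routes_by_rider = defaultdict(set)
    for rider_id, route_id in trips:
        routes_by_rider[rider_id].add(route_id)

    multi = sorted(rider for rider, routes in routes_by_rider.items()
                   if len(routes) >= min_routes)
    total = len(routes_by_rider)
    share = len(multi) / total if total else 0.0
    return multi, total, share


# Hypothetical trip log: rider id and a label for the route taken.
trips = [
    ("r1", "A"), ("r1", "B"),
    ("r2", "A"), ("r2", "A"),
    ("r3", "C"), ("r3", "D"), ("r3", "C"),
    ("r4", "E"),
]

multi, total, share = riders_with_multiple_routes(trips)
```

Returning the rider identifiers alongside the share reflects the point made in the text: the statistic summarizes the population while each counted case remains linked to an individual whose behaviour can be inspected.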

This strategy is thus not unlike the very common mixed method strategy of exploring the extent of an identified phenomenon by, e.g. following up the observation with a representative survey (Bryman, 2006). However, in contrast to such mixed methods, which commonly end up working on different populations and within different settings, both setting and population remain closely linked to each other in the blending approach, since the point of departure lies within the generic space with similar structural data properties. Through this move, it is possible to reach what has fittingly been described as a quali-quantitative perspective (Latour et al., 2012) with numbers and stories co-appearing within the same blend.

Concluding remarks

For researchers and analysts, the complementary nature of big and thick data suggests moving towards more and deeper integration. While scholars have previously engaged empirically and theoretically with the task of integrating big and thick data worlds, none have attempted to develop a systematic method for this process. An important contribution of our paper is therefore the introduction of methodological specificity backed by empirical cases to the much-talked about, but little practiced, process of complementing big and thick insights. Under the concept of blending, we have reported on our own experiments for engaging analytical insights grounded in big and thick data, conceptually linking insights based on highly heterogeneous datasets.

Summing up, the Big–Thick Blending methodology proposed here is about blending analytical insights. The blending rests upon the contribution from two (or more) separate input spaces containing, respectively, thick and big data analytical insights which share some conceptual associations in a generic space. While the methodology can be applied to blend other data types as well, our interest has been to highlight the unique complementary effects that arise when one attempts to blend thin big data with small thick data. The method unfolds by selectively projecting insights from these input spaces, thereby leading to the creation of new blended spaces with the construction of novel results. From the outside, this process bears resemblance to the basic dialectic method of thesis, antithesis, synthesis. However, rather than being based on opposition and conflict (in the Hegelian sense), blending is based on complementarity and extension as the main elements of the creation of analytically interesting cross-space mappings.

Through two cases we have demonstrated how analytical insights built from heterogeneous big-and-thick data sources can qualify and guide each other’s focus through blending processes with the goal of constructing novel results in emergent blended spaces. For simplicity, the presentation of the examples followed a linear, step-by-step progression. While this can be an effective strategy when the objective is to make a specific point, blending processes hardly ever consist of such linear progressions. We fully agree with the suggestion by Elgaard et al. (2017) that mixing should follow the iterative and rapid act of slaloming down a steep hill. The blending processes should also rely on iterative and rapid exchanges between big and thick data insights where the researchers deliberately blend inputs with shared properties.

Table 1 summarizes the four main strategies presented in these cases. Other strategies for blending exist, just as the iterative and open process of blending means that no blending process ever follows the exact same process. It should also be noted that multiple strategies can be applied to the same case and even in extension of each other, and that the role of big and thick data described below sometimes can be switched around.

Table 1.

Common blending strategies and their usage.

Strategy	What is done	When should it be used
Calibrating	Observations derived from thin big data are blended with thick observations of the same social phenomenon to calibrate the big data and develop trust for these new data	When working with more experimental data sources or data sources that are very vulnerable to noise
Contextualizing	Observations derived from big data are blended with observations collected outside the physical infrastructure to contextualize the big data patterns with aspects that do not leave behind digital traces	When working with big data sources that are locked to a physical infrastructure
Adding the why	Thin observations derived from big data are blended with thick observations of the same social phenomenon to add a ‘why’ to unexplainable patterns within the big dataset	When working with highly thin datasets in which intentions behind the unravelled phenomena cannot be extrapolated from the digital traces
Adding scale of behaviour	Exploiting the unique granularity of big data, aggregated descriptions of behaviour are blended with behaviour extrapolated from thick descriptions to establish the extent of a specific behaviour	When wanting to evaluate the extent of a particular behaviour where self-reflective answers collected through, e.g. interviews, are not desirable/possible

As a multimethodology, Big–Thick Blending distinguishes itself from most other approaches because of (1) the attentiveness to data affordances and data complementarity in the generic space, (2) the speed and use of iterations in the method and (3) respect for divergent analytical competencies represented through divergent types of analysis and spaces.

Attentiveness to data affordances relates not only to the use of data sources that are complementary in shape (thin-big versus thick); thin big data and thick data are also commonly connected through a shared focus on observing physical behaviour. What is mixed in Big–Thick Blending are analytical results that share this focus, with the implication that both the base of participants and its context are shared across the big and the thick approach. This leads to much more integrated analytical results in which the different contributions can neither easily be separated out nor exist by themselves (cf. Anderson, 2009). Within the methodology, this is described as shared structure in the generic space.

Dealing with massive datasets also requires specially developed expertise in the same way that the practice of ethnography and micro-analysis of video recordings requires prior training. Against this background, we firmly believe that blending processes should seek to honour these differences in expertise, shifting the focus towards the analytical outcomes of diverse methods. Blending thus joins the growing chorus of digital-based scholars who suggest that social scientists abandon the historical ideal of the renaissance person, the individual genius scholar who masters all the methods and theories needed (e.g., Ford, 2014; King, 2014; Marres, 2013; Venturini et al., 2017).

Acknowledgements

The author would like to thank the entire team behind the study, notably Johan Trærup, Søren Gravholt-Nielsen, Håvard Lundberg and Heidi Sørensen. A special thought also goes to the people at Gemeinschaft and the Technique and Environmental department of Copenhagen Municipality for drawing on their work in relation to the second case. Finally, we would like to thank Torben Elgaard, Helena Webb and Anders Blok for providing useful comments along the way. Thanks also go to the optician chain for letting us collect data in their shops.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was supported by a grant from Synoptik Fonden along with support from the KU16 pool at the University of Copenhagen.

References

Aipperspach R, Lawrence RT, Woodruff A, et al. (2006) Ethno-Mining: Integrating Words and Numbers from the Ground Up.

Anderson K, Nafus D, Rattenbury T, et al. (2009) Numbers have qualities too: experiences with ethno-mining. Ethnographic Praxis in Industry Conference Proceedings 2009; 1: 123–140. https://doi.org/10.1111/j.1559-8918.2009.tb00133.x.

Arminen (2005) Institutional Interaction: Studies of Talk at Work, Aldershot, Hants, UK: Ashgate Publishing Limited.

Atkinson, Coffey, Delamont, et al. (2001) Handbook of Ethnography, 1st ed. London; Thousand Oaks, CA: SAGE Publications Ltd.

Bateson G (1979) Mind and nature: a necessary unity (1st ed). New York: Dutton.

Battiato, Cavallaro and Distante (2016) Special issue on ‘video analytics for audience measurement in retail and digital signage’. Pattern Recognition Letters 81(C): 1–2.

(2008) Online research methods and social theory. In: Blank (ed.) The SAGE Handbook of Online Research Methods, London: SAGE Publications, Ltd, pp. 537–549. Available at: http://methods.sagepub.com/book/the-sage-handbook-of-online-research-methods/n29.xml (accessed 18 February 2017).

Blok and Pedersen (2014) Complementary social science? Quali-quantitative experiments in a Big Data world. Big Data & Society 1(2): 1–6.

Blok A, Carlsen H, Bornakke T, et al. (2017) Stitching together the heterogeneous party: a complementary social data science experiment. Big Data & Society.

Boellstorff T (2013) Making big data, in theory. First Monday 18(10). Available at: http://journals.uic.edu/ojs/index.php/fm/article/view/4869 (accessed 23 January 2014).

Bornakke T (2017) Transactional data experiments (PhD Dissertation). Copenhagen University.

Bowman EK, Turek M, Tunison P, et al. (2017) Advanced text and video analytics for proactive decision making. In: Proceedings Volume 10207, Next-Generation Analyst V; 102070K (2017). Anaheim, California, United States: SPIE Defense+Security. DOI: 10.1117/12.2276369.

Boyd and Crawford (2012) Critical questions for Big Data. Information, Communication & Society 15(5): 662–679.

Brewer J and Hunter A (1989) Multimethod Research: A Synthesis of Styles. SAGE Publications.

Bryman (2006) Integrating quantitative and qualitative research: How is it done? Qualitative Research 6(1): 97–113.

Curran (2013) Big Data or ‘Big Ethnographic Data’? Positioning Big Data within the ethnographic space. Ethnographic Praxis in Industry Conference Proceedings 2013(1): 62–73.

Due BL (2014) Ideudvikling. En multimodal tilgang til innovationens kreative faser. (Idea development: a multimodal approach). Samfundslitteratur.

Due BL (2018a) Respecifying the information sheet: An interactional resource for decision-making at the optician. Journal of Applied Linguistics and Professional Practice.

Due BL (2018b) Passing Glasses: coordinating, co-constructing and collaborating at the optician. Social Interaction. Video-Based Studies of Human Sociality.

Duranti and Goodwin (1992) Rethinking Context: Language as an Interactive Phenomenon, Cambridge; New York: Cambridge University Press.

Elgaard T (2017) The slalom method: Combining ethnographic field work and digital methods. In submission.

22.

Fauconnier

(1997) Mappings in Thought and Language, Cambridge, UK: Cambridge University Press.

23.

Fauconnier

(2001) Conceptual blending and analogy. In: Gentner D, Holyoak KJ and Kokinow BN (eds) The Analogical Mind: Perspectives from Cognitive Science, 255–286.

24.

Fauconnier

Turner

(1998) Conceptual integration networks. Cognitive Science 22(2): 133–187.

25.

Fauconnier

Turner

(2002) The Way We Think: A New Theory of How Ideas Happen, New York: Basic Books.

26.

Fauconnier G and Turner MB (2003) Polysemy and conceptual blending. Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1346508 (accessed 16 April 2017).

27.

Ford

(2014) Big Data and Small: Collaborations between ethnographers and data scientists. Big Data & Society 1(2): 1–3. DOI: 10.1177/2053951714544337.

28.

Franklin

(1997) Calibration. Perspectives on Science 5(1): 31–80.

29.

Garfinkel H (1967) Studies in Ethnomethodology. Englewood Cliffs, NJ.

30.

Geertz

(1977) The Interpretation of Cultures, New York: Basic Books.

31.

Girardin F (2013, April 2) Insights from network data analysis that yield field observations. Available at: http://ethnographymatters.net/blog/2013/04/02/insights-from-network-data-analysis-that-yield-field-observations/.

32.

Goffman

(1964) The neglected situation. American Anthropologist 66(6): 133–136.

33.

Heath

Hindmarsh

Luff

(2010) Video in Qualitative Research, London, UK: Sage.

Heritage (1984) Garfinkel and Ethnomethodology, Cambridge, UK: Polity Press.

Hougaard (2005) Conceptual disintegration and blending in interactional sequences: A discussion of new phenomena, processes vs. products, and methodology. Journal of Pragmatics 37(10): 1653–1685.

Hsu WF (2014) Digital Ethnography Toward Augmented Empiricism: A New Methodological Framework. Journal of Digital Humanities 3(1). Available at: http://journalofdigitalhumanities.org/3-1/digital-ethnography-toward-augmented-empiricism-by-wendy-hsu/.

Huang, Tan, Maybank, et al. (2017) Guest editorial introduction to the special issue on large-scale video analytics for enhanced security: Algorithms and systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems 47(4): 589–592.

Hutchins (2005) Material anchors for conceptual blends. Journal of Pragmatics 37(10): 1555–1577.

Jefferson G (2004) Glossary of transcript symbols with an introduction. In: Lerner GH (ed.), Conversation Analysis: Studies from the first generation. John Benjamins Publishing Co., pp. 13–31.

King (2014) Restructuring the social sciences: Reflections from Harvard’s institute for quantitative social science. PS: Political Science & Politics 47(1): 165–172.

Kitchin (2014) Big Data, new epistemologies and paradigm shifts. Big Data & Society 1(1): 1–12.

Koestler A (1964) The act of creation.

Latour, Jensen, Venturini, et al. (2012) ‘The whole is always smaller than its parts’: a digital test of Gabriel Tarde’s Monads. British Journal of Sociology.

Marres N (2013) What is digital sociology? CISP ONLINE Blog of the Centre for Invention & Social Process, Goldsmiths. Available at: http://www.csisponline.net/2013/01/21/what-is-digital-sociology/ (accessed 4 April 2017).

Moerman (1988) Talking Culture: Ethnography and Conversation Analysis, Philadelphia: University of Pennsylvania Press.

Mondada L (2014) The local constitution of multimodal resources for social interaction. Journal of Pragmatics 65: 137–156. https://doi.org/10.1016/j.pragma.2014.04.00.

Musalem, Olivares and Schilkrut (2015) Retail in high definition: Using video analytics in Salesforce management. SSRN Electronic Journal. Available at: http://www.ssrn.com/abstract=2648334 (accessed 9 May 2017).

Porway (2013) You can’t just hack your way to social change. Harvard Business Review. Available at: https://hbr.org/2013/03/you-cant-just-hack-your-way-to (accessed 12 May 2017).

Regazzoni, Cavallaro, et al. (2010) Video analytics for surveillance: Theory and practice [from the guest editors]. IEEE Signal Processing Magazine 27(5): 16–17.

Ruppert, Law and Savage (2013) Reassembling social science methods: The challenge of digital devices. Theory, Culture & Society 30(4): 22–46.

Sacks, Schegloff and Jefferson (1974) A simplest systematics for the organization of turn-taking for conversation. Language 50(4): 696–735.

Schegloff (1987) Between micro and macro: Contexts and other connections. In: Alexander, Giesen, Munch, et al. (eds) The Micro-Macro Link, Berkeley and Los Angeles: University of California Press, pp. 207–234.

Schegloff (1997) Whose text? Whose context? Discourse & Society 8(2): 165–187.

Stoller (2013) Big Data, thick description and political expediency. Huffington Post. Available at: http://www.huffingtonpost.com/paul-stoller/big-data-thick-descrption_b_3450623.html (accessed 28 November 2016).

Streeck, Goodwin and LeBaron (2011) Embodied Interaction: Language and Body in the Material World, Cambridge, UK: Cambridge University Press.

Venturini, Munk and Meunier (2017) Data-sprints: A public approach to digital research. In: Lury, Clough and Chung (eds) Interdisciplinary Methods, London and New York: Routledge. Available at: https://www.researchgate.net/publication/303017654_Data-Sprints_a_Public_Approach_to_Digital_Research (accessed 5 January 2017).

Wang (2013) Big Data needs thick data. Ethnography Matters. Available at: http://ethnographymatters.net/2013/05/13/big-data-needs-thick-data/ (accessed 8 August 2013).