Introduction
Be more white. Be more male. Be wealthier. Those are the biggest correlations with success. It’s terrible, but it’s the truth.
—Excerpt from interview with Don,1 a university administrator
“The truth” was taken for granted among data scientists, administrators, and programmers in their work on predictive modeling using institutional data to anticipate whether or not a student would graduate within a four-year period. This observation came from Don, a long-time administrator at the university who is engaged in the application of nudges2 derived from the predictive model’s outputs. Sitting across a table from me in the university’s student union, alert for eavesdroppers, he explained how he thought of such inequalities as common but mostly unspoken knowledge among faculty and administrators at the university. Over the course of my fieldwork, he explained to me and university stakeholders that the predictive model only drew from “behavior” data—what he described as “things students can change”—and not demographic markers like race, gender, and socioeconomic status. Don, in telling me “the truth,” suggested these demographic markers were accepted by stakeholders as not only immutable but also indicative of a student’s prospects for success.3
But despite the collectively understood disparities in success, a focus only on the “things students can change” serves to make demographic differences less obvious and integral to the university’s model of success. Through the sociotechnical obscuring of demographic data in predictive modeling and illumination of data that instead highlight behaviors that correlate with success, the likelihood of a student graduating in four years appears more contingent on those behaviors than demographic markers.
In this article, I explore an institutional shift from reliance on demographic data to what administrators, data scientists, and programmers at the university have constructed as behaviors. Universities have long been involved in reshaping demographic categories (Hanel and Vione, 2016), not only through admissions processes that shape student bodies but also through research conducted in association with institutions that influence culturally held ideals about meritocracy (Warikoo, 2016). As universities take a data-driven approach to such ideals, they increasingly seek predictive power, more context for enrollees and applicants, inventive ways to produce knowledge about students, and methods of sidestepping contentious issues around inequality (Selwyn, 2015). In a departure from demographic data, which are solicited through self-reporting and traditional markers of a student’s category membership, personnel tout behaviors as a better, more neutral alternative. Automated data collection promises more consistency, standard instruments for gathering, an expansive sample, and direct proxies. However, I argue that in practice the making of new data—behaviors—is no less fraught.
Making data
How do data become behaviors, and how do those behaviors come to take the place of demographic data in how the institution manages its student body? In this essay, I build on critical data studies scholarship (see Iliadis and Russo, 2016) by empirically demonstrating not solely that data are made but also how they are made.
Drawing from qualitative research I conducted at a large public university in the United States, I argue that by revising predictive modeling and nudging to focus even more intensively on what they deem behaviors, data personnel—data scientists, administrators, and programmers—position students as subjects of nudges responsible for their own success. The institutional reframing of success in terms of “what students can change” enables the institution to transfer the burden of success away from itself and keep the tacitly held knowledge of inequality out of the university’s visions for predictive modeling.
To demonstrate how data personnel produce behavior data in a way that enables the institution to minimize the impact of demographic markers on success, I first provide a brief overview of my research site and predictive analytics in higher education in relation to a larger landscape, the literature that underpins my analysis, and my methodological approach. I then address the typing of data into behaviors and attributes, the sorting of data into behaviors, maintenance of behaviors as accurate proxies, and nudging of students on behaviors as processes that assist personnel in solidifying behavior data.
Research site
This qualitative case study draws from ethnographic fieldwork I conducted at a large public university in the United States across a 12-month period. Data personnel characterize the institution, which hosts roughly 30,000 undergraduate students, as at the forefront of applying institutional data to success initiatives. As an administrator told me, “no one’s ever done the data mining like we’re doing the data mining,” alluding to the university’s computational approach to institutional research (IR), expansive data infrastructures, and vision for how data could revolutionize higher education. At the time of its foray into predictive modeling, the university was indeed unique for its repurposing of wireless network usage data as a proxy for students’ whereabouts. Moreover, it has digitized much of its institutional data. When I interviewed Henry, a data scientist, he got up to go to his bookshelf and returned with a massive binder, at least four inches thick. The binder held a printout of deidentified, aggregated student data, the precursor to what is now an online, interactive visualization.

M: Oh my god. Volume one of how many?

H: Uh, I want to say less than ten but more than two.

M: That’s a lot. That’s a lot of paper.

H: And they would do this every year. It would take them from the time the data was available until, like, December, just to be able to produce a book of this. Now, when the data’s available, that [interactive visualization is] updated the next day. So, like, that was the world that they were living in, so, like, when you’re living in that world, you don’t have time to do the advanced stuff.
Predictive analytics: From higher education to a broader landscape
The deployment of student data in higher education in the U.S. is widespread and under varying degrees of scrutiny, from the College Board’s short-lived “Adversity Score”4 to growing concerns about student privacy and dataveillance on campuses (Selwyn, 2015). The datafication of universities and surge of predictive analytics projects prompt questions about the future of higher education and what roles universities play in society.
Higher education is one area in which predictive uses of Big Data are gaining momentum. While personnel who worked most closely with student data thought their work was exceptional among what they regarded as controversial applications of data, such as predictive policing, predictive analytics in higher education is nonetheless situated within a broader landscape. For example, during my fieldwork I was chatting with the director of IT at the university in an elevator. After hearing more about my research, he immediately asked me about predictive analytics in medicine.
While medicine was his immediate reference, the university’s development of nudges was at its height during revelations about the analytics firm Cambridge Analytica using data problematically mined from Facebook users to target prospective voters. Beyond the manipulation of politics, John Cheney-Lippold (2011, 2017) has identified the reordering of people according to their social media data as a “new algorithmic identity,” in which data enable new ways to organize people, based on an affinity of clicks as opposed to traditional social categories.
Data that correspond with people are ample and are mobilized by institutions through predictive mechanisms. “Data doubles,” or data that stand in for people in systems and analytics, frequently outperform them (Haggerty and Ericson, 2000; Raley, 2013). Gavin JD Smith (2016), David Lyon (2003), and Cathy O’Neil (2016) have all pointed out the potential destructiveness of proxies and data doubles when treated as the people they represent: denied loans, extended sentences, increased insurance rates, and lost job opportunities, to name a few. Wendy Hui Kyong Chun (2018) writes that these data “absolve one of responsibility … by creating new dependencies and relations” in standing in for what is unknown or inaccessible. It matters which data become doubles, and how those data become data in the first place. This problem is evident in research and investigative reporting on predictive policing and risk assessment, in which police departments take past events as predictive of future activity (Brayne, 2017; Selbst, 2017), or where algorithmically calculated scores are meant to indicate the likelihood of recidivism (Angwin et al., 2016; Benjamin, 2019). The translation of data into predictions is not solely algorithmic; it is also wrapped up in structural inequalities and notions about what society is and ought to be (Eubanks, 2018). And so while the work data personnel in universities do is not the same as predictive policing, it occupies a similar imaginary6 in which what people will do is both possible to anticipate and open to intervention.
Relevant literature
States and institutions have long sought to account for and predict people and activities within them via data collection (Scott, 1998). Data enable auditing practices (Power, 1997; Strathern, 2000), the quantification of people (Bouk, 2015; Desrosières, 1998), and the calculation of risk (Hacking, 1990; Harcourt, 2007). Quantification, especially commensuration, is constitutive of people: quantification, as a social act, makes what it purports to represent (Espeland and Stevens, 1998, 2008). As scholars in science and technology studies (STS) have demonstrated, measuring and predicting are fraught social processes requiring investment and validation through institutional politics and infrastructures (Porter, 1995; Star, 1999; Star and Ruhleder, 1996). Processes of making data and their infrastructures are frequently subject to what Susan Leigh Star (1991) has called “deletion,” or the invisibilizing of labor in scientific work. Such deletions have been at the fore of research on scientific and technological practice and related institutions in STS, though are less present in inquiries into higher education.
Scholars addressing Big Data in educational contexts have largely explored its possibilities, testing out in-classroom technologies, courses scaled up to enroll unlimited students (such as massive open online courses, or MOOCs) (see Jones et al., 2014), and predictive uses of data collected from learning management systems (e.g. Blackboard, Canvas). As George Siemens reports in mapping the field, learning analytics is the use of data to improve learning (2013: 1382). While learning can refer to any variety of educational settings, learning analytics has expanded rapidly in higher education, where such technologies are used by universities to manage risk and understand student bodies (Wagner and Longanecker, 2016).
Such projects, which typically draw from third-party consultants or are developed in-house, have received scrutiny as critics express concerns about universities surveilling students (Harwell, 2019; see also Hope, 2016). While much of the learning analytics literature explores the significance and effectiveness of specific learning analytics initiatives, more recently scholars such as Neil Selwyn have argued that “learning analytics needs to be critiqued as much as possible,” given the potential to disparately impact students (2019: 11).
And learning analytics is critiqued. Scholars are addressing effects on student data privacy (see Ifenthaler and Schumacher, 2016; Rubel and Jones, 2016; Slade and Prinsloo, 2014; Sun et al., 2019). Juliane Jarke and Andreas Breiter (2019) discuss how education is changing with datafication, and Ben Williamson (2017, 2018, 2019) has written extensively about the implications of large-scale data collection on students, both in and outside of higher education. Other education scholars have interrogated the ethics of learning analytics (Johnson, 2014; Slade and Prinsloo, 2013) and prospects for just approaches (Shahar and Harel, 2017).
Data analytics projects like the one I discuss deploy nudges in tandem with predictive outputs to suggest to students how they can improve their graduation outcomes and grade point averages (GPA). The nudging that personnel use is aligned with Richard H Thaler and Cass R Sunstein’s outline of nudges and “choice architecture,” in which architects structure the “context in which people make decisions” to “nudge” them toward particular choices (2009: 3). Thaler and Sunstein frame nudging as “libertarian paternalism,” where people are ultimately capable of making their own choices—a nudge is a mild intervention. However, Karen Yeung argues that in contexts of Big Data, the array of data and analytics dynamically available to choice architects means that nudging is “subtle, unobtrusive, yet extraordinarily powerful” thanks to the magnitude and networks of data (2017: 119).
Some of the literature about learning analytics offers strategies for how to more productively nudge students, in which students are framed not just as consumers but also as active partners at universities who should be accountable for their own success (Fritz, 2017; see also Pascarella and Terenzini, 2005). The notion of choice architecture in learning analytics rests on conceptualizations of agency where students have unrestricted access to a full range of choices. This take on agency is in contrast to social theorizing on agency, in which actors work within and against constraints (see Bourdieu, 1980; Ortner, 2006).
Some education scholars have commented on the contradictions of deploying nudges in relation to more liberal views of the purpose of education (see Clayton and Halliday, 2017; Hartman-Caverly, 2019). Jeremy Knox et al. explore a growing trend of educational institutions integrating datafied behavioral economics approaches. They remark on the implications of “[shaping] students’ choices and decisions based on constant tracking and predicting of their behaviors, emotions and actions,” noting the potential for disparate impacts (2020: 39).
Some of the appeal of Big Data, and why, perhaps, it links up so well with the surge of behavioral economics in education, is rooted in pervasive and influential “mythologies” of data as truthful and omniscient, which critical data studies scholars have challenged, recognizing data as partial and always already political (Boyd and Crawford, 2012; Dalton and Thatcher, 2014). The promise of data is evident in institutional data mining projects that endeavor to take the place of self-reporting: data personnel understand data as more direct proxies, comprehensive and accurate, or as Rob Kitchin and Tracey P Lauriault put it, “exhaustive in scope” and “fine-grained in resolution” (2014: 2). The presumed neutrality of data enables them to seem prior to interpretation, an incredible, “raw” resource that can reveal insights about humanity (Boellstorff, 2013).
But data must be made. They do not exist as prior to processing. Lisa Gitelman and Virginia Jackson (2013: 3) write that “data need to be imagined as data to exist and function as such.” As I discuss herein, the discursive work involved in creating data is ongoing and layered; it relies on a great deal of labor and transformation. Nonetheless, data are treated by personnel as a stable, bounded entity, not unlike how the engineers in Diana Forsythe’s (2001) work regarded knowledge in programming expert systems. The ways that personnel imagine behaviors and attributes materialize as data, and in turn those data shape how personnel produce and use those categories. Technologies, as materialized discourses that reflect broader social epistemologies, naturalize and crystalize concepts (Suchman, 2007). In the case of data collection, technologies create the categories of people and activities they purport to measure, making them manageable (Foucault, 1972, 1977).
Societal discourses of data draw upon mythologies of data and so seem like a neutral means of revealing order intrinsic to society, although social theorists have demonstrated that ordering processes are a means through which actors make society (Bowker and Star, 1999; Jasanoff, 2004; Latour, 1990). In data technologies, ordering processes make the subjects of ordering ready to be taken up in a system, scaled, standardized, predicted, and nudged (Cheney-Lippold, 2011; Raley, 2013; Stark, 2018). I take the conditions of ordering in the form of discourse as a fruitful focal point to look at how data personnel as actors give shape to data: how they make sense of the institution, their social contexts, and their ideas about data are part of the data technologies they design and implement.
Methods
In this qualitative case study, I used a combination of interviewing and participant observation in university IT and IR offices in which personnel render students into data. I interviewed 30 data personnel using semi-structured techniques in interviews lasting 60–90 minutes, and I conducted follow-up interviews with five key interlocutors who worked most closely with deploying the model and constructing nudges (Bernard, 2011). These personnel primarily included data scientists, developers, and IT administrators, but also network architects and stakeholders involved in developing predictive outputs for students.
Much of the participant observation of my fieldwork took place in meetings. Meetings covered a range of topics, from monthly development updates to explanations of technical details of the predictive model to workshopping nudges to debates about what data mean. Meetings were places where multiple teams came together, data scientists painstakingly explained the mechanics of modeling or qualified results, programmers explained why they arrived at a particular form of nudging, and administrators nixed nudges and passed along institutional memories of data sources. In these spaces, personnel discursively challenge and solidify not only the technical dimensions of modeling but also the data that inform it (see Brown et al., 2017; Sandler and Thedvall, 2017). The constraints and limits personnel face become evident in such spaces, where their ideas are curbed by the top-down vision of the current university administration or where they must execute a stage of development over which they are not in total agreement owing to rapidly approaching deadlines and desire to receive the approval of stakeholders. Their institutional entanglements operate as a check on what they understand as choices available to them.
Although this article is informed by participant observation, I mostly utilize interviews in my analysis because they function as a central space for actors to map out a sociotechnical imaginary of data technologies at the university. In interviews, actors articulate their work and their visions for predictive projects so that modeling is integrated into such an imaginary, in what Sheila Jasanoff has described as “collectively held, institutionally stabilized, and publicly performed visions of desirable futures” (2015: 4; see also Jasanoff and Kim, 2009). The top-down discursive organizing of data that occurs before, during, and after modeling, especially in the context of interviewing in which personnel are asked to provide an account of modeling and nudging, is critical to the formation of an imaginary of predictive technologies. As Nick Seaver (2017: 8) has observed in his ethnographic approach to algorithms, “interviews do not extract people from the flow of everyday life, but are rather part of it.” Interviews enable personnel to imagine the concepts on which their projects hinge.
I transcribed and coded interviews and field notes in NVivo, a qualitative analysis environment. As Foucault (1972, 1977, 1978) has elucidated, discursive practices make the categories they describe, rendering them measurable, governable, and here, nudgeable. I identified where personnel defined data, explained to me or to each other what a data type meant to them, or decided which data could function as proxies for students. I focused on moments in interviews in which personnel speak, define, and sort behavior data into a fixed category (see Wood and Kroger, 2000).
This analysis illuminates personnel’s implicit and explicit delineations about what data are and what they represent, along with how personnel thought data ought to be classified (Strauss, 2005). Interviewing, transcribing, and coding all helped me to make sense of the conceptual work involved in handling institutional data. Because I began my fieldwork well after modeling began, interviews helped me to reconstruct narratives of decision-making about data sources, modeling, and nudging.
I have structured my findings to reflect a chronology of data processing. However, because some of this work occurs simultaneously, I also conceptually order findings, layering them on top of a foundational concern with demographic data and an imperative to nudge.
I begin with the problem of removing demographic data from modeling, which prompted personnel to think about data in terms of types (i.e. attributes and behaviors). I then explore the work of sorting the available data at the institution into a category of behaviors and assigning proxies. By maintaining data as accurate proxies, personnel help behavior data begin to hold together. Finally, the solidification of a category of behaviors enables personnel to nudge students. I conclude by discussing the implications of making institutional data.
Typing data into “attributes” and “behaviors”
The typing of data was the result of conscious attempts from data personnel to nudge students not only effectively but also fairly. In one of my first interviews with Don, I sat in his office, across from him again over his cluttered desk, and asked him to recount some of the early decision-making in model development. He had been involved in the original design of the model and determining which data should be incorporated into it. Don summed up one of the key decisions regarding data:

And what we found initially was that all the standard things that you would guess correlate with student success that students can’t change were the big drivers: race, gender, ethnicity, socioeconomic status, what high school they came from, certain kinds of grades, whatever. Well, students can’t do anything about any of that. So, the idea was to take a look and see, well, is there other stuff that seems to correlate.
When data personnel explain what goes into the predictive model, they divide the data neatly into two major categories: “attributes” and “behaviors.” They define attributes as fixed categories, the “standard things” that students “can’t change.” These categories are made up of demographic data, where data on parental income and high school ZIP code are indicators of socioeconomic status. Universities collect data on race, ethnicity, and gender in standardized forms, whether through college applications or through reporting mechanisms in university systems. Data personnel treat attributes as outside of the model’s purview because while they correlate with graduating in four years, they are not actionable. For example, personnel noted that a student cannot retroactively attend a different high school. Moreover, while a student could transition while in college and change gender markers in university systems or might experience socioeconomic mobility, data personnel would not construct nudges to instruct them to do so. Therefore, data personnel regard attributes as off limits in making recommendations, and they were quick to assure me that they would never do such a thing.
By drawing boundaries around attributes, data personnel attempt to seal them off and open up other types of data for usage. The discursive and computational effects of relegating some data as attributes are that those data, and the students who provide them, become stable entities. That is, by treating demographic data as attributes that are frozen—everything a student “can’t change”—personnel remove those data from an ongoing conversation about what they can use in the model. The differing experiences students have on campus that interlock with their race and socioeconomic status, for example, are no longer part of data projects because personnel define them as fixed. Computationally, when some data become attributes, data scientists no longer include them in the predictive model: demographic data, characterized as attributes, do not factor into calculating the likelihood of graduation in four years.
The effect of framing some data as attributes that are off limits for nudging is not that they are permanent, but instead that personnel cannot nudge students to change them. Will, an administrator who helped to develop the model, explained to me that the sidelining of attributes prompted data personnel to look for other factors involved in success at the university:

So, since we’re pulling in all this data at the same time and dropping it into the algorithm, obviously there are a number of things that are highly predictive of student success on campus. Their academic preparation before they come into campus. Their GPA while they’re at [the university] obviously is highly predictive. Socioeconomic status things. Demographic markers. But they’re all things that either because it’s too late in the game, we can’t tell a student, “Boy, it would have been great if you would have studied harder in high school.” And we certainly can’t tell a student on a demographic or socioeconomic thing, we can’t say, “Hey, it’d be good if you weren’t so poor.” There’s nothing a student can do with that. Even though it does put ‘em in a higher risk category. So we took those things that were malleable by the students. Things like, how much time they were spending on campus. Whether they were a proxy for whether we believed they were paying attention in class by how much data they were downloading in a class.
Malleability, however, is not a given: it has to be translated into a behavior. Will refers to data downloaded in class as a potential proxy for paying attention, where the data on downloading are available for data personnel to match up with a behavior. The question is not whether downloading data is a proxy but rather whether it is a reasonable proxy for paying attention. The data available for modeling predate the model itself. Data on students’ downloading habits in class were originally collected for the maintenance of network infrastructures, but personnel have repurposed them as a proxy for attention. Data on downloading were not always behavior data.
Massive amounts of data are available to data personnel, and it is not self-evident what is a proxy for what, nor was it apparent to me if a hard line between attributes and behaviors existed for personnel. I asked Don how data personnel went about distinguishing between attributes and behaviors. He first depicted behaviors as what was left after attributes were removed:

We tried to be blind to all of those, and only look at behaviors. Only look at different numbers we had that were indicators of behavior. Behavior could be grades you made in your prior classes, here at [the university]. It could be how many credit hours you’re taking, it could be where you’re living, it could be, any of these things that you have control over, we’ve just clumped them all into the behaviors bin. I guess that we assume that what [students] did in the course of the day, they had control over. Right, so they chose whether they were gonna eat or not … they chose the gym or not, being on campus or not … They chose living where they chose to live. I think they have some say in that … So it seemed to me that any time that they had an opportunity to make a decision about what they were going to be doing, we called that a behavior.
The focus on behaviors defines students both as radical agents and as nudgeable subjects who would benefit from behavioral recommendations based on their data, which include spending more time on campus, attending class, attending supplemental instruction sessions, and registering for courses earlier. Predictive outputs are meant to be engaged with, not just observed.
The implication that students can act on predictions and improve their prospects for success is contested among data personnel at the university. In general, personnel, particularly those who worked closely with students, wanted nudges to have an encouraging vibe that motivated students to act on nudges and incorporate behavioral changes into their daily lives. However, data scientists in team meetings cautioned against giving students false hope. They argued that the likelihoods they modeled are accurate enough that spending more time on campus would not impact the predicted outcome enough to make a substantial difference in the space of a semester. While personnel are not in agreement about the potential effects of nudging and some are torn about its utility, the notion remains that students are responsible for their outcomes.
Sorting data: Assigning data to categories
Data do not automatically fall into categories of attributes and behaviors; rather, they are assigned and are products of discursive moves. As I discovered, the types of data that comprise a student body are multiple, as are their uses and sources. They serve several institutional offices simultaneously and outlive the original intentions behind them.
As a way to demonstrate the array of data and the possibilities for sorting them, I arrange types of data in loose sets in Table 1. The data I include in the table are general categories that I have derived from interviews, documentation from an external review, and administrators’ conference presentations about the model. The table lists data that the model does not incorporate, such as demographic data, but I add such data to show how the kinds of data that data scientists have de-siloed are put in conversation with other data sources.
Table 1. Arrangements of data incorporated into initial data mining and predictive modeling.

ZIP: zone improvement plan; GPA: grade point average.
The assignment of meaning to data in the model, while not arbitrary, is not strictly linked to data sources. That is, the data could align with other interpretations and proxies, and personnel indeed mobilize them for purposes other than their initial use. In my table, I have created three columns and labels to reframe data in terms of how the institution makes and collects them, rather than in terms of what personnel offer up as attributes and behaviors or as large, unsorted lists of variables decontextualized from their sources. I use “self-reported” to describe data that students provide to the institution, typically through the college application process or in campus systems. The data I organize in “infrastructural” data are data created through the everyday operations of the university. Finally, I use “accumulated” as a group for data that students generate as they move through the university in terms of enrollment, grades, and coursework.7 The table is not exhaustive; rather, I aim to depict that data at the university are multiple and extensive.
The primary data that personnel position as indicative of behavior are network logs, which personnel use because they describe them as the best available proxy for behavior. For this, data personnel have repurposed data originally collected by IT to monitor the health and usage of campus WiFi networks. Network logs contain data about time, date, and duration of a student’s use of the WiFi network, along with which routers they connect to and some general information about browsing activity. Because students must log in to the WiFi network using unique accounts administered by the university, they are associated with their WiFi use.
In an office similar to Don’s but in the IR office, I asked Jenny, an administrator involved with data governance at the university, how she and other personnel decided to use network logs as a proxy for attendance. She explained how they came to use network data:

And how we ended up on network logs, you know, it’s just having the right people that are thinking, you know, probably someone picked up their phone and was like, “Hey, I just connected to the WiFi, right.” It’s like, oh, yeah! The WiFi, right. If we want to make the model better, what kind of data, when you think about behaviors, would you want to include, and then you just start thinking, how might you get that data.
Will formulated the leap from attendance to networks differently, recalling his early involvement in institutional data collection. He talked about moving from surveys to Big Data; to him, surveys were a problematic proxy for campus engagement:

And you give [a survey] to the students at the end of the year and…it would measure how integrated you were to the campus and what your commitment was to it. Or we’d use the NSSE survey, which is the National Survey of Student Engagement, where you’d say, like, “Over the last semester, on average, how many hours a week did you study? How many hours a week did you meet with professors outside of class? How many hours a week did you meet with your peers outside of class?” Those kinds of things. And these were relying on self-report surveys, often after the fact, to measure that level of engagement and integration. And what we provided, in the [model], was nope, here’s an actual behavioral marker where we can truly see how much time a student spent on campus.
The allure of Big Data as a replacement for surveys is that the interpretation is invisible, so smoothly deployed that the data appear to speak for themselves.
Maintaining data as accurate proxies
The assignment of behaviors to data, and vice versa, requires investment. While I conducted my fieldwork, I had access to the campus WiFi network and, through an interface with the predictive model, could see a visualization of my own network logs. I kept a personal account of my campus whereabouts and compared it against the network logs. I consistently found chunks of missing time, incorrect geolocations, and, overall, an inaccurate picture of my time on campus.
I brought the disparity in the data up with data personnel, who were either intrigued or unsurprised, depending on their proximity to work with the data. Some even joked with me about how their own network logs made it look like they were never at work. Personnel know that network logs are not a neat substitute for the time a student spends on campus. Network outages aside, competition for network access and the brevity of a connection might prevent a student’s device from registering. Moreover, connecting in the first place requires a student to have a WiFi-enabled device. Missing data abound, for many reasons.
Nonetheless, personnel at the university maintain network logs as an inventive and actual proxy for behavior. I asked Henry, a data scientist working on the predictive model, about the absent data, citing my own network logs, and he explained that personnel have to move forward without those data:

If stuff’s missing, I mean, there’s nothing you can do about it. You just have to hope, and in most cases this is the case, that there is a uniformity to it. So either the whole day is missing, that’ll sometimes happen. That’s fine. Because there’s enough of the data to pick up the slack there … Most of the variables that I’ve made deal with that elegantly … If I just don’t have a certain amount of the data, I don’t say that there was a class session at all. I just say there wasn’t one. So, for a percentage of absences, like, it’s not going to affect it at all. Otherwise, for missing data, the hope is that it’s sufficiently random that for any machine learning purpose, it will not matter that it is missing. Because anything that is sufficiently small and random won’t have an impact on the prediction. That may or may not be true, but it’s an assumption that we have to make because we don’t have a lot of choice.
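The rule Henry describes, excluding class sessions with too little underlying data rather than counting them as absences, can be sketched as follows. The coverage threshold and record format are my assumptions for illustration; the point is only the logic of dropping a session as if “there wasn’t one.”

```python
# Sketch of the missing-data rule Henry describes: sessions with too
# little underlying log data are excluded from the absence rate rather
# than counted as absences. Threshold and fields are hypothetical.
COVERAGE_THRESHOLD = 0.5  # assumed minimum fraction of log data present

def absence_rate(sessions):
    """Compute the fraction of absences over sessions with usable data.

    Each session is a dict like {"coverage": 0.9, "present": True}.
    Sessions below the coverage threshold are dropped entirely,
    as if no class session occurred.
    """
    usable = [s for s in sessions if s["coverage"] >= COVERAGE_THRESHOLD]
    if not usable:
        return None  # no usable sessions at all: no rate can be computed
    absences = sum(1 for s in usable if not s["present"])
    return absences / len(usable)
```

Note how the sketch embodies the assumption Henry names: dropping low-coverage sessions only leaves the rate unbiased if the missingness is sufficiently random, which, as he concedes, “may or may not be true.”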
Nudging on “behaviors” and promoting self-regulation among students
The discursive separation of attributes and behaviors allows for personnel to treat behaviors as actionable, and it enables a more data-driven form of institutional management of students. By making something called behavior and making it legible through data proxies that minimize gaps between data and what they purport to measure, the institution can track students and monitor them. Ultimately, the university uses these data to formulate a narrative of success that maps onto behavior. Following such moves, the institution can relocate where the possibilities for success reside.
Through the assignment of behaviors to data, the category of behaviors holds together, and its formation allows students to become subjects of modeling and nudging. By reconceptualizing network logs and activity as behavior, personnel effectively produce behavior that they can nudge. Before the deployment of the predictive model, the university could not nudge students on the basis of what were primarily demographic data; now, via behaviors, it can.
In this case, the institution mobilizes behaviors as a means to encourage students to self-regulate and make responsible choices that move them toward graduation. Owing to the discursive maneuvering enacted by personnel, data classified as behaviors become directly tied to students and their activities, as illustrated by Will’s description of data as “an actual behavioral marker.” Akin to how Wendy Nelson Espeland and Mitchell Stevens (1998) have discussed quantification as similar to a speech act, data as a metric for behavior become student behaviors.
Correlations, even as personnel insist they are just that, seem causal in nudges because of the implication that students could improve their likelihood of success by adhering to particular behaviors. The model of a student presented in nudging is one who attends all classes, spends time on campus outside of class, engages with student organizations, does not browse the internet in class, visits office hours, and so on. If students want to succeed, they should match the model and attune their choices to the behaviors that correlate with success. In an automated manner, the model and related nudging reflect student data back onto students, signaling that they are continuously documented and brought into a system of recommendations.
Because attributes are removed from the model and nudging, and demographic data are not incorporated into the predictive model at all, success becomes linked with behaviors; the reliance on behaviors suggests that students’ choices lie at the heart of their success at the institution. The purposeful presentation of data to students encourages them to internalize those data and act on them. As such, responsibility now rests on students to take hold of their own success. The visualization of certain kinds of data, namely the data students ought to use to inform their everyday decision-making, and the obscuring of demographic data place the burden of responsibility and success on students. By minimizing the role that race, class, and gender play in graduation outcomes, the institution, through the model, can present behaviors as the major factors in the likelihood of a student graduating within four years. If students do not attend class, a low GPA is a consequence of that decision.
Thus, the constraints around choices become invisible. The university and its existing inequalities start to vanish because success is placed in the hands of students. Social climate problems, structural barriers, issues of belongingness, and resource shortages disappear. A student cannot cite external factors in this model of success dominated by behaviors. The result is a shift in the locus of responsibility, wherein nudging is meant to give students tools to manage themselves and regulate their own behavior based on insights they ought to draw from their data.
Conclusion
One of the aims of this article is to demonstrate how data become behaviors, based on a collective decision to avoid attributes in modeling and nudging, those fixed demographic markers that “can’t change.” The remaining data are then sorted by personnel into behaviors based on what is available and fits within a mold of what personnel understand students to “have control over.” Personnel stabilize data as accurate proxies by explaining and accounting for inconsistencies. Data further solidify as personnel make them act through nudges that direct students to match behaviors correlated with success. The discursive separation of behaviors and attributes allows the university to pin success on behaviors, or everything that a student appears to have a choice in. At a time when the use of demographic data in higher education is ever fraught, behavior data seem full of promise to universities as a step toward the meritocratic ideals of higher education. However, as I have shown, behavior data are not a neutral alternative, and what students “have control over” is not self-evident.
As I sat in his noisy, shared office, Nick, one of the data scientists, responded to my question about where he thought predictive modeling in higher education was headed with an answer about what he felt was more pressing: he thought that nudging had a long way to go, and that the information in nudging needed to be more useful to students.

My theoretical framework is a very traditional economical perspective which could be way wrong … but it doesn’t mean that it’s not useful. The theory is that every person chooses optimum behavior for him or herself, based on the constraints, his ability, his resources, his information. He cannot do anything about the first and second, but he maybe can do something about the third.
The use of behavior data to nudge students and inspire a regime of self-regulation prompts questions about behavior data not only as a more accurate substitute for demographic data but also as a source of knowledge about students. As this article demonstrates, making behavior data is a contingent process. Data proxies and data doubles do not acquire form naturally. They must be made, and they are made by actors under institutional constraints and imaginaries alike.
The other aim of this article is to identify processes of making and sorting data as problem spaces. If data, students, and behaviors are imprecisely matched and subject to institutional pressures and sociotechnical limitations, how should those data be understood, especially in the context of nudging where actors put them into play? To refer back to the emergence of what Cheney-Lippold (2011, 2017) has called the “new algorithmic identity,” in which what people do is more representative of who they are than more traditional concepts of social categories, what happens when behavior data are just as fraught as demographic data?
As predictive analytics become more widespread in areas beyond education, such as policing and sentencing, finance, healthcare and medicine, and social media, it is necessary to illuminate the data that underpin predictions, not just technically but sociotechnically. The everyday, frequently mundane processes that make data, proxies, and data doubles hold together are subject to deletion, which allows for the linkages between data and people to appear seamless. They are not. Understanding how those data come together and how they stabilize is a continuing task for critical data studies, and one that I explore in the context of higher education.
