Introduction
This paper seeks to contribute to the fast-growing body of research on the datafication and digitalisation of education governance. In fact, there is a visibly growing body of work that describes the expanding datafication and digitalisation of education policy and practice, enhanced by the promotion of so-called evidence-based governance (e.g. Bellmann, 2015; Grek and Ozga, 2010), including research that has explicitly focused on the production and processing of international student assessments (Bloem, 2016; Gorur, 2014; Lewis, 2017; Villani, 2018). Nonetheless, the increasingly digital and automated formation, recoding, storage, manipulation and distribution of data, all of which have become integral features of education governance (Hartong, 2016, 2018a; Landri, 2018; Sellar, 2015; Selwyn, 2014: 1; Williamson, 2017), have not yet been extensively examined (see West, 2017 for an important exception), representing a ‘black box’ for most education researchers and practitioners. In other words, as described by Selwyn (2014: 13–14), there remains a pressing need to better understand ‘[…] how various forms of digital data are [specifically] set to work within educational contexts, including what data is used, what the uses and consequences are, and how data has become embedded within different organisational cultures’.
With this paper, we seek to respond to this need by examining selected features of expanding data infrastructures, flows and practices of school monitoring in three state education agencies in two different national contexts, the US state of Massachusetts and the German city-state of Hamburg.
While the main goal of our study is to unpack for school monitoring what Kitchin and Lauriault (2014) describe as ‘data assemblages at work’, we also seek to contribute to a growing number of studies that focus on how the datafication and digitalisation of educational governance has manifested across educational contexts and systems. Notwithstanding the clearly global character of the ongoing transformations, and thus broad commonalities between datafication policies in various countries (e.g. Lingard et al., 2015; Williamson et al., 2018), such examinations have also identified the significant influence of local contexts – including cultural, social or institutional settings – resulting in a significantly different ‘re/territorialisation’ of data infrastructures, flows and practices (Hartong, 2018a). Two of many examples are Schildkamp and Teddlie’s (2008) analysis of School Performance Feedback Systems in the US and the Netherlands, and a comparative study on educational data production, availability and use in China, Russia and Brazil by Centeno et al. (2018). The study presented here complements such analyses of digital technologies sitting ‘alongside pre-existing cultures and structures of educational settings’ (Selwyn, 2013: 209), while simultaneously filling a gap by focusing on a key yet so far widely under-researched actor in the digitalisation of education governance, namely state education agencies and their role as ‘data hubs’ between global, national and local data infrastructures and flows.
The following section is devoted to further, yet brief, conceptual and methodological explanations before we explore the results of the study, principally drawing on 16 interviews with 20 state agency experts conducted in Hamburg and Massachusetts between December 2017 and April 2018. Our particular emphasis lies in documenting how the implementation of data-based school monitoring and leadership appears not as a purely technical procedure, but rather as a complex entanglement of very different (technical and social) logics, practices and problems. Specifically, we identify different types of ‘doing data discrepancies’, which, as we discuss in our conclusion, illustrate and conceptualise typical challenges associated with the pursuit of data, measurement and commensuration across many other domains of governmental or state activity, thus also offering important implications for the wider field of critical data studies.
Conceptual and methodological framing
The goal of our study was to better understand how datafication, in particular the ‘doing’ of school monitoring and leadership, has become enacted in three state education agencies across two country contexts. Since we have already discussed this conceptual framing extensively in previous contributions (Hartong, 2018a, 2019), this section is limited to a brief summary of the main concepts employed:
A central theme of our analysis is that of ‘doing data’, that is, the enactment of data infrastructures, flows and practices in the everyday work of state education agencies.
As our empirical observations will show, a key mechanism within the doing of state school monitoring is the fabrication of commensuration, which is the transformation of different qualities into comparable, usually quantified metrics (Espeland and Stevens, 1998). At the same time, however, commensuration requires enormous organisation, decision-making and weighting which, as many of our interviewees reported, can pose significant challenges (which are mostly externally invisible). The problem of fabricating commensuration equally concerns software/coding activities and the embedding of these kinds of activities into wider institutional practices (e.g. school support, accountability or reporting), which, to a large extent, means linking numbers to norms, values, and politics, and vice versa – for example by deciding which targets schools are expected to meet and, consequently, when to intervene as a state. As Diesner (2015) has argued, small decisions can thus produce a big (governmental) impact. Consequently, aside from an earnest attempt to understand the enactment of data infrastructures in state education agencies, we aim to unpack (and typify) at least some of these underlying, often ambivalent and difficult decision-making processes, including their political implications (e.g. profiling, social sorting, control creep, etc., see also Kitchin and Lauriault, 2014).
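To make the notion of commensuration more tangible, the following minimal sketch illustrates how qualitatively different school characteristics might be rescaled onto a common metric and then tied to an intervention threshold. All measures, weights and the threshold are hypothetical illustrations, not the models used by the agencies we studied.

```python
# Purely illustrative sketch of commensuration: two qualitatively different
# school measures are rescaled onto a common 0-1 range, weighted into a single
# index, and tied to an intervention threshold. Every number here is invented.

def min_max_scale(value, lowest, highest):
    """Rescale a raw value onto a common 0-1 range."""
    return (value - lowest) / (highest - lowest)

def school_index(test_score_mean, attendance_rate, weights=(0.7, 0.3)):
    """Combine two incommensurable qualities into one quantified index."""
    scaled_score = min_max_scale(test_score_mean, lowest=200, highest=300)
    scaled_attendance = min_max_scale(attendance_rate, lowest=0.80, highest=1.00)
    return weights[0] * scaled_score + weights[1] * scaled_attendance

# Deciding where 'intervention' begins is a normative and political choice,
# not a technical one.
INTERVENTION_THRESHOLD = 0.40

if __name__ == "__main__":
    index = school_index(test_score_mean=238, attendance_rate=0.91)
    print(f"index = {index:.2f}, intervene = {index < INTERVENTION_THRESHOLD}")
```

The point of the sketch is not the arithmetic but the number of small decisions (scaling ranges, weights, the threshold) that disappear into the final figure.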
With our exploration of state education agencies in Massachusetts (US) and Hamburg (Germany), we selected contrasting and simultaneously similar cases, resulting from a complex entanglement of different contextual dimensions (Sobe and Kowalczyk, 2018). On the one hand, both the US and Germany have federal, multi-level governance architectures, in which state education authorities to a large extent decide on the implementation, transformation and use of education monitoring systems. On the other hand, the two countries stand in stark contrast in terms of using and relying on (quantified) data for educational governance. While in the US we find a strong traditional belief in the value of testing, rankings and the expertise of private test providers (Sacks, 1999), Germany has for a long time placed its faith more strongly in teachers, exerting ‘[…] weak control and evaluation of the processes and almost no external control of the outcomes of schooling’ (Hopmann, 2003: 472). Even though Germany underwent a tremendous turn towards data-based school governance at the beginning of the 21st century (thus still qualifying as a global ‘latecomer’), this scepticism towards standardised testing and public rankings is still largely visible. In contrast, at least for the last 40 years, educational governance in the US has been characterised by the ever-growing importance of data-based accountability (Schildkamp and Teddlie, 2008: 262), further intensified by the so-called No Child Left Behind Act of 2001 and its successor legislation.
Methodologically, the presented findings draw firstly on material collected through extensive online research and document analysis, including organisation charts, policy papers, documentation on the development and usage of data instruments, as well as online data dashboards. Building on this initial research, we further conducted 16 semi-structured interviews with 20 state agency experts, each lasting between 60 and 90 minutes. We focused on the most relevant institutions conducting state-level data work related to school monitoring and leadership – the Department of Elementary and Secondary Education (DESE) in Massachusetts and, for Hamburg, the school authority (Behörde für Schule und Berufsbildung, BSB) as well as the Institute for Educational Monitoring and Quality Development (Institut für Bildungsmonitoring und Qualitätsentwicklung, IfBQ). We talked to as many ‘data experts’ as access allowed, working across the fields of data collection, validation, modelling, storage, processing and distribution.
Having concluded transcription of the interviews, we completed multiple reviews of the collected material, using topical coding as well as conceptual framework outlining (Rivera, 2018: 8). We first reviewed the sources for the two cases separately, annotating the text with codes referring to (a) the data infrastructure and flows in educational monitoring and (b) descriptions of specific data practices. We then combined the annotated text sections from both cases in a new document, sorting, comparing and typifying the data infrastructures and practices described across the two cases/three agencies. Despite the wide range of topics, complexities, entanglements and narratives covered in the extracted text, we also inductively identified what we subsequently coded and further analysed as ‘doing data discrepancies’. We then typified particular dimensions of such discrepancies and assigned them to the corresponding text passages. Finally, we generated two visualised heuristics that facilitated further refinement of our findings, to which we turn in the next section.
Doing data-based school monitoring in state education agencies: Insights from Massachusetts and Hamburg
While it must be recognised that minor variation exists, the technical infrastructure/process of state school monitoring in Hamburg and Massachusetts can be broadly summarised as follows (Figure 1). Firstly, numerous data points are digitally collected from school and/or student information systems (in the US, with districts acting as data mediators) within varying time frames (from annually to daily). Submitted data is then validated, using a combination of automated and human checking processes, before being centrally stored (either in a data warehouse or in an Oracle database). From there, different departmental units make use of the data for modelling, analysis and/or data visualisation aligned to different data tools, while also working with external/internal research experts. Finally, the laboriously edited data is widely reported, both publicly and within different portals used by schools, parents or (in the US) districts.
Figure 1. The technical infrastructure of state school monitoring.
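The described flow can be summarised in a schematic sketch. The function and field names below are hypothetical placeholders used for illustration only; they do not refer to the agencies’ actual systems or code.

```python
# Illustrative sketch of the described monitoring pipeline: collect -> validate
# (automated and human checks) -> store centrally -> model, visualise and report.
# All names and rules are hypothetical placeholders.

from dataclasses import dataclass, field

@dataclass
class SchoolRecord:
    school_id: str
    enrolment: int
    attendance_rate: float
    flags: list = field(default_factory=list)

def automated_checks(record: SchoolRecord) -> SchoolRecord:
    """Rule-based validation, e.g. simple plausibility ranges."""
    if not 0.0 <= record.attendance_rate <= 1.0:
        record.flags.append("attendance_rate out of range")
    if record.enrolment < 0:
        record.flags.append("negative enrolment")
    return record

def human_review(record: SchoolRecord) -> SchoolRecord:
    """Placeholder for manual follow-up with the submitting school or district."""
    # In practice, flagged records are typically returned to the data submitter.
    return record

def run_pipeline(submitted: list[SchoolRecord]) -> list[SchoolRecord]:
    warehouse = []  # stands in for the central data warehouse / database
    for record in submitted:
        record = automated_checks(record)
        if record.flags:
            record = human_review(record)
        warehouse.append(record)
    return warehouse  # downstream units model, visualise and report from here
```

Even in this toy form, the sketch makes visible how many checking and correction steps sit between a school’s submission and the data that is eventually reported.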
In general, most of our interviewees were well aware of the complexities behind this technical infrastructure, which might look straightforward on paper, but in practice includes various interdependencies and requires data to flow back and forth multiple times. In fact, most interviewees contrasted their work around data with the linear procedures or circular feedback-loop models that the technical infrastructure would suggest, instead describing it as highly experimental, involving significant elements of ‘messing around’ or, as one interviewee phrased it, ‘cooking’ with multiple ingredients (data, algorithms or models) to find working solutions within a highly diverse entanglement of often very different logics, stakeholders or problems.
In line with this argument, interviewees reported that it has become increasingly difficult to organise and work internally with growing amounts of data, which also means an increasing dependence on particular programs, algorithms or indices – with consequent effects due to their selectivity. As one DESE actor in Massachusetts reported: [I]t’s the program that specifies what you’re doing to the data. It says filter these things out, count that and don’t count this and add these things, but don’t add those things. It’s literally the program, the query that pulls across the data and so on.
Another interviewee similarly reflected on the selectivity built into indices: An index value always expresses a particular underlying question and specific method-related considerations. And in fact this partly determines how to look at this data.
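The point these interviewees make can be illustrated with a deliberately trivial sketch: the same underlying records yield different figures depending on which filters the program encodes. The column names and rules below are hypothetical and not drawn from the agencies’ actual queries.

```python
# Deliberately trivial illustration of how the program's filters determine what
# the data 'says'. Column names and rules are invented for the example.

students = [
    {"id": 1, "enrolled_oct_1": True,  "days_absent": 4,  "exchange_student": False},
    {"id": 2, "enrolled_oct_1": False, "days_absent": 0,  "exchange_student": False},
    {"id": 3, "enrolled_oct_1": True,  "days_absent": 30, "exchange_student": True},
]

# Query A: count everyone currently in the data.
count_a = len(students)

# Query B: 'filter these things out, count that and don't count this' -
# only students enrolled on the reference date who are not exchange students.
count_b = len([s for s in students
               if s["enrolled_oct_1"] and not s["exchange_student"]])

print(count_a, count_b)  # 3 vs. 1: the query, not the world, produces the figure
```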
Framed by these more general narratives of both data expansion and data reduction, we identified different discrepancies about which our state agency actors raised concerns when describing their data work, both in a narrower, more technical sense (writing algorithms, building models for calculation, linking data) and in a wider sense (embedding such technical practices into the wider contexts of school monitoring and leadership) (Figure 2). We discuss the most frequently reported discrepancies, all of which carry political implications, in greater detail in the following sections.
Figure 2. Doing monitoring in state education agencies.
Data simplification versus data accuracy
A central goal of state education agencies is to nudge schools, teachers, parents or the wider public in the direction of using (their) data more frequently and to improve data-based communication. However, at the same time, our interviewees were well aware that many of their addressees lacked the time, expertise or motivation to understand, use and interpret the (rising amount and complexity of) data in the ‘right’ way. As one DESE actor in Massachusetts put it: What we’re trying to do right now is expand our outreach because we know that there’s a huge opportunity for parents and kids, other audiences to use this data but they’re not going to have as much experience with data, they are not going to be the ones to download it and put it into Tableau and run analytical reports to figure out which school has the best support program. […] [W]e are trying to […] really work […] on data visualisation and doing more actionable data with less interpretation of the data. So that we do it so that parents, we can reach that audience that is not data experts.
Another interviewee made a similar point about reducing the effort needed to use data: It makes much more sense to arrange data in a way that takes less effort to use. Instead one can instantly draw on it, show things. This […] map is designed to be printed in any format ready to use for presentations.
This user-friendly simplification, however, is accompanied by a significant risk of neglecting the multiple possible interpretations of data, which is meant to be read in a context-sensitive manner. In other words, a key issue raised by a number of interviewees was what they described as an unfortunate discrepancy between the demand for simplification and a simultaneous demand for data accuracy, which appeared just as relevant for data communication as for data production and processing.
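In technical terms, the tension can be illustrated by contrasting two presentations of the same result. The thresholds, margins and labels below are hypothetical and not taken from the agencies’ dashboards.

```python
# Purely illustrative: the same result communicated 'simply' versus 'accurately'.
# Thresholds, margins of error and labels are invented for the example.

def simple_label(score: float) -> str:
    """Parent-facing simplification: one of three categories, no further context."""
    if score >= 0.66:
        return "above target"
    if score >= 0.33:
        return "meeting target"
    return "below target"

def accurate_report(score: float, margin_of_error: float, n_students: int) -> str:
    """Context-sensitive presentation: uncertainty and group size are retained."""
    low, high = score - margin_of_error, score + margin_of_error
    return (f"score {score:.2f} (plausible range {low:.2f}-{high:.2f}, "
            f"based on {n_students} students)")

print(simple_label(0.34))               # 'meeting target'
print(accurate_report(0.34, 0.06, 41))  # the plausible range straddles the cut-off
```

The simple label is easier to act on, but it hides exactly the contextual information (uncertainty, group size) that the demand for accuracy insists on.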
As an example, the person responsible for one of the Massachusetts data tools recalled a case of well-intentioned misreading: Someone was looking at [the data and said] maybe I should use this and start to encourage kids who are at high risk of not going to college or not persisting at college to advise them away from post-secondary. We were like, no! That’s exactly the opposite of what this is. I’ve been a little worried that we have so many data tools, as you’ve seen […] and we develop them in so many different ways and we deploy them in so many different ways and I’m worried […] that we are not clear on what these are all for, who should use which ones, what’s the right audience and that kind of thing.
Another interviewee offered a similar diagnosis of a specific instrument: [This instrument] is not really user friendly, because it offers something for everybody, which means it actually doesn’t offer the right thing for anybody.
Who to compare?
Closely related to the problem of how (strongly) to contextualise data, a key component of doing data for school monitoring and leadership lies in commensuration, that is, in making particular things comparable to one another – for instance, a school’s results to those of other schools, or a student’s performance to that of other students.
One challenge comes along with what our respondents reported as the increasing adoption of so-called ‘fair comparisons’. Different from comparing, for example, a school’s performance to the performance of neighbouring schools or a student’s performance to peers in his/her class, fair comparisons instead relate a particular performance result to the schools/students across the state that are the most statistically similar, thus promising a better (fairer) and context-related understanding of data. Such de-territorialised forms of comparison, which have been made possible by data centralisation, interoperability and standardisation, are not limited to measuring and comparing performance data, but instead have increasingly become part of all kinds of data tools used by state education agencies. However, while this growing reference to statistical ‘context’ has introduced a new (‘fairer’) dimension of data contextuality, it has also further complicated the question of how much (territorial or statistical) context is needed to properly understand and use data (see last section). In other words, interviewees suggested that it has become increasingly difficult to determine how many and which comparison options should be ‘offered’ to data users or directly built into data tools.
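The underlying mechanics of such ‘fair comparisons’ can be sketched as a simple similarity search: instead of comparing a school with its territorial neighbours, one compares it with the schools that are closest in a space of context variables. The features, scaling and distance measure below are hypothetical assumptions; the statistical models actually used by the agencies are not documented here.

```python
# Minimal sketch of a 'fair comparison': relate a school to the statistically
# most similar schools in the state rather than to its territorial neighbours.
# Features, scaling and the distance measure are invented for illustration.

import math

schools = {
    "A": {"low_income_share": 0.62, "ell_share": 0.25, "enrolment": 410},
    "B": {"low_income_share": 0.60, "ell_share": 0.22, "enrolment": 395},
    "C": {"low_income_share": 0.15, "ell_share": 0.03, "enrolment": 800},
    "D": {"low_income_share": 0.58, "ell_share": 0.30, "enrolment": 450},
}

def distance(a: dict, b: dict) -> float:
    """Euclidean distance over (crudely) rescaled context features."""
    return math.sqrt(
        (a["low_income_share"] - b["low_income_share"]) ** 2
        + (a["ell_share"] - b["ell_share"]) ** 2
        + ((a["enrolment"] - b["enrolment"]) / 1000) ** 2
    )

def statistically_similar(target: str, k: int = 2) -> list[str]:
    """Return the k most similar schools, wherever they happen to be located."""
    others = [name for name in schools if name != target]
    return sorted(others, key=lambda name: distance(schools[target], schools[name]))[:k]

print(statistically_similar("A"))  # ['B', 'D']
```

Which variables enter the distance and how they are scaled are precisely the kind of weighting decisions that interviewees described as consequential.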
In Hamburg, we found such challenges reflected in the fabrication of a social index (Sozialindex), which groups schools according to the social composition of their student populations and thereby provides a statistical context for interpreting and comparing their results.
In Massachusetts, an instrument reflecting the discrepancy between territorial and statistical contextualisation is the so-called Resource Allocation and District Action Reports (RADAR) instrument. RADAR collects and models various district-level data with a focus on improving finances and spending. An important part of RADAR is the option for districts to make sense of their data by comparing themselves to up to 10 other districts, including those which are territorially far away but similar, e.g. in demographics or student performance. However, as one DESE actor reported, getting districts to use different (fairer) forms of data contextualisation has proved challenging: […] [M]ost districts will pick at least a couple of comparison districts from those right around them. Then we had thought that our DART list [DART = District Analysis and Review Tools] […]
Both of the examples presented highlight the complexity underlying the growing range of options for commensuration, whether through territorial or statistical relation-making, or indeed through attempts to find a balance between the two.
Speeding up data production while improving data validity
Another key problem of doing monitoring within state education agencies is improving data validity while handling (public or political) expectations to produce data more rapidly, frequently or, at best, in real-time. The majority of our interviewees expressed concern about this tension and the challenges of developing solutions to deliver data more quickly or, at least, ‘fast enough’ (e.g. for publishing educational monitoring reports by the time they are needed to inform decision-making), while still ensuring data quality and validity.
In both cases, state education agencies are dealing with this issue by setting a specific (yet not overly extensive) timeline for data validation practices, including a deadline which defines the moment at which data becomes ‘frozen’. In other words, at that point in time (also described as the ‘single point of truth’) the data in the system is perceived to be correct and is further processed into reporting or additional data modelling, while no further data changes are permitted. As one DESE actor described it: […] [O]nce they certify it, once every district says okay, I’m certifying this data and we compile it, I think of it as like the big steel door shuts, that’s it. […] You can’t go back […] because once you have that data and you start reporting it out, it’s reported out in so many places. So we don’t have control of all that anymore.
An interviewee in Hamburg described the same logic of irreversibility: Well, one has to accept that it’s actually the right thing not to correct [it] anymore because you already reported the data to the KMK [Standing Conference of the Ministers of Education of the German States] after all, the senator launched the numbers at the press conference. […] This data shouldn’t be changed because it would open up an already published state of affairs.
At the same time, interviewees pointed to the practical limits of such freezing. In Massachusetts, for instance, the schools’ own correction routines interact with the state’s timelines: What’s the bigger problem is [the schools] […] fix [their data] […] in order to get the submission through, but they never go back to fix it in their system so that the next time they don’t get those errors, is probably the bigger problem. So we fix these […] [data errors] from June while we’re scoring the essays over the summer. So it takes four weeks or five weeks to score 2 million essays, 3 million essays we are scoring and know the response from that. While that’s going on […] [the schools are] fixing the data. […] In September we have official results, and it’s almost always perfect.
Others noted that frozen data is not always final: […] [U]nfortunately one has to say, you only have a certain period of validation. So the data is also looked at in detail afterwards. If something slips through, which regrettably occasionally happens and is very important, the data warehouse and the single point of truth will be changed in retrospect.
Corrections become particularly consequential where individual students are concerned: […] [Y]ou live with the one or two errors for school reporting but at the student level we don’t live with it. We still fix it, even now especially for high school because you can’t graduate unless you pass the test. […] We will go back. Two years later someone says: I know I passed that test, how come I can’t graduate? Okay. So we’ll go down and we’ll do handwriting analysis […]. For graduation and sometimes for scholarships that we also give because of the tests, we have one person who is always working on forensic examinations.
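A minimal sketch of this freezing logic, assuming entirely hypothetical field names and rules: once a certification deadline has passed, routine edits are rejected, while a separate, audited correction path remains open for consequential cases such as graduation decisions. This is an illustration of the reported practice, not a description of either agency’s actual systems.

```python
# Illustrative sketch of a 'data freeze': after the certification deadline, the
# dataset acts as the 'single point of truth' and routine edits are rejected;
# exceptional high-stakes corrections are logged rather than applied silently.
# All field names and rules are hypothetical.

from datetime import date

class FrozenDatasetError(Exception):
    pass

class MonitoringDataset:
    def __init__(self, certification_deadline: date):
        self.certification_deadline = certification_deadline
        self.records = {}    # e.g. {student_id: {"test_passed": True, ...}}
        self.audit_log = []  # retrospective corrections are documented here

    def is_frozen(self, today: date) -> bool:
        return today >= self.certification_deadline

    def update(self, record_id, field_name, value, today: date):
        """Routine edit: only allowed before the freeze."""
        if self.is_frozen(today):
            raise FrozenDatasetError(
                f"{record_id} cannot be edited after {self.certification_deadline}"
            )
        self.records.setdefault(record_id, {})[field_name] = value

    def exceptional_correction(self, record_id, field_name, value, reason, today: date):
        """High-stakes fix (e.g. graduation status): bypasses the freeze but is audited."""
        self.records.setdefault(record_id, {})[field_name] = value
        self.audit_log.append((today, record_id, field_name, value, reason))
```

The sketch makes explicit what the interviewees describe informally: the freeze is a governance decision about when data counts as true, with an escape hatch reserved for cases where the stakes are individual rather than statistical.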
Increasing both data transparency and data security
A fourth discrepancy which was frequently reported by our interviewees in both countries was the problem of simultaneously increasing demands for data transparency and data security.
Particularly in the US, where a much stronger value has traditionally been placed on publicly available data, respondents strongly supported publishing as much data as possible: I’m a believer in publishing as much data as we possibly can. I think that the education community, or even just the public at large, has become much more fluent in being able to look at data and understand data. I […] believe in that you can effect change by just purely publishing data and letting people see it. Because then I think people will start asking questions about it and that’s a form of accountability.
Another respondent framed publication as an explicit philosophy: We publish an awful lot of data and it’s sort of a philosophy that says, “Let’s put as much out there as we can and let people harvest or digest whatever the appropriate level that’s right for them.” That’s sort of the general idea.
At the same time, respondents were aware of the risks of misreading and misuse, as well as of the practical limits to access: There are plenty of people who go overboard and they overreact to the data that they don’t really understand sometimes and they make decisions, even though it’s good data you can make a bad decision with good data. […] [T]he public looks at these and you may think that a 1% difference from the third grade this year and reading results so next year it goes down 1% you would think, so what? That’s statistics. I live in a town where that 1%, people will get concerned about that.
I would worry about there’s tons of ways to use this data inappropriately, either tracking them or shaming students and worry about some of that. So I want to make sure that folks who are using it are using it in a way that helps them. We don’t have a parent portal, that’s always been talked about, there are lots of really tactical implications around how to ensure that the right people have access to it.
Interviewees in Hamburg were noticeably more cautious, for instance when reflecting on whether the social index might influence parental school choice: One could suspect that [relation]. We never really examined that. But these are topics and things [school choice decisions] that are mostly talked over privately, unrelated to the social index. Not even all parents know about that instrument. […] Parents have a particular idea of school anyways […] one could assume that some aspects of parental school choice are influenced by that. But that is something you would have to take a closer look at.
Other interviewees stressed schools’ sensitivity towards secondary uses of their data and the growing weight of data protection: As soon as schools notice something is done with their data, you should carefully consider whether you do it or not. […] Schools are quite sensitive in this regard. Data protection has become much more important over the past few years, compared to 10-15 years ago, because there are many more possibilities and there is much more data available than a few years ago. Hence, it’s important to describe methods and processes very clearly, to define the conditions for linking data.
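The ‘tactical implications around how to ensure that the right people have access’ can be illustrated with a minimal role-based access sketch. The roles, resources and rules below are assumptions made for illustration; they do not describe either agency’s actual access policies.

```python
# Hypothetical sketch of role-based access to monitoring data, illustrating the
# trade-off between publishing widely and restricting student-level access.
# Roles, resources and rules are invented for the example.

ROLE_PERMISSIONS = {
    "public":        {"school_aggregates"},
    "parent":        {"school_aggregates", "own_child_record"},
    "school_leader": {"school_aggregates", "own_school_student_records"},
    "state_analyst": {"school_aggregates", "own_school_student_records",
                      "statewide_student_records"},
}

def can_access(role: str, resource: str) -> bool:
    """Return True only if the role has explicitly been granted the resource."""
    return resource in ROLE_PERMISSIONS.get(role, set())

# Publishing aggregates widely while restricting individual records.
assert can_access("public", "school_aggregates")
assert not can_access("public", "statewide_student_records")
assert can_access("parent", "own_child_record")
```

Deciding where each line of such a table is drawn, for example whether a parent portal exists at all, is again a political rather than a purely technical question.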
Proactively considering the ambivalent effects of accountability
As mentioned in the case selection part of this paper, the US and Germany stand in stark contrast in terms of their attitudes towards using and relying on (quantified) data – particularly for accountability purposes – with Germany being much more sceptical towards published data, standardised testing and the use of high-stakes rankings. Having said this, while using data for accountability was still much less of an issue in Hamburg (yet this seems to be gradually changing, see below), in Massachusetts it was frequently mentioned as a key feature of doing data, one which simultaneously appears to intensify the aforementioned discrepancies. DESE actors were thus well aware that using data to enforce accountability and build accountability models is strongly influenced by the norms and values used as underlying benchmarks: […] [W]e’ve done some pretty deep philosophical discussions when we are debating what indicators to include, how much improvement we should expect to see and it’s an interesting balance of this technical side and then this normative side. Because in the end, you’ve got to say, did this school make it or not?
The same tension surfaces when new indicators are considered for inclusion in accountability models: From the accountability perspective, it could be tricky to put [particular] […] kind[s] of measures in accountability. Like, for example, we’re piloting a school climate survey. We could down the road consider putting that in, but you create these incentives for teachers or principals or whoever to tell the kids, make sure you fill all those out, the top possible score in a way that’s harder to do with an assessment, like it’s harder to manipulate assessment graduation rates. So that’s where I think the conversation gets more challenging, is around using accountability.
In Hamburg, by contrast, interviewees emphasised that monitoring data remains largely decoupled from high-stakes consequences: Political decisions won’t be linked to that data […]. Not like in the US where schools can be closed [based on accountability scores]. […] We don’t think about this. And I think that’s good. There are a lot of schools that, from our view, are doing a good job on […] [using data]. They look at the results, discuss them […], build on them for school development. Schools have also got their target agreements with their school supervision agency, where such topics are discussed as soon as they notice that they perform lower than comparable schools. They say they want to change something, want to improve some student groups who perform badly. Many schools do a great job on that.
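The ‘interesting balance of this technical side and then this normative side’ can be made visible in a toy accountability calculation. All indicators, weights, expected-improvement rules and the cut-off below are invented for illustration; this is not the Massachusetts accountability formula.

```python
# Toy accountability calculation showing how normative choices (which indicators,
# their weights, expected improvement, the cut-off) sit inside a technical model.
# All numbers are invented; this is not any agency's actual formula.

INDICATOR_WEIGHTS = {          # normative choice 1: which indicators count, and how much
    "achievement": 0.6,
    "growth": 0.3,
    "graduation_rate": 0.1,
}
EXPECTED_IMPROVEMENT = 0.02    # normative choice 2: how much improvement to expect
MADE_IT_CUTOFF = 0.5           # normative choice 3: where 'making it' begins

def accountability_score(current: dict, previous: dict) -> float:
    score = sum(INDICATOR_WEIGHTS[k] * current[k] for k in INDICATOR_WEIGHTS)
    improved_enough = (current["achievement"] - previous["achievement"]
                       >= EXPECTED_IMPROVEMENT)
    return score + (0.05 if improved_enough else 0.0)

def made_it(current: dict, previous: dict) -> bool:
    """'In the end, you've got to say, did this school make it or not?'"""
    return accountability_score(current, previous) >= MADE_IT_CUTOFF

example_now = {"achievement": 0.48, "growth": 0.55, "graduation_rate": 0.80}
example_before = {"achievement": 0.45, "growth": 0.50, "graduation_rate": 0.78}
print(made_it(example_now, example_before))  # True in this invented example
```

Changing any of the three ‘normative’ constants flips schools across the line without any change in the underlying data, which is precisely the weight of the philosophical discussions the interviewee describes.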
Concluding remarks
The aim of this paper was to provide empirical insights into how state education agencies in Germany and the US enact the rising datafication of schooling, focusing in particular on data infrastructures, flows and practices for school monitoring. Our findings have illustrated selected features of this ‘doing’ of monitoring as reported by actors from three state education agencies in two different national contexts. Even though our findings generally reveal what Selwyn (2013: 198) has described as the ‘messy’ realities of technology and education, we still identified different types of ‘doing data discrepancies’ that present somewhat typical challenges and ambivalences, described by our interviewees in both countries in surprisingly similar ways. This is despite the fact that our respondents from Hamburg continue to articulate strong criticism of (public) rankings and high-stakes testing. At the same time, we found country context to be strongly reflected in the doing of data, as evident, for example, in the differing approaches to making data publicly available. It is important to mention once more that both selected cases – Massachusetts and Hamburg – represent particular institutional configurations, so the findings presented here cannot simply be generalised to other state agencies or country contexts.
At the same time, the discrepancies reported by our interviewees show how the social and the technical are not only deeply interwoven in data-based school monitoring, but also – as emphasised in the existing critical data studies literature – how data practices always have political implications, particularly when applied to systems of (high-stakes) accountability.
Nonetheless, a key result of our study is that datafication, at least in our selected state education agencies, does not appear to produce single centres of calculation and data power, but is instead mediated through multiple infrastructures and practices that together perform calculation, commensuration and data work. Against this backdrop, we fully agree with Gray et al. (2018: 1) that instead of (only) calling for data literacy in the sense of competencies in reading and working with datasets, there is a pressing need for so-called ‘data infrastructure literacy’, that is, a sensitivity to the infrastructures, practices and decisions through which data is produced, processed and circulated in the first place.
