Abstract
Introduction
On the first day of the inaugural Data Science for Social Good (DSSG) program at the University of Washington in 2015, Nick Bolten stands in front of program participants to introduce the project he will be co-leading over the summer. He opens with a story about a man pseudonymously named Scott. Scott uses a wheelchair, and one day as he was traveling through downtown Seattle, he came across a section of the sidewalk that was closed due to construction of a new building—a common occurrence in one of the fastest growing cities in the USA. A construction worker pointed Scott to a temporary pedestrian bridge that would get him safely across the construction zone, and he set out on this detour. However, as he approached the bridge, Scott's wheels got stuck in the soft tarmac of a ramp that had recently been laid, and someone had to come help him get unstuck.
Had Scott known he would encounter a construction zone with a closed sidewalk that morning, he could have planned a different route and avoided the whole situation. Unfortunately, Nick tells the group, there are no good informational tools for planning a commute if you have a disability. This is why Nick and his colleagues came up with the idea for AccessMap, a routing application similar to tools like Google Maps or Waze, but one that accounts for the needs of people with limited mobility. In recommending routes, the app would take into consideration features such as steepness, sidewalk closures, crossing signals, and curb ramps. However, before such a map can be widely used, it needs to be populated with data about the pedestrian environment. This is where the student interns in the DSSG program come into play; they will help develop the pedestrian data infrastructure that forms the base layer upon which AccessMap will be built.
While doing that work, the team will encounter myriad choices about how to generate and represent data. This paper tells the story of their deliberations and decisions to show that their experience—though perhaps atypical—can teach us much about ethical reasoning during the development of data-intensive technologies and methods. I develop the concept of “ethical abduction”—a process by which actors can intentionally and systematically address ethical issues that arise during their day-to-day actions by making decisions informed by a foundational ethical worldview. This mode of ethical reasoning entails tacking back and forth between divergent but complementary ways of thinking: between establishing ideals and making decisions given practical constraints; between understanding historical context and anticipating future consequences; between acknowledging structural dependencies and accepting responsibility for moral agency.
Relevant work: The ethics of data-intensive technologies
Nick and his teammates conduct their work amid heightened interest in the ethics of data and the technologies they enable. Ample scholarship has exposed the propensity for such technologies to complicate accountability (Wieringa, 2020), automate and obfuscate discrimination (Noble, 2018), trample expectations for agency and consent (Barocas and Nissenbaum, 2014), and enable unprecedented forms of surveillance (Andrejevic and Gates, 2014). Literature focused on ethical interventions to address these and other transgressions has burgeoned in recent years. In this body of work, data, data science, algorithms, machine learning, and artificial intelligence are often treated as closely related phenomena (Peters et al., 2020; Saltz et al., 2019; Whittlestone et al., 2019). Data are understood to be the foundation upon which algorithmic technologies are built, and as such are implicated in conversations about the ethics of machine learning and artificial intelligence (Slota et al., 2020). Although the case presented herein does not involve the development of AI, a review of literature relevant to data ethics touches on this broader corpus of scholarship.
Just as this area of work covers a broad swath of related technologies and methods, so too does the term “ethics” take on expansive meaning, ranging from “open-ended philosophical investigation into moral conditions of human experience” to “bureaucratized expectations of professional behavior” (Metcalf et al., 2019: 449). Here, I broadly consider ethics to mean an intentionally cultivated understanding of what constitutes good or righteous action. Although my research subjects did not frequently use the term “ethics” during the course of their work, I follow Whitman (2021) in considering “acting upon data with ‘good’ intentions” (p. 7) to be a salient mode of ethical practice and take etic liberties in casting many of their deliberations as ethical choices.
It has become commonplace for organizations that develop data-intensive technologies to enumerate ethical principles guiding their work (Floridi and Cowls, 2019; Jobin et al., 2019). Critiques of this “principled” approach to ethics, meanwhile, are almost as prevalent as lists of principles themselves (Bietti, 2020). Although some analyses question whether powerful actors are co-opting the language of ethics to burnish their reputations (Bietti, 2020), numerous scholars address “the gap between principles and practices” (Morley et al., 2020: 2143) as a matter of execution rather than intent—a longstanding question of concern for technical practice (Ackerman, 2000). Saltz and Dewar (2019) found that broad ethical principles do not map well onto specific ethical concerns raised in data science and therefore provide little concrete guidance for practitioners. According to Mittelstadt (2019), organizations can only live up to ethical ideals if they develop processes for translating between high-level principles, mid-level organizational norms, and low-level technical design requirements. Scholars and technology developers have tried to facilitate such translations by creating “tools” designed to help practitioners address ethical issues in their work (Ayling and Chapman, 2021; Morley et al., 2020). This includes, for example, templates for documenting decisions and transformations made during research or design (Gebru et al., 2021), frameworks for conducting algorithmic audits (Brown et al., 2021), activities for prompting critical reflection (Ballard et al., 2019), and checklists for ensuring due diligence (Madaio et al., 2020). Schiff et al. (2020) argue that possibly too many such tools exist for practitioners to thoroughly and meaningfully vet them, while Morley et al. (2020) found that many AI ethics tools suffer from poor usability.
For other scholars, however, the shortcomings of a principles-first ethical approach are not limited to the lack of better translational tools. Rességuier and Rodrigues (2020) argue that the very act of agreeing on a static set of principles encumbers the ethical imagination and risks stifling reflection—a necessary feature of ethical practice. For Peters et al. (2020), there is an inherent tension between the establishment of higher order principles and the realities of “context-specific, actionable practice,” such that “there will always be ethical decisions and tradeoffs that are not amenable to universally applicable specifications” (p. 35). Indeed, as Barocas et al. (2017) note, “in most cases there are no ‘right’ choices or even a ‘right’ way to make these choices,” only sets of “difficult trade-offs between competing goals and values” (p. 71). Taking these critiques seriously means that we ought to study and support ethical reflection among practitioners within specific contexts of practice.
The few studies to do this yield valuable insights. Scholarship on practitioners developing artificial intelligence and computational technologies more generally highlights how individual ethical agency is “extrinsically bound” (Orr and Davis, 2020: 725) by more powerful actors, structures, and cultural forces (Metcalf et al., 2019). At the same time, seemingly “mundane” decisions made in the course of routine work regarding the structure, format, and quality of data have profound downstream ethical implications (Leonelli, 2016)—a fact that is frequently not lost on practitioners, who often make a “scapegoat” of “bad data” when their technologies go awry (Slota et al., 2020: 2). Practitioners can recognize particular moments when they face decisions of ethical consequence through the incorporation of activities that prompt reflection and discussion about ethical values (Shilton, 2013). “Ethical work” is often hidden, time-consuming, resource-intensive, and unending, for the moral determination of what counts as “good” or “bad” must be perpetually made and remade through situated action (Ziewitz, 2019). This means that “ethical reasoning” must become an “integral part” of data-intensive practices (Leonelli, 2016: 3).
To summarize, recent work demonstrates that the articulation of ethical principles is not sufficient for producing ethical practices, and that the challenge of translating between the two is a thorny one. Specific tools and activities designed to bridge this gap can only go so far because data-intensive projects and contexts are highly idiosyncratic. What is most needed, therefore, is the normalization and scaffolding of ongoing, robust, and adaptive ethical reasoning.
But, what does ethical reasoning mean, what does it look like, and how do we support it? Answering these questions is essential to deepening our understanding of—and strengthening interventions upon—data ethics. I address this important opening in extant scholarly conversations by telling a story about practitioners who recognized the ethical consequentiality of the many mundane choices they made regarding the production of data. As an ethnographer with a front-row seat to their work, I documented and analyzed their in situ reflections and deliberations to distill lessons that can be applied to ethical reasoning in data-intensive practices more broadly. Herein, I describe and name the process of reasoning they used as “ethical abduction.” My work on ethical abduction is in conversation with recent scholarship that draws on eclectic theoretical inspirations—including pragmatist, feminist, and analytical philosophical traditions—to theorize the mundane nature of ethically consequential decisions made during the everyday practices of data-intensive work. It marks a shift away from treating ethics as “a form of argument or an abstraction” and toward the development of an “ordinary ethics” (Metcalf et al., 2019: 455), which, in keeping with the analytical philosophy of Wittgenstein (cited in Lambek, 2010), treats ethics instead as “moral commitments embedded in actions” (Metcalf et al., 2019: 455). Like Jaton (2021), I attend to moments of “hesitation” (p. 2) when actors face what pragmatist philosopher William James called “genuine options” (cited in Jaton, 2021) offering up distinctly divergent futures. Noticing such consequential junctures requires the “attentiveness” called for in feminist ethics of care (Tronto, cited in Rességuier and Rodrigues, 2020: 3) that can result in a “constantly refreshed capacity to perceive the world” (Rességuier and Rodrigues, 2020: 3). 
Building on these efforts to reframe the ethics of data-intensive technologies as quotidian, ad hoc, and generative, the concept of ethical abduction describes a process of making those theoretical commitments concrete. As a mode of reasoning for addressing ethical challenges in data-intensive work, ethical abduction signals several key moves that are developed in the conclusions of this paper. First, it provides a substantive but flexible alternative to the problematic “principled” approach to data ethics. Second, it underscores that data ethics entails skilled labor requiring material support. Third, it situates ethics in its proper place as a process of intellectual discovery and knowledge-building.
Methods
This paper should not be construed as an argument that all data scientists think deeply about the ethical implications of their work. The story I tell is not meant to be representative of data science practice or the development of data-intensive technologies overall. Rather, it is a case carefully selected from among many in the corpus of my ethnographic data precisely because it offers an example—perhaps a rare one—of what happens when a team of technologists do attempt to intentionally build a data infrastructure that reflects and propagates a carefully considered ethical worldview. It is an exploration of the processes that can support ethical reasoning in data practices, and it is part of my own normative commitment to advancing ethical practices in data-intensive methods and technologies.
This empirical case is situated within a long-term ethnographic study of data science practice and culture at the University of Washington. The analysis is based on a three-year period of intensive fieldwork from 2015 to 2017 conducted in the DSSG program, a 10-week long summer internship for undergraduate and graduate students. I acted as a participant-observer during both the implementation of this internship program each summer and during program planning throughout each academic school year. I conducted approximately 1500 hours of observations and participated in the program by assisting with logistical planning and by developing, over the course of several years, a curriculum around data ethics and stakeholder engagement. I adopt a constructivist grounded theory development approach (Mills et al., 2006), as modified along guidelines laid out by Charmaz and Mitchell (2007) to accommodate unstructured ethnographic data generated through fieldwork. The specific analysis presented in this paper resulted from a situational mapping process (Clarke, 2009) conducted on a subset of data encompassing field notes related to the AccessMap/OpenSidewalks project. I retroductively constructed an ethnographic narrative around that analysis (Ragin, 1994) that foregrounds key moments in which project team members grappled with major decisions impacting the trajectory of their work.
Ethical decision-making in the AccessMap/OpenSidewalks project
“How do you optimize for delight?”
On a warm spring day in Seattle, three researchers are huddled in a small meeting room tucked away into the corner of the Data Science Studio at the University of Washington. Two of them, Anat Caspi and Nick Bolten, are spearheading the push to develop the AccessMap application introduced earlier. Their project has recently been selected for the annual Data Science for Social Good (DSSG) program run by the eScience Institute at the University of Washington, meaning that during the coming summer, a team of four student interns will be assigned to work on their project full time. The third person in the room, Vaughn Iverson, is a data scientist from the eScience Institute who will mentor the students throughout the program. The three of them are meeting to develop a plan for the small slice of this sprawling, long-term initiative the students will be tackling during the 10-week long internship. Anat and Nick intend to use the open source, user-generated OpenStreetMap (OSM) as the base layer for AccessMap; this summer they want to focus on improving standards for incorporating pedestrian data into OSM. They refer to this effort as OpenSidewalks, a complementary sister-project to AccessMap. Owing to the overlapping goals and membership across these efforts, from here on I refer to the project and team by the portmanteau, “AccessMap/OpenSidewalks” or simply “AMOS.”
Vaughn arrived at the meeting carrying a single-wheel electric skateboard that he leans against the wall before taking his place at a small round table. “I’ve been paying a lot more attention to Seattle's sidewalks since I started using that thing,” he says, gesturing to the device.
Anat immediately seizes on Vaughn's comment by pointing out that he has highlighted yet another population that could potentially benefit from their project. Although AccessMap is designed with mobility-impaired individuals in mind, she says, similar information about the condition of sidewalks is relevant to many kinds of pedestrians: people pushing shopping carts and baby strollers, people dragging wheeled suitcases, or people like Vaughn who are taking advantage of powered pedestrian devices like electric scooters and skateboards. And if the OpenSidewalks initiative succeeds in facilitating the creation and sharing of data that is relevant to a wide range of pedestrians, Anat muses aloud, what other kinds of information would they care about? On hot summer days, might they want to know what route offers the most shade or has a water fountain along the way? On a gorgeous spring day like today, might they want to know which of Seattle's streets are lined with blossoming fruit trees?
The trio sums up this line of thought by posing the question, “how do you optimize for delight?”
Paradoxes in the AccessMap/OpenSidewalks project
In practice, however, such diverse priorities are not so easily harmonized. In the following stories and analysis of the AccessMap/OpenSidewalks project, I explore how tensions, trade-offs, and compromises play out in the team's day-to-day work. I trace decision points at which they weighed the priorities and constraints of various stakeholders in their project and attempted to strike a balance that would ultimately support their overarching sociomaterial worldview and ethical vision.
The social model of disability as sociomaterial worldview
As Nick wraps up the introductory talk described at the beginning of this paper, he opens the floor to questions from the audience. Samuel (a pseudonym), one of the data scientists from the eScience Institute, raises his hand and says he has a “50 thousand foot question.” You know the sad story of the iBOT, he begins. It flopped, but why isn't
The iBOT was a powered, four-wheel-drive wheelchair that could “climb up and down stairs and curbs, roll across varied terrain, raise a seated user to eye-level-standing height by rising up and balancing on two wheels, and travel in this mode—all while relying on sophisticated sensors and gyroscopes to maintain the chair's balance” (Vogel, 2016). After the device came on the market in 2003, insurance companies balked at its $24,000 price tag, and its production was discontinued in 2009 due to slow sales.
In framing his query as a “50 thousand foot question,” Samuel is signaling that the subject he raises may be orthogonal to the project team's immediate work, but is more broadly fundamental to the way we approach disability in general: should we be focused on building a world that is accessible to the disabled, or on equipping people with disabilities to access the world as it is? Anat briefly responds to Samuel's question by highlighting Medicaid's role in influencing the development of accessibility technologies in the US, and how that played a part in the iBOT's demise. Then she quickly redirects the conversation back toward the informational questions her team is focused on over the summer.
However, a bit later, when the group has dispersed into their assigned teams so that the students can learn more about their projects and get acquainted with their new teammates, Anat circles back around to the issue Samuel raised. She explains that Samuel was essentially asking whether it would be more efficient and cost-effective to invest in wheelchairs that can climb stairs than to redesign the built environment to accommodate disabilities. Anat takes the opportunity to share her own perspective on disability. In the US, we tend to treat disability as a medical problem, she says, a flaw within an individual that needs to be fixed. However, we can also think of disability as a situationally-determined social construct, something that happens because the world is designed only with certain abilities in mind, not with the abilities of all.
These interactions reveal the perspective of AMOS leadership regarding the relationship between technology and society. Anat was essentially introducing the students to what is known as “the social model of disability.” According to this perspective, first articulated by Michael Oliver in
Anat's rebuke to Samuel's question was not leveled at him or the iBOT itself; rather she was taking issue with the existence of a narrative that we should give all people with mobility impairments fancy wheelchairs that “fix” the person instead of addressing the cultural assumptions, political incentives, and sociotechnical systems that have played a role in producing their disabilities. Given this positioning, while the AMOS team wants to build a tool specifically for people with limited mobility, they hope to do so while simultaneously attempting to address the broader culturally derived assumptions that lead to limited informational resources for people with disabilities. Owing to their belief in the social model of disability, they see themselves as not just building an app, but also trying to normalize the needs of people with limited mobility as one among a range of options—not a special category, but part of a spectrum of abilities, desires, and needs in a pedestrian-centered transportation network and information system. And so ultimately, their objective of designing a flexible pedestrian data infrastructure that could support widely varied needs, priorities, and uses becomes even more important than building AccessMap itself.
Deliberations over how to represent data
On a practical level, this means that the AMOS team needs to develop a proposed data schema that will enable them to build a tool specifically for people with disabilities but simultaneously support myriad other purposes as well. They are cognizant that choices made about the representation of data are value laden and consequential, enabling some uses while inhibiting others, elevating some perspectives while suppressing others. For example, the team had a lengthy conversation during which they deliberated how to account for “street furniture” like benches when determining a route in AccessMap. If a bench narrows the sidewalk, perhaps it should be labeled as an
The team's objective was to develop a schema for pedestrian data in OSM that could become the new “functional standard” on a platform that does not actively enforce any true standards, but instead promotes agreed-upon best practices. To be successful, the AMOS team needed to put forth a proposal that would be likely to receive widespread support across various constituencies within the community. As a result, most of the team's time in the DSSG program was not spent on the sorts of canonical tasks most often associated with data science—things like data cleaning, visualization, analysis, and model building. Instead, they did layers upon layers of qualitative, communicative, and interpretive work in support of stakeholder engagement: the team interviewed users of assisted mobility devices to find out what kinds of information were useful to them in navigating the city; observed the work of accessibility auditors; and spent hundreds of hours engaging with the OSM community by studying their archives, contributing to listservs and wikis, and attending events.
The team carefully considered the insights they gleaned from each of these interactions when making key decisions about how to represent pedestrian network data. In doing this, they called attention to the ways that choices about the representation of data had the potential to encode judgments that privileged some perspectives and values over others. To illustrate this, below I provide a detailed description and mapping of five key decision-points the team faced regarding the representation of data. In Figure 1, I have summarized the values, stakeholders, and trade-offs relevant to each decision. I characterize the four primary values the AMOS team juggled (depicted in bold black letters) as intuitiveness, relevance, interoperability, and computational efficiency. While these terms are surely recognizable to my research subjects, they were not necessarily the exact terms deployed in their conversations, so I treat them as ethnographer-derived analytic categories rather than in vivo actor's categories. By

Tradeoffs in the AccessMap/OpenSidewalks project. This diagram maps salient choices the AMOS team faced regarding how to represent the pedestrian network in a data schema they planned to propose to the OpenStreetMap (OSM) community. Values and stakeholders are depicted in black, while the tradeoffs they considered appear in gray. Gray arrows indicate which values were in tension at each respective decision point, with the larger arrowhead indicating which value(s) ended up prevailing in the tradeoff.
Tradeoff 1. Pedestrian data as attributes or features: Weighing intuitiveness and computational efficiency
Like many modern mapping tools, the norms in OSM were developed primarily with automobile travel in mind. Thus, built features serving pedestrians—things like sidewalks, crosswalks, and curb ramps—are typically included as annotations denoting attributes of streets rather than as distinct features drawn on the map. Essentially, pedestrian features are often treated as metadata rather than data. For example, a sidewalk most often will be documented with a tag on a roadway indicating that a sidewalk exists on its left or right side.
One of the first, and most fundamental, decision points the AMOS team confronted was whether they should make an argument to include pedestrian data as features on the map in their own right, rather than as annotated attributes of streets. Very little of the pedestrian network was documented in OSM, and the team felt this was partially because adding pedestrian metadata involved a cumbersome and non-intuitive process; it required map contributors to wade through several layers of documentation in order to “tag” streets with the proper annotation. If instead, users were able to view pedestrian features on the map as shapes with their own geospatial location, they could more easily ascertain when pedestrian data were missing, and more easily contribute them to the map. This seemed important for increasing the coverage of the pedestrian network in OSM.
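The contrast between these two representations can be sketched with OSM-style elements. The tag conventions shown below (a `sidewalk=left` tag on a road way, versus `highway=footway` plus `footway=sidewalk` for a separately drawn sidewalk) follow common OSM practice; the street name and node IDs are invented for illustration:

```python
# Representation 1: the sidewalk as an *attribute* (metadata) of the street.
# The sidewalk has no geometry of its own; only the road centerline is drawn.
street_with_attribute = {
    "type": "way",
    "tags": {"highway": "residential", "name": "Pine St", "sidewalk": "left"},
    "nodes": [1001, 1002, 1003],  # nodes tracing the *street*, not the sidewalk
}

# Representation 2: the sidewalk as a *feature* in its own right, with its own
# geometry, which contributors can see, inspect, and edit directly on the map.
sidewalk_as_feature = {
    "type": "way",
    "tags": {"highway": "footway", "footway": "sidewalk", "surface": "concrete"},
    "nodes": [2001, 2002, 2003],  # nodes tracing the sidewalk itself
}

def has_own_geometry(element):
    """True when the element is a pedestrian feature drawn on the map itself."""
    return element["tags"].get("highway") == "footway"

assert not has_own_geometry(street_with_attribute)
assert has_own_geometry(sidewalk_as_feature)
```

The missing-data problem the team identified follows directly from representation 1: a contributor looking at the map cannot see whether a sidewalk tag is absent or the sidewalk simply does not exist.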
However, the team also had to consider how this would affect the OSM community, and whether they would be receptive to the proposed change. Vaughn, the data scientist assigned to be a mentor on the project, had experience as an OSM contributor and was able to help the team think through reasons the community might have concerns with their proposal. It would take much more server space to add pedestrian data as features rather than as attributes and that might be an important consideration, he warned them. “I don't think anyone is going to stand up and say, ‘screw people with disabilities,’” he reasoned. “They’re going to do a cost-benefit analysis: ‘Should we triple the size of the database and go through all this work to benefit a small fraction of users?’” Still, the team felt strongly that getting pedestrian data on the map would require a more intuitive approach for map contributors, so they pressed on with the decision to propose adding these features to the map, even though this would compromise computational efficiency for the OSM organization.
Tradeoff 2. Sidewalks as polygons or lines: Weighing relevance, intuitiveness, interoperability, and computational efficiency
Once the team was sure they would advocate for the pedestrian network to be included as features on the map, they had to decide how those features should be represented; essentially, what shape they would take. Should they document sidewalks as polygons or lines? Polygons would allow for mapping precise dimensions of sidewalks and would be helpful in documenting any characteristics of the sidewalk related to its surface, such as pavement texture. This could capture valuable information for their primary constituency of potential AccessMap users; for example, someone in a wheelchair might find it useful to know that a given sidewalk segment is too narrow to navigate, or where the pavement is too rough for their needs. Simple lines, on the other hand, would be far more intuitive for contributors to add to the map as they would not require the same degree of precision. They would also be more computationally parsimonious, taking up less OSM server space. And it would be more straightforward to create a routing application that traversed lines rather than polygons, making lines a better choice for interoperability with applications built on top of OSM, including AccessMap. Given all that, it made sense to trade some degree of relevance that polygons would lend potential AccessMap users for the benefits of mapping sidewalks as lines: it would mean that their schema was more likely to be accepted by OSM because of computational efficiency, more likely to spawn contributions of pedestrian data because of intuitiveness, and more likely to proliferate tools for a multitude of pedestrian uses because of interoperability.
Tradeoff 3. Curb ramps as nodes or polygons: Weighing relevance and intuitiveness
The AMOS team had similar deliberations over how to represent curb ramps in OSM. The team had already decided they would advocate for curb ramps to be visually represented in OSM, but they had to decide what shape to assign them. Again, they considered polygons as an option, but this time weighed that possibility against the option of demarcating curb ramps with nodes—simple dots on the map. Polygons would of course offer the most fidelity to reality, and possibly give people with limited mobility relevant information. For example, some ramps wrap around an entire corner, while others point into the intersection at a 90-degree angle, and seeing that level of detail might be valuable to certain people with limited mobility. Given that curb ramps take up so much less surface area than sidewalks more generally, and given that the team had already ensured routing capabilities by documenting sidewalks as lines, computational efficiency and interoperability were less important considerations in this decision. However, intuitiveness for map contributors was still very important. Polygons would require a high degree of precision and skill to map, whereas Nick pointed out that “a node is the easiest thing to add to OSM.” Once again, potential for dramatically increasing coverage of the pedestrian network by making it intuitive for contributors to map won out over the possibility of adding data that could be particularly relevant for potential AccessMap users: the team opted to advocate for curb ramps to be documented as nodes on the map instead of polygons.
Tradeoff 4. Curb ramps as single or multiple nodes: Weighing relevance and interoperability
Even still, there were several different ways that curb ramp nodes could be represented, and they had to decide which one they would advocate for in their proposed schema. The first option they discussed involved simply placing a node at the junction of two sidewalk segments to indicate the presence of a curb ramp. This option would require such a trivial operation that it could easily be automated by writing a computer program to add a node if municipal records indicate that a curb ramp is present anywhere along that corner. This would help facilitate interoperability, since routing applications could then easily incorporate curb ramp data into their recommendations.
A few days following that discussion, the issue of curb ramp representation came up while the AMOS team was on a visit to learn how a local public transportation agency determines which routes are accessible to their disabled customers. The team was walking through a neighborhood near downtown Seattle with contractors who do accessibility evaluations on behalf of the transit agency. The contractors were explaining to the AMOS team what kind of features they look for in the built environment, how they measure obstacles, and how they document their data. In Figure 2, the group has paused to discuss whether a particular intersection is accessible or not. Across the street from where they stand, in the shadow of an abandoned building, is a corner with irregular curb cuts. As one can see in the photo, the side of the street facing the group has a curb ramp, but it is offset some distance from the marked crosswalk on the street. The perpendicular side of the block, meanwhile, has no curb cut at all. Nick says that among wheelchair users they recently interviewed, there are different levels of “adventurousness.” One set would be okay with this situation—as long as there is a curb cut

Figure 2. The AccessMap/OpenSidewalks team contemplates irregularly placed curb ramps.
Tradeoff 5. Manual contributions or mass imports: Weighing interoperability and computational efficiency
To adopt OSM as the base layer for AccessMap—to make these tools interoperable, in other words—the AMOS team would need the pedestrian network data in OSM to be relatively complete. However, they worried that if they were to wait for individuals to manually contribute enough data, the map would never approach completeness. As such, they hoped to import municipal data about sidewalks, curb ramps, and crosswalks into OSM en masse. Through their engagement with OSM, however, they realized that some important segments of the community had reservations about doing bulk imports. OSM keeps a permanent record of every edit ever made to the map; if a mass import contained a large number of errors, then the organization would be stuck maintaining a map bloated with errors that would reside on their servers forever. OSM has, therefore, historically approached mass data imports very cautiously. In a sense then, the value of interoperability was in tension with the value of computational efficiency.
Ultimately, the AMOS team conceded to the need for computational efficiency in a couple ways. First, they decided to separate their proposal for a mass import of data from the rest of their proposal for the new pedestrian data schema, knowing that the former would likely be more controversial than the latter. And second, they altered their plan so that instead of bringing municipal pedestrian data into the map all at once, they would propose to import it in smaller batches, develop tools for human verification of that data, and organize volunteer sessions where groups of contributors would conduct this manual verification. In this way, they compromised on the value of interoperability to accommodate OSM's need for computational efficiency in the parsimonious use of server space.
Conclusions
How these decisions and choices played out is not haphazard; mapping them as I have in Figure 1 reveals a pattern, or flow, to the decision-making of the AccessMap/OpenSidewalks team. Looking at the periphery of this map, one can see that the value of interoperability yielded to the values of relevance and computational efficiency (Tradeoffs #4 and #5), and both of those values in turn yielded to the value of intuitiveness (Tradeoffs #1 and #3). Since each of those values is associated with the needs of a distinct stakeholder group, this also means there was something of a hierarchy to the prioritization of stakeholder needs. When it came to choices that could have made their own work as the software developers of AccessMap easier (Tradeoffs #4 and #5), the team opted for decisions that would be technically more challenging to pull off but more palatable to OSM and more beneficial for potential AccessMap users. The needs of both of these stakeholders in turn deferred to the priority of producing a data schema that would be easy for OSM's manual contributors to work with (Tradeoffs #1 and #3). Turning to the inside of the map in Figure 1, we can also see that relevance for the potential users of AccessMap was compromised by interoperability for developers, computational efficiency for OSM, and intuitiveness for map contributors (Tradeoff #2). As a whole, then, this situational analysis shows that although the entire AccessMap/OpenSidewalks project began with a very specific group of target beneficiaries in mind—people with limited mobility—consideration for their particular needs was eclipsed by values more strongly associated with other stakeholder groups in multiple decisions about the representation of data.
However, given how the AMOS team understands the sociomaterial nature of the problem they are addressing, these compromises can also be seen as moves that further their overarching ethical agenda. Informed by the social model of disability, the AMOS team hoped to address disablement's social construction by designing a system that accommodates difference and diversity, rather than one that assumes the "normal" mode of travel is to get from A to B in the shortest time and distance. They sought to position the informational needs of people with limited mobility not as a special case or a deviation from normal, but as part of a diverse array that included numerous priorities and uses—even the ability to "optimize for delight." From this perspective, it was ethically coherent for the team to privilege choices that would enable buy-in from a broader set of stakeholders, provide ample flexibility in the representation of data, and facilitate the widespread adoption of their proposed data schema. Yet, in conversations I had with several members of the AMOS team, they wanted to be sure that their work would not be portrayed as a perfect exemplar of ethical data science. From their perspective, there was no such thing as an ideal ethical solution to the technical choices they faced, only a set of imperfect trade-offs and compromises that represented the best they could do.
The role of ethical abduction
This process that the AMOS team followed—in which they intentionally and systematically reasoned their way to the best possible set of decisions while being informed by a particular sociomaterial worldview—is what I call "ethical abduction." Abduction in logical inference entails tacking back and forth between the particulars of a case and a theory that could explain them.
Ethical abduction is an alternative to principled data ethics
As noted at the outset of this paper, critics and scholars of data-intensive technologies have argued that the articulation of ethical principles is insufficient for bringing about ethical practices. The AMOS story offers an alternative to the prevalent "principled" approach, one that instead hinges on abductive reasoning. As Peirce described it:

…while we are pouring over our digest of the facts and are endeavoring to set them into order, it occurs to us that if we were to assume something to be true that we do not know to be true, these facts would arrange themselves luminously. That is abduction. (n.p.)
In other words, abduction is notable for being simultaneously theory dependent and theory generating. Similarly, the AMOS team, rather than being guided by a set of ethical principles, was guided first and foremost by the social model of disability—a sociomaterial theory about the recursive relationship between technology and the social world. This, in turn, informed their own practitioner theories about how the technology they were building could serve as a point of ethical intervention, and how the technical choices they faced at various junctures might impact their ethical goals.
What does this accomplish that a principled ethical approach does not? The ethical abduction approach is different because it relies on theories with explanatory power rather than principles with prescriptive mandates. Whereas it can be a dauntingly abstract task for practitioners to understand or map how ethical principles commonly articulated in this space—including transparency, fairness, non-maleficence, responsibility, and privacy (Jobin et al., 2019)—can be operationalized in the day-to-day tasks of data science, the social model of disability gave the team a causal explanation for how technology is implicated in the perpetuation or interruption of inequitable social arrangements. It helped them avoid the common shortcoming in technical practice of "failing to consider how social context is interlaced with technology in different forms" (Selbst et al., 2019). Ethical abduction requires an intellectual commitment to engaging with the complex relationships between theory and practice, structure and agency, historical interpretation and futuristic speculation. As such, a promising path forward for advancing ethical data-intensive technologies is the incorporation into data science pedagogy and practice of various theories of sociomateriality as articulated by scholars in science and technology studies, media studies, communication studies, and other fields.
Ethical abduction is skilled labor
What the AccessMap/OpenSidewalks story shows us is that the deep integration of ethical abduction into data practices involves layers of skilled labor that are not typically ascribed to data-intensive practice, including intentional and prolonged engagement with relevant stakeholder communities. Team members did several rounds of user research to understand the concerns and preferences of different types of wheelchair users, went into the streets to observe the work of accessibility evaluators, drew deeply from their own experiences as caretakers and family members of people with limited mobility, and took time to understand the procedural norms and cultural values of the OSM community.
Although the whole team recognized how fundamentally important this labor was to their project, they nonetheless sometimes grumbled about all the time they had to spend “just talking” (as one participant put it), rather than producing tangible outcomes or practicing skills that are more widely recognized and rewarded in data science—including writing code, producing visualizations, and conducting analysis. Even empirical scholarship intending to surface the “invisible labor” that goes into data-intensive science (Scroggins and Pasquetto, 2020) can exclude the kind of qualitative, interpretive, and communicative work that was so central to the AMOS team's experience of developing a data infrastructure ethically. Ethical abduction should be acknowledged as a highly nuanced and sophisticated skill that cannot be instantaneously mastered or casually implemented; like other kinds of work involved in doing data science and developing data infrastructures, it requires training, mentorship, practice, time, and effort to be done well.
Ethical abduction is intellectual discovery and knowledge building
Certainly, as critical data scholars have pointed out, data-intensive systems have the potential to become black boxes (Diakopoulos, 2013)—opaque veils behind which the cultural assumptions of a privileged digerati get reified and amplified. The AccessMap/OpenSidewalks project illustrates the truism that data are never neutral or objective (Bowker and Star, 2000). However, this is because the world they represent is neither neutral nor objective, and the work of the AMOS team shows us that this point need not be lost in the development of data infrastructures. As we saw in the team's deliberations, while a casual pedestrian can easily walk past a bench on the sidewalk without thinking about what that object means to individuals with diverse embodied experiences of the world, the AMOS team had to contemplate such a question when deciding whether a bench should be counted as a barrier or as an amenity. Their deliberations over the representation of data threw into relief the inherently political nature of what is otherwise usually taken as a perfectly innocuous object intended only for rest and repose. This perspective is aligned with recent work by Abebe et al. (2020), who highlight how the necessity to "formalize" values in algorithmic design can "give people an opportunity to directly confront and contest the values these systems encode" (pp. 254–255). In other words, it is sometimes possible for data practices to be sites not for the obfuscation of the value-laden, power-infused nature of the social world, but for its revelation. Understanding the process of reasoning that leads to such revelation as ethical abduction situates ethics in its rightful place. Ethics are too often considered something that must be gotten out of the way before the real intellectual work begins—getting a stamp of approval from an Institutional Review Board at the outset of a project, for example.
Instead, the concept and process of ethical abduction places ethics squarely in the realm of ongoing intellectual discovery and knowledge production. It is an especially promising concept in the realm of data-intensive technologies and methods because abductive reasoning is acknowledged to be a mode of inquiry already widely deployed in data science approaches (Tanweer et al., 2021).
Enacting ethical abduction
If ethical abduction offers a promising path forward for integrating ethics into the day-to-day practices of data-intensive technologies, then how can that mode of reasoning be supported? It is important to acknowledge that the setting in which I observed the AccessMap/OpenSidewalks team was not typical of how data-intensive technologies and methods are developed. The Data Science for Social Good program is first and foremost an internship focused on student learning. Although the program operates within intense time constraints, its pedagogical environment meant there was latitude for contemplation, exploration, and iteration that may not be afforded when financial profit motives are in play. Nor was the AMOS team necessarily typical of data-intensive technology developers. They operated under the leadership of an academic computer scientist, but one who also had a degree in feminist studies and a child with disabilities—experiences that deeply informed her worldview and allowed her to introduce the social model of disability as an orienting theory for the team. Therefore, if we consider the entire landscape of data-intensive technology development, the AMOS project is something of an outlier. However, lessons learned from contexts situated outside the norm can generate provocations for unsettling and nudging more mainstream practices.
This brings me to my own role in leveraging the lessons learned from the AMOS project and the team's process of ethical abduction. I conducted fieldwork in the DSSG program during my doctoral studies, and as noted in the methods section, a part of my role as a participant-observer became developing a data ethics curriculum and approach to stakeholder engagement. These are responsibilities I volitionally took on and embraced more fully over time because I saw a number of well-intentioned teams either fail to recognize ethical issues in their work, struggle to change trajectories when they did become aware of ethical issues, or overlook opportunities to act on their ethical values when they arose. As such, I took it upon myself to develop ways of scaffolding critical reflection in the program. The organization valued these and other contributions to such an extent that they invited me to work for them full time and chair the DSSG program once I graduated, a role I have occupied since 2018.
In other words, over time I became increasingly involved in shaping the phenomenon that I started out largely just observing. I had the opportunity to study the AMOS project early in that trajectory, as I was first starting to think about how I might use my position as an ethnographer to benefit the community and impact data science practice. The team engaged in a stakeholder power mapping exercise that I introduced into the program in 2016, but aside from that activity, the rest of their process was of their own making. And as it turned out, their work would inform many of the changes I subsequently introduced to the program. For example, seeing how the AMOS team engaged with sociomaterial theory via the social model of disability, I began introducing the related concept of “sociotechnical thinking” through a workshop in the first week of the program, and bringing speakers from the fields of science and technology studies, feminist studies, critical race studies, and other critical traditions to regularly talk with the interns. When I observed other teams interact with stakeholders in a pro forma way—without preparing for the encounter, debriefing following it, or intentionally reflecting on how stakeholder perspectives should influence their work—I used the AMOS team's approach as a counter example and template for more robustly scaffolding that work across the program. Therefore, when telling the story of AMOS as an exemplar and advocating for the process of ethical abduction to be fostered in data practices more broadly, I am not just talking the talk but have started walking the walk. Yet it is important to recognize that robustly cultivating and supporting ethical abduction as a widespread norm in the development of data-intensive technologies is not something that will be accomplished through individual interventions alone. It will require systemic changes in the way we teach and incentivize practitioners to do this work.
