Introduction
Designed to extract patterns from huge amounts of data in order to make actionable predictions when confronted with new data, machine learning (ML) systems are making considerable inroads in society. In particular, deep neural networks (DNNs) are now a state-of-the-art ML technology in myriad domains, including image recognition, machine translation, and automated trading (Kelleher, 2019). However, whereas the inner workings of some ML architectures, such as decision trees, are easily understood by experts, DNNs are generally associated with opacity: although people using DNN-based systems may value explainability highly (König et al., 2022), it is exceedingly difficult to establish, through “human styles of semantic interpretation” (Burrell, 2016: 3), how and why DNNs end up generating specific actionable predictions. This is because DNNs may contain more than 100 hidden layers of nonlinear transformations, several hundred thousand neurons, and millions of weights (Kelleher, 2019; Samek and Müller, 2019). The opacity of DNNs adds to wider concerns that ML systems may be perpetuating gender, racial, or other biases (e.g. Buolamwini and Gebru, 2018): if the decision-making processes of DNNs are opaque, then any biases may be hard to identify and address. Against this backdrop, it has been argued that key progress concerning biases, fairness, and the trustworthiness of ML and DNN systems hinges on the ability to explain the actions and inner workings of these systems. In other words, explainability is the foundational challenge to be solved and constitutes the basic condition for progress in these dimensions (Doshi-Velez and Kim, 2018; Gilpin et al., 2019).
The opacity problem has sparked considerable computer science research on “explainable AI,” often abbreviated to XAI (e.g. Escalante et al., 2018; Samek et al., 2019). XAI research aims to devise tools that can render DNNs at least partially intelligible to humans. While such tools constitute valuable steps toward explainability, they remain tentative and incomplete (Gilpin et al., 2019). To push the field forward, scholars from the humanities and social sciences have argued that XAI research needs to incorporate insights from these disciplines to better capture what an explanation is and how it might be attained. For example, Ehsan et al. (2021) call for considering XAI in light of its social and organizational dimensions, and Felzmann et al. (2019) draw on human–computer interaction research to argue for a relational conception of ML transparency that focuses on the nexus of technology providers and users, including its contextual dimensions. Similarly, Miller (2019) mobilizes insights from philosophy, psychology, and cognitive science to argue that explanations of ML systems would be significantly improved if they took seriously how people usually make and evaluate explanations, whereas Munk et al. (2022) argue for combining discussions of ML explainability with novel forms of “computational ethnography.” More generally, boyd and Crawford (2012) stress the importance of interpretation in algorithmic contexts.
Rather than proposing that explainability can be attained in this or that particular way, and rather than developing specific explainability tools, we suggest studying the mundane efforts ML experts and users undertake to reach some degree of DNN explainability. The ambition of such a “sociology of ML explainability” is twofold. The first is to render explainability an empirical object. Given the importance of ML explainability, any breakthroughs in this domain will likely affect the entire ML field. In other words, it is conceivable that successful explainability models will shape the ways experts and users conceive of, deploy, and relate to these systems. It is therefore important to understand and critically assess the emergence, evolution, and implications of such models, and scrutinizing empirically the efforts that go into creating them is an important first step. Second, through this empirical endeavor, a sociology of ML explainability can help to further understand how human–machine encounters take shape in the ML domain, including how they resemble and differ from human–machine encounters that do not involve ML systems. The key question we ask is thus: how do ML experts and/or users engage in human–machine encounters in order to explain the actionable predictions of ML systems? It follows that a sociology of ML explainability entails an analytical shift from an emphasis on the transparency of ML systems to an understanding of their inner workings (similarly, Ananny and Crawford, 2018).
In this paper, we present elements of what an explorative sociology of ML explainability might look like by analyzing how staff at a trading firm, Tyler Capital Limited (its management decided to waive anonymity), specializing in DNN-based automated trading, seek ways to explain their DNN system’s trading decisions by treating it as if it were a human person and by developing a caring relationship with it. Focusing on Tyler Capital’s work in this domain is analytically helpful because its staff comprises both ML experts and domain (market) experts; through this practitioner emphasis, the firm merges the expert and user dimensions.
One of the implications of our analysis of Tyler Capital is that it allows us to reassess the relationship between humans and ML systems. Whereas Fazi (2021) argues that this relationship is characterized by fundamental “incommensurability,” as humans have no way of meaningfully understanding ML systems such as DNNs, we propose to see this relationship as more empirically open. Here we follow a large and diverse literature on the multiple ways humans form ties with machines and other nonhuman objects. This literature includes various incarnations of science and technology studies (STS), human–machine interaction analytics, and social robotics studies (e.g. Alač et al., 2011; Jones, 2017; Knorr Cetina and Bruegger, 2002; van Oost and Reed, 2010; Suchman, 2007). Of particular importance to our discussion are Vertesi’s (2012) analysis of the ways NASA scientists and engineers engage in comprehensive collective forms of visualization and embodiment to interpret how their Mars robots perceive and operate, and research arguing that the ties humans establish with animals may serve as a relevant framework for conceiving of contemporary human–machine relationships. For example, Darling (2021), drawing upon Haraway (2003), demonstrates that humans can develop strong ties, or companionships, with robots and proposes to see such human–machine ties as a continuation of human–animal relationships. In line with such research, we argue for taking seriously the ways in which ML practitioners base their explainability efforts on establishing a caring, social relationship with their DNN systems, rather than perceiving a fundamental incommensurability between humans and DNNs.
We begin by fleshing out our proposal for a sociology of ML explainability. This includes situating the study of human–machine encounters in the ML domain vis-à-vis earlier studies of non-ML systems, focusing in particular on the work of Vertesi. We further discuss what a sociology of ML explainability entails for algorithmic ethnography. Against this background, we present our methods and data. Next, through our case study of Tyler Capital, we demonstrate how this firm seeks to attain DNN explainability through a particular type of human–machine interaction. We discuss this effort in light of the notion of human–machine companionship. The conclusion summarizes our argument.
From XAI to a sociology of ML explainability
In response to the challenge of understanding what happens within a DNN—that is, how and why particular predictions are made based on the data fed to the system—XAI research seeks to produce tools with which to enhance the interpretability and explainability of DNNs. Interpretability is often associated with the ability of humans to understand an ML system on a “global” level, referring to the extent to which “a person can contemplate the entire model at once” (Lipton, 2018: 40). Explainability, by contrast, concerns the “local” level of understanding the rationale behind specific ML decisions, something also referred to as “post hoc” explanations of individual decisions (Escalante et al., 2018; Lipton, 2018). Global interpretability is generally considered hard to attain for DNNs, whereas local explainability is more feasible and XAI research on it is extensive (Escalante et al., 2018; Samek et al., 2019).
Some XAI research starts from the assumption that an “explanation need not require knowing the flow of bits through a complex neural architecture—it may be much simpler, such as being able to identify to which input the model was most sensitive” (Doshi-Velez and Kim, 2018: 6). One method devised for such sensitivity analysis is the visual heat map, or “saliency analysis” (e.g. Zeiler and Fergus, 2014). For example, some research into DNN-based self-driving cars produces heat maps demonstrating which parts of the car’s image stream (its view of the traffic situation) the DNN attends to (Kim and Canny, 2018). This helps explain which inputs affect the model output. Another approach seeks to produce local linear explanations by manipulating the input data and then testing the DNN on data that contain only some of the original input data (Ribeiro et al., 2016). The aim here is to identify the input regions that have the greatest impact on the output. Other approaches seek to incorporate human expertise. For example, some deep-driving models are combined with neural networks that learn what human drivers pay attention to while driving (so-called human gaze prediction; Palazzi et al., 2018). The idea is that, based on these data, the DNN can learn to mimic human drivers and use this learning to avoid attending to spurious input, thereby also improving explainability (Kim and Canny, 2018).
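To make the two sensitivity-based ideas more concrete, the following is a minimal sketch, not the implementation used in any of the studies cited above. It assumes a hypothetical toy_model as a stand-in for a DNN, treats "switching a feature off" as setting it to 0, and uses a simplified proximity weight (the fraction of features kept) in place of the distance kernel described by Ribeiro et al. (2016). The function names sensitivity and local_linear_explanation are hypothetical.

```python
import numpy as np


def toy_model(x):
    # Hypothetical stand-in for a DNN: any black-box function that maps
    # a feature vector to a scalar prediction could be used instead.
    return 3.0 * x[0] - 2.0 * x[1] + 0.1 * x[2]


def sensitivity(model, x, eps=1e-4):
    """Finite-difference sensitivity: nudge each input feature on its own
    and record how much the prediction moves (a crude saliency score)."""
    x = np.asarray(x, dtype=float)
    base = model(x)
    scores = np.zeros_like(x)
    for i in range(x.size):
        bumped = x.copy()
        bumped[i] += eps
        scores[i] = (model(bumped) - base) / eps
    return scores


def local_linear_explanation(model, x, n_samples=500, seed=0):
    """Perturbation-based local surrogate in the spirit of Ribeiro et al.
    (2016): keep random subsets of features, switch the rest 'off'
    (assumed here to mean setting them to 0), query the model on these
    partial inputs, and fit a weighted linear model to its responses."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    masks = rng.integers(0, 2, size=(n_samples, x.size))  # 1 = feature kept
    preds = np.array([model(x * m) for m in masks])
    # Simplified proximity weight: samples that keep more of the original
    # input count more. Weighted least squares scales each row by sqrt(w).
    sqrt_w = np.sqrt(masks.mean(axis=1))
    design = np.hstack([masks, np.ones((n_samples, 1))])  # intercept column
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], preds * sqrt_w, rcond=None)
    return coef[:-1]  # per-feature contributions, intercept dropped


x = [1.0, 2.0, 3.0]
print(sensitivity(toy_model, x))               # approximately 3.0, -2.0, 0.1
print(local_linear_explanation(toy_model, x))  # approximately 3.0, -4.0, 0.3
```

The two outputs answer slightly different questions: the finite-difference scores report the model's local slope for each feature, whereas the surrogate coefficients report each feature's contribution relative to the assumed zero baseline (slope times value). Which of these counts as "the" explanation is itself a design choice.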
While these are all interesting tools and approaches, they face significant limitations (Kindermans et al., 2019). For example, sensitivity heat map visualizations tend to register individual input features but disregard any relations between them. Moreover, saliency analyses have been criticized for “only uncover[ing] regions with high contributions for the final prediction, while the reasoning process still remains behind the scenes” (Atanasova et al., 2020: 7353). Similarly, attempts to incorporate human input are still in their infancy, and it remains a challenge to systematically develop interfaces for user expertise (Ras et al., 2018; Samek and Müller, 2019). Furthermore, existing XAI research has been criticized for being too siloed: rather than combining tools, XAI scholars have tended to pursue individual approaches, each characterized by blind spots (Gilpin et al., 2019).
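The first limitation can be illustrated with a deliberately extreme toy case. The sketch below assumes a hypothetical interaction_model whose output depends only on the relation between two features (their product); it is chosen for illustration and is not drawn from the XAI literature cited above. The helper names are likewise hypothetical.

```python
def interaction_model(x):
    # Hypothetical model whose output depends only on the relation
    # between two features (their product), not on either one alone.
    return x[0] * x[1]


def single_feature_sensitivity(model, x, eps=1e-4):
    """Nudge one feature at a time, as a per-feature saliency map does."""
    base = model(x)
    scores = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps
        scores.append((model(bumped) - base) / eps)
    return scores


def joint_change(model, x, delta):
    """Change all features together and report the effect on the output."""
    return model([xi + d for xi, d in zip(x, delta)]) - model(x)


x = [0.0, 0.0]
print(single_feature_sensitivity(interaction_model, x))  # [0.0, 0.0]
print(joint_change(interaction_model, x, [1.0, 1.0]))    # 1.0
```

Both per-feature scores are zero, yet moving the two features together changes the prediction: the relevant information sits in their relation, which a one-feature-at-a-time map does not display.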
Instead of seeking to develop specific explainability tools, other scholars have tried to broaden the XAI debates by considering “
Extending this wider approach to explainability, and rather than taking specific XAI tools for granted, we argue that a broader “sociology of ML explainability” research agenda is needed, which examines practical attempts to attain some degree of ML explainability. We deliberately speak of degrees of explainability, as previous explainability work within XAI suggests that advances in this field are piecemeal and that the best one can often hope for are partial post hoc explanations. That said, we see a sociology of ML explainability as potentially examining all aspects of the practical work that goes into explaining the actionable predictions of ML systems—including every step from overall decisions concerning the design of the ML architecture to data curation, training, deployment, refinement, post hoc explainability attempts, and so on, thereby appreciating that decisions of relevance to explainability may be embedded in each step in the development, deployment, and use of ML systems (Ananny and Crawford, 2018).
What should a sociology of ML explainability focus on more specifically? Possible questions to be addressed include: What kinds of expertise are being mobilized in the explainability efforts (Collins and Evans, 2007)? What assumptions and “algorithmic imaginaries”—that is, “the ways in which people imagine, perceive and experience algorithms and what these imaginations make possible” (Bucher, 2017: 31)—are undergirding the explainability work? What folk theories, that is, the “intuitive, informal theories that individuals develop to explain the outcomes, effects, or consequences of technological systems,” are being mobilized (DeVito et al., 2017: 3165; see also Siles et al., 2020; Ytre-Arne and Moe, 2021)? How do humans (ML experts and/or users) interact with the ML systems to understand how these “think” (Burrell, 2016), sense, and operate, and how might these forms of human–machine interaction be fostered organizationally?
In this paper, we focus particular attention on human–machine interactions and how they form part of practical endeavors to explain post hoc the actionable predictions of ML systems. We zero in on human–machine interactions to demonstrate how a sociology of ML explainability may connect to broader STS scholarship. Of particular importance is Vertesi’s (2012) analysis of how NASA scientists engage in embodiment when seeking to understand how their Rover robots on Mars are visualizing the landscape. Vertesi shows that, when human scientists try to see like the robot, they twist their bodies and perform all sorts of physical gestures to put themselves in its place. In Vertesi's (2012: 397, 400) words, “they learn first and foremost to acquire the robot’s own native representation of Mars, as well as its own bodily orientation and apparatus,” and they do so to develop a “sensibility to what the [robot] might see, think, or feel, in relation to specific activities that must be planned.”
Vertesi makes two points that are central to the present discussion. The first is that the human–machine encounters she analyzes are not characterized by human primacy, in the sense that the robot is a mere extension of humans. Rather, if anything, it is the machine that has primacy, with the humans trying their best to understand it. Accordingly, although the NASA scientists talk about the robot as if it had human traits, Vertesi (2012: 400) stresses that these human–machine encounters entail a “technomorphic” rather than an anthropomorphic move, “in which team members take on the robot’s body and experiences as part of their practice and narrative of their work.” The second point is that this identification with the robot not only takes place as a deliberately designed collective, organizational effort involving coordination locally and across geographies; the “deeply emphatic relationship” individual NASA scientists have with the robots also makes them “into a collective team” (2012: 406). In other words, the humans unite in their shared relationship with the robots.
From the point of view of a sociology of ML explainability, Vertesi’s analysis is important because it draws attention to the ways in which machine interpretation efforts are tied to particular human–machine interactions where humans join forces to gauge and assess how the machine operates. We suggest extending the analysis of such encounters to the ML domain, which means detailing empirically how such interactions may manifest for different types of ML systems and applications.
As we will be providing such an examination in the context of financial markets, it is worth noting that the machine identification processes described by Vertesi resonate with Knorr Cetina and Bruegger’s (2002) work on how human traders active in (the pre-automated domain of) electronic trading engage the market through their computer screens. Instead of interacting face to face with known peers on a traditional exchange trading floor, the electronic screen traders studied by Knorr Cetina and Bruegger engage each other anonymously in an electronic market, manually placing orders on their computers. The social interaction emerging from this is not an interhuman one. Rather, it is a “postsocial” form of interaction, as traders primarily interact with an object, their screen. Moreover, the electronic screen traders followed by Knorr Cetina and Bruegger conceive of the market as an independent being, a collective entity whose action requires a particular relationship for its explanatory understanding and interpretation. These traders would specifically seek to understand and interpret the market they encounter on-screen by “experiencing, feeling, remembering and responding to the market by means of ‘identifying’ with it” (Knorr Cetina and Bruegger, 2002: 179). We show that similar identification processes may be at stake in the ML explainability domain.
A sociology of ML explainability that grants important attention to human–machine interactions entails a reorientation in how algorithmic systems are often ethnographically investigated (e.g. Christin, 2017; 2020; Lange et al., 2019; Seaver, 2017). As Christin (2020: 902) notes, owing to the opacity and proprietary nature of many ML systems, existing studies have mainly focused on the broader cultural and organizational structures that shape how algorithmic systems are built or on the everyday implications of these systems in various fields; only “few studies explicitly focus on the inner workings of algorithmic systems.” Christin (2020: 907) proposes to extend existing qualitative work and suggests a set of analytic tactics to this end, which are designed to “bypass algorithmic opacity.” In contrast, we call for empirical work that, instead of bypassing opacity, details the various ways in which humans seek to address ML opacity by engaging in practical explainability efforts.
As mentioned earlier, this ideally means examining all steps involved, from ML architecture design, data curation, and training to post hoc explainability attempts. However, precisely because of the proprietary nature of most ML systems, we also appreciate that, in practice, it may be exceedingly difficult to follow empirically, or even trace after the fact, each step in the development and deployment of ML systems. Specific sociology of ML explainability examinations will therefore likely be partial, making it important to extend to this field Hannerz's (2003: 213) observation that ethnography (in the form of interviews and participant observation) “is an art of the possible, and it may be better to have some of it than none at all.”
Research context, methods, and data
We focus on securities trading because this field has been thoroughly automated in recent decades (MacKenzie, 2021), with various kinds of ML systems, including DNNs, gaining traction. However, as Hansen (2020; 2021) observes, many market professionals are concerned about ML opacity and therefore tend to prefer non-DNN architectures so as to retain some degree of model explainability, just as model simplicity is often supplemented by “human overlay,” that is, human judgement exercised by traders in situations in which relying on ML systems is deemed too risky. That said, some firms do deploy hard-to-explain ML systems (Borch, 2022; Borch and Min, 2022). By focusing on a firm in which all trading strategies and decisions are generated by a DNN system, our analysis complements Hansen’s in two ways: (1) we discuss the deployment and explainability issues associated with complex DNNs that are generally believed to defy human understanding (the types of systems that Hansen’s key informants tend to avoid); and (2) by delving further into the interaction between humans and ML algorithms, we show that explainability might require more than model simplicity and human overlay: it may demand a distinctively social relationship between humans and machines.
Our empirical analysis of the quest for ML explainability revolves around a case study of Tyler Capital, a London-based proprietary trading firm. This firm focuses on high-frequency trading (HFT) in futures contracts on the Chicago Mercantile Exchange, although it is also active in other markets around the globe. Tyler Capital has a staff of around 50 members. While some automated trading firms are significantly larger, it is common to find firms with only 10–20 staff members. However, size is no clear indicator of importance and, as MacKenzie (2021: 5) notes, “even an HFT firm with no more than a few dozen employees can be a significant player.”
Since the firm was established in 2003, Tyler Capital has engaged in different forms of electronic and automated trading. In 2014, a new management team comprising Chief Executive Officer (CEO) Mike Bushore and Chief Technology Officer (CTO) Chris Donnan decided to revamp the business model fundamentally. As a result, Tyler Capital began to explore a more systematic approach focusing on just one trader: a DNN system that the firm has dubbed OPUS.
The firm is divided into functional teams that all sit in one big room, each employee facing a stack of screens. The DNN architecture of OPUS is designed by the Research Engineering Team, whose members have backgrounds in fields such as computer science and physics. This team is led by the CTO, who has long-term ML expertise from finance and beyond. The research engineers work closely with the Trading Team, which consists of experienced traders who monitor and seek to influence the DNN system’s behavior (more on this later), and the Production Team, which comprises engineers tasked with ensuring optimal operations and testing of the trading system. An Infrastructure Team provides computing resources for the in-house simulation systems and the infrastructure connecting the firm and its trading system to the markets. Other teams provide technological and managerial support for the firm and its employees.
We have been following the work of Tyler Capital since 2017, when the first author conducted interviews with an ML engineer at the firm and then with its CEO and CTO. This led to a qualitative case study in which the authors jointly conducted interviews at the firm from 2018 to 2019, followed by additional interviews done remotely in 2020 due to COVID-19 lockdowns and restrictions. Our data collection also included participant and nonparticipant observations related to the core functions in the firm (Borch and Min, 2022), such as following staff’s work and discussions and having them demonstrate their tools to us, most of which were directly concerned with the DNN system. In this paper, however, we primarily draw upon 23 formal semi-structured interviews with 18 people from the firm, including the founder, executives, team leaders, and team members representing all parts and functions of the firm (Table 1). The interviews typically lasted approximately one hour and were subsequently transcribed (yielding a total of 429 pages of transcripts), coded, and analyzed using NVivo. As in Hansen and Borch (2022), we followed an open coding approach, with codes including “explainable,” “understandable,” and “XAI.” We analyzed interview passages in which these and similar themes were addressed.
Table 1. Summary of interviews with the case firm.
ML: machine learning.
The interviews focused on the operation of the DNN system, its limitations, human–machine interactions, and, more generally, the organizational structure implemented in the firm to facilitate DNN-based trading. Because the CEO and CTO have been central to the move to DNN-based trading, with the CTO being responsible for the design of OPUS, we conducted repeated interviews with them. The CTO takes a strong interest in XAI, so we also spent almost a day off-site with him discussing this and related topics. Finally, we were granted access to confidential internal documents (324 pages in total) describing the firm's test procedures, financial performance, strategic considerations, and so on.
As indicated, in this study, we follow Seaver's (2017: 7–8) suggestion to “treat interviews as fieldwork” and mainly draw on our interview data—meaning that our analysis is “interview-centric” (2017: 7). However, we used observational data and internal documents to cross-validate what we found in the interviews. Further cross-validation was ensured by the broad and iterative interviewing process we pursued where, for example, we would take up issues from one interview in later interviews with the same informant or in interviews with other informants. We shared paper drafts with the firm to secure respondent validation, and this led to the correction of some minor technical details.
Our fieldwork at Tyler Capital was embedded in a larger qualitative examination of the automated trading industry. From 2014 to 2016, the first author conducted 23 interviews in this industry (11 of which were done jointly with Ann-Christina Lange). Between 2017 and 2020, an additional 190 interviews were conducted by the authors and 4 colleagues (Kristian Bondo Hansen, Nicholas Skar-Gislinge, Pankaj Kumar, and Daniel Souleles), including those at Tyler Capital. We conducted interviews at trading firms, brokers, banks, hedge funds, exchanges, institutional investors, and regulators, with a particular focus on trading firms located in financial centers such as Chicago, New York, London, and Amsterdam. Following a common interview protocol, we asked informants about their backgrounds, algorithmic development, strategies, organizational design, and industry transformations, among other topics. However, in this study, we decided not to draw upon these data because we did not encounter DNN-specializing firms that took an approach to explainability as systematic as Tyler Capital's. This does not mean that other firms are not concerned with explainability or that they have not developed interesting approaches to attain it; our interview data from other firms are simply not sufficiently rich on this topic.
Explainability through human–machine interaction
Tyler Capital’s motivation for focusing exclusively on ML is that this is believed to provide the most effective and scalable form of automated trading. The firm’s explicit ambition is to create, in the CTO’s words, “the best trader in the world,” that is, a trading system that is superior to both human traders and end-to-end human-defined automated trading systems. In contrast to such systems, the central idea behind Tyler Capital’s DNN system, OPUS, is that, given some overall desired outcomes, it develops its trading rules independently. These rules are not defined by humans. Large amounts of market data (in the form of so-called “order book data,” that is, an exchange’s list of pending orders to buy or sell securities) are fed to OPUS, which then learns to make market predictions and devises its trading policies accordingly. So, rather than merely suggesting strategies that human traders might then enact, OPUS identifies and implements its trading strategies independently and in a fully automated fashion (there are no longer any human traders in the firm who make independent trades). In the words of the CTO: OPUS has the entire responsibility to take information in aggregate over years and come up with an action policy. So basically, all the humans can ever do is give OPUS information, incentives, and penalties, and it’s up to OPUS entirely to come up with the trading policy. We can’t do it. We can inform, we can enforce risk controls, like “you can’t put on more than 100k of margin,” right? Of course, we have controls that are of that form, but we don’t have rules where a human comes up with a rule. We try to capture their intuition as information representation. A wiggly line that OPUS gets to see, and then OPUS decides if that wiggly line is meaningful or not. So OPUS, on some level, gets to choose from the options that the people give it. That’s part of it. 
Also, we can give it sometimes quite broad options so that it can choose on its own, and it can just ignore what it doesn’t want.
The fact that OPUS is based on a complex DNN architecture associates it with opacity. This impression was shared in our first interview with the firm in 2017, in which one of its ML engineers compared OPUS to his previous work on automated trading systems where humans would conceive of the trading rules: In my past firm I could tell you exactly why that strategy was trading [as it did], because of what I fed into the algorithm […] whereas now I have no clue. […] Interpretability of the results [of DNNs], it doesn’t exist yet. (ML engineer)
However, since then, the firm has implemented a set of measures to attain some degree of explainability. The overarching concept the firm uses in this endeavor is that of “humanizing technology.” This concept, which informs the algorithmic imaginary at Tyler Capital, indicates a belief in the value of establishing a close social relationship between humans and OPUS, just as it demonstrates the firm’s penchant for treating the DNN system as a kind of embodied intelligence and for describing it in anthropomorphic language. In interviews and when following their work, staff and management repeatedly referred to OPUS as a “he” in discussions about, for example, “increasing his scale” or “increas[ing] the scope of what he does” (CTO).
Anthropomorphizing algorithms is common within automated trading. Borch and Lange (2017) observed that traders who develop automated trading systems based on human-defined rules often see their algorithms as extensions of themselves and even develop an emotional attachment to them, comparing them to children. Such traders subscribe to the idea that their algorithms directly and uniformly manifest their human-defined rules. By contrast, OPUS is regarded in much more dynamic terms. The CEO described it as “a living, breathing organism,” which, similar to human beings, is independent but can be socially influenced. In the words of the CEO, “we have a relationship with [OPUS], and we try to coach it” and provide it with “social cues.” A senior developer echoed this view: OPUS does have its own objective function. It takes what we have given it and it amplifies it to learn over time. So, if we don’t stay in close and continuous association with OPUS, if we don’t invest in that relationship, then, you know, it can outgrow us. We might then struggle to teach it new skills, because we try to give it some new signals and it just doesn’t think of it that way. […] So yes, we can think of it as a relationship that needs to be maintained.
That which is “given it” comes in several forms. One is, of course, data. Given that the quality of the data will affect the system’s actionable predictions, the firm is preoccupied with ensuring that the market data fed to OPUS are of high quality. When we followed the work of some of the firm’s data engineers, we observed how they took care to validate data across different sources; although order books contain millions of daily messages, even single data points that could not be validated were a cause of concern. Formulated in explainability terms, systematic care with the input data is a basic precondition for explaining eventual actionable predictions.
We will nonetheless focus our attention elsewhere—namely, on the skills staff seeks to provide OPUS and the associated dynamic human–machine relationships. Thus, as indicated by the senior developer quoted above, OPUS is not merely seen as an anthropomorphic extension of the human staff; but nor can staff’s relationship with it be fully described as a technomorphic move, in which, as per Vertesi, it is all about putting oneself in the machine’s place. Rather, at Tyler Capital, the human–machine interaction contains elements of both, and this materializes not least in how staff seeks to provide OPUS with “senses” through which (a) it may learn to see things in the markets that human traders have learned to see and (b) humans can start interrogating why it arrived at its actionable predictions. We focus on those teams whose interaction with OPUS is most direct: the research engineers (ML experts) who have full access to OPUS as well as the production engineers and the trading team who monitor its behaviors.
Aligning human and machine senses for post hoc explainability
The reason Tyler Capital would like to be able to explain the inner workings of OPUS is simple: a fully automated DNN system might have learned to engage in trading behaviors that appear solid, profitable, and legitimate but are in fact extremely risky, could be perceived as deceitful, or both. Because such issues may manifest in multiple ways, explainability has many aspects:
Why did [OPUS] do what it did? Between one version and another version [OPUS’s neural weights are updated daily], what are its preference changes? How can I explain that? When it does something that I wish it didn’t do, what are the root causes? What was it paying attention to? (CTO)
Tyler Capital has developed a set of devices that help explain OPUS’s actions, what guides its neural network-based decisions, and what “his intentions” are, as one senior developer put it. Consistent with the ambition of humanizing technology, these devices are envisioned as tools that provide the ML system with new “senses,” that is, new ways of perceiving the world. Compared with the NASA scientists’ engagement with their Rover robots, then, Tyler Capital’s endeavor focuses less on “seeing like” the machine (Vertesi, 2012) than on providing the machine with new ways of seeing or feeling the environment in which it acts. However, as already indicated, in addition to permitting the machine to perceive the market in more sophisticated ways (i.e. the machine looking out), these senses or perception tools are also considered means with which to look inside its decision-making logics (i.e. looking into the machine). The firm conceives of the intertwinement of these dimensions in two steps.
First, when the firm looks at OPUS “as having senses and try to give it either new senses or reframe information coming into its senses,” the aim is to increase its perception bandwidth, endowing it with “other ways to touch the world” (CTO). Consequently, Tyler Capital does not just feed the ML system huge amounts of data; it also continuously provides OPUS with new ways to relate to these data, thereby expanding its epistemological repertoire and giving it a more granular understanding of the market. One example is adding senses that, each in its own way, enable the ML system to hedge more precisely. The CTO compared this to seeing infrared light:
[Say,] in the room right now, there’s a pattern on the wall and you’re like, “I don’t see a pattern.” And I go, “Well, I’m not giving you that information.” If I can install an infrared eyeball on your head, then, of course, you can see this pattern that exists there. But if you don’t have the sense, you can’t see it. So therefore, you can’t become fit to it. So sometimes, the humans become aware of some piece of information that we don’t believe has been represented as a sense to OPUS. Therefore, we try to give it that sense so that it can then do more, right? So that it can then come up with a more nuanced behavior.
In the second step, the firm seeks to elicit what OPUS attends to, or “experiences,” through tools that measure:
what senses are most important to the AI [system], because then we can understand what he’s paying attention to and what he rates highly. […] Most of the explanation that we try to fixate on is understanding how OPUS is looking at the markets and how that’s changing over time. (CTO)
The devices developed and deployed to this end come in different forms. The most important is a set of “OPUS Explain Tools” (internal document), including a so-called “Inspector” tool, which the firm uses to zero in, post hoc, on both single positions OPUS holds in the market and entire portfolios it trades. These tools can display specific hedging, risk allocation, and trading behaviors, and, as was demonstrated to us during our fieldwork, staff use them to infer what the ML system paid attention to in specific situations and the patterns to which it responded.
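The internals of the Inspector tools are not described here, so the following is only a minimal sketch of one generic way to estimate which senses a model leans on. It assumes a model that maps named numeric inputs (standing in for senses) to a single prediction, and it uses permutation importance, a model-agnostic technique swapped in purely for illustration; it is not Tyler Capital’s method. The sense names `order_imbalance` and `spread`, the toy model, and the data are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: each "sense" is one named numeric input.
Row = Dict[str, float]
Model = Callable[[Row], float]


def mean_squared_error(model: Model, rows: Sequence[Row], targets: Sequence[float]) -> float:
    """Average squared gap between the model's predictions and the targets."""
    return sum((model(row) - t) ** 2 for row, t in zip(rows, targets)) / len(rows)


def permutation_importance(
    model: Model,
    rows: List[Row],
    targets: List[float],
    repeats: int = 5,
    seed: int = 0,
) -> Dict[str, float]:
    """Score each sense by how much the error grows when its values are shuffled.

    Shuffling breaks the link between one sense and the targets while leaving
    the other senses intact; a large error increase suggests the model relies
    heavily on that sense.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    scores: Dict[str, float] = {}
    for sense in rows[0]:
        increases = []
        for _ in range(repeats):
            column = [row[sense] for row in rows]
            rng.shuffle(column)
            # Copy each row, overwriting only the shuffled sense.
            shuffled = [{**row, sense: value} for row, value in zip(rows, column)]
            increases.append(mean_squared_error(model, shuffled, targets) - baseline)
        scores[sense] = sum(increases) / repeats
    # Most important senses first.
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Hypothetical toy model that relies on one sense and ignores the other.
def toy_model(row: Row) -> float:
    return 3.0 * row["order_imbalance"] + 0.0 * row["spread"]


rows = [{"order_imbalance": i * 0.1 - 1.0, "spread": (i % 3) * 0.01} for i in range(20)]
targets = [toy_model(row) for row in rows]
importance = permutation_importance(toy_model, rows, targets)
print(importance)  # "order_imbalance" ranks first; "spread" scores 0.0
```

Shuffling one input at a time keeps the sketch model-agnostic: it only needs to query the model, not inspect its weights, which loosely parallels inferring what a system attended to from its behavior.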
By helping pinpoint what the ML system attended to, the Inspector tools constitute a functional equivalent of the XAI saliency analyses we discussed earlier. They are also mobilized in simulation. This includes simulating and testing potential strategies against historical data (backtesting) and, once a strategy is believed to perform well against historical data, testing it against live data without actively trading on the markets. Our broader investigation of the automated trading industry suggests that such tests are standard practice across automated trading firms and serve to check the soundness of a potential strategy. But using the Inspector tools in simulation also serves the explainability effort: in addition to testing the DNN’s actions in various scenarios, including counterfactual ones (how would OPUS act if conditions suddenly changed?), simulation is a means of scrutinizing the firm’s explanations of the DNN system’s actions, for example, by examining the predictive power of particular explanations.
We have argued that the various senses provided to OPUS are a way of endowing it with new perceptual capacities aimed at improving its understanding of markets and, consequently, its trading skills. In addition, the tools developed to understand which senses matter most to it are in effect the means through which staff elicit what the DNN attends to and experiences. As the repertoire of these tools grows, additional experiences can be elicited, analyzed, and compared. Accordingly, these tools become the window through which staff members gain insight into the inner workings of the DNN system. As the CTO stated, “These tools wound up being how people […] understand OPUS […]. So, they become, in a way, synonymous with OPUS.”
Incorporating human market expertise in the quest for explainability
Unlike a DNN tasked with, for example, correctly identifying objects in digital photos, OPUS operates in a constantly changing market environment. This means that eliciting the system’s experiences through local saliency analyses must be combined with tools or approaches that account for contextual factors. In the words of Tyler Capital’s CTO, “another role of explainability is just understanding how the environment itself, unconditional upon OPUS, is changing. Or how is the market, as OPUS sees it, changing?” Taking such a broad ecological perspective serves several purposes. Understanding the context in which the DNN system is acting may help explain why it acted as it did in particular situations, just as understanding the markets through which it navigates may help staff appreciate, and correct, the blind spots it undoubtedly has, thereby also contributing to its reliability. Because OPUS learns from its market interactions and adapts automatically by changing its neural weights (which regulate its behaviors), understanding this adaptation is important:
If […] the market’s changed a bit and the system has adapted, you still want to make sure that it’s done something sane. So, we’ll block the new weights from being used until a human goes in there and says, “Okay, I actually can reason about what’s doing here.” (CTO)
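The CTO’s description of blocking new weights until a human can reason about them resembles a human-in-the-loop approval gate. The Python sketch below illustrates that general pattern under stated assumptions: the `WeightRegistry` and `WeightVersion` classes, the version numbers, and the reviewer names are hypothetical, and the sketch says nothing about how Tyler Capital actually stores, reviews, or deploys OPUS’s weights.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WeightVersion:
    """One version of a network's weights (hypothetical representation)."""
    version: int
    weights: List[float]
    approved_by: Optional[str] = None


@dataclass
class WeightRegistry:
    """Keeps retrained weights out of use until a human reviewer signs off."""
    live: WeightVersion
    pending: List[WeightVersion] = field(default_factory=list)

    def propose(self, candidate: WeightVersion) -> None:
        # Newly adapted weights are queued, never promoted automatically.
        self.pending.append(candidate)

    def approve(self, version: int, reviewer: str) -> WeightVersion:
        # Promote a pending version only after a named human has reviewed it.
        for candidate in self.pending:
            if candidate.version == version:
                candidate.approved_by = reviewer
                self.pending.remove(candidate)  # safe: we return immediately
                self.live = candidate
                return candidate
        raise KeyError(f"no pending weight version {version}")

    def weights_for_trading(self) -> List[float]:
        # Trading always uses the most recently approved version.
        return self.live.weights


registry = WeightRegistry(live=WeightVersion(1, [0.10, -0.20], approved_by="reviewer-a"))
registry.propose(WeightVersion(2, [0.12, -0.25]))
print(registry.weights_for_trading())  # [0.1, -0.2]: version 2 is still blocked
registry.approve(2, reviewer="reviewer-b")
print(registry.weights_for_trading())  # [0.12, -0.25] after sign-off
```

The design choice to note is that retraining can only ever add to the pending queue; only an explicit, attributed approval changes which weights are used.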
By gradually gaining insights into how and why the DNN acts as it does in particular situations, humans simultaneously obtain a sense of the contexts in which the system can navigate easily (and, say, make profitable predictions) and those in which it might struggle. It is widely recognized that the predictive abilities of DNNs hinge on the data on which they are trained: a DNN system trained to excel at Go would not know how to move a single piece in chess. In a market setting, this limitation may manifest in multiple ways. Examples include significant one-off events such as Brexit or the March 2020 market turmoil caused by the COVID-19 pandemic, which saw prices plummet and volatility spike to unprecedented highs. Brexit is a unique event, and no past data could give an ML system pointers on how to predict its market implications. As for COVID-19, although there have been previous pandemics, they were few and their contexts differed considerably, so they offer no reliable data for automated market prediction. Of course, the entire rationale of ML systems, be they self-driving cars or trading algorithms, is that they can form predictions and make decisions in unknown territory. If ML systems performed well only on their training data, they would be of little use. However, the data on which they learn also constitute an epistemological barrier that carries substantial risk: there are limits to how far beyond its training data an ML system can navigate.
Tyler Capital has designed an organizational layer that seeks to identify and address these knowledge limits by incorporating human (non-ML-based) domain expertise. Specifically, this layer consists of human traders with first-hand non-algorithmic trading experience, that is, traders who, in Knorr Cetina and Bruegger’s (2002: 179) words, have developed a “feeling for the market.” Here, a first-person feel has been cultivated through many years of identification with the market and of human traders putting themselves in its place. The team is led by a former pit and electronic screen trader with more than 20 years of trading experience (most of it in electronic screen trading). However, at Tyler Capital, the Trading Team’s role is not to trade actively, but rather to help understand, mentor, and influence OPUS’s behaviors and “cover off [its] blind spots,” as the Head of Trading put it. Any such influence is exercised through collaboration between the ML experts and the human traders, with the former (a) converting the latter’s advice into an information representation (in the form of numbers) recognizable to OPUS and (b) introducing rewards or penalties to steer the system in the desired direction, while appreciating that, in the end, OPUS has discretionary power over which pieces of advice it incorporates into its decision-making.
Actively bringing in human expertise might seem akin to the incorporation of human input into DNNs that we described in our discussion of XAI. At Tyler Capital, however, human domain-specific expertise is mobilized more extensively. One dimension of this concerns the market environment in which OPUS trades. Because markets constantly evolve, as new regulations are introduced, market ecologies transform, and overall and market-specific movements and shifts in sentiment take place, human traders are assigned the important function of identifying features that may not be reflected in the data on which OPUS is trained. If, for example, the DNN’s training data reflect a high-volatility market regime and volatility suddenly drops, then what the system has learned may no longer be useful, and the human traders’ experience may be drawn on in attempts to improve the system. The Head of Trading recounted a specific situation that illustrates a similar point:
I noticed that one of the more dominant contributions towards the firm’s P&L [profit and loss] was misbehaving. I just couldn’t understand why it wasn’t exhibiting the characteristics you wanna see in that certain portfolio. And that certain portfolio was used to trading stable curves—curves that are relatively flat and just basically flex slightly like that, so you trade the noise back and forth. And that’s good while it lasts, but when curves start to move because the plumbing underneath the market changes, or the direction of capital underneath, basically the sentiment in the market, changes for whatever reason, usually for technical reasons these days—because of the Federal Reserve or the central bank likes to play games—the technical reasons behind it will not be picked up in data prior to that. If it’s actually a new modality of injecting or subtracting liquidity, that’s not in the data ever. […] That’s where human trading experience can say, “we’ve got a problem here. Not necessarily a problem, but we need to change behavior.”
This touches on another epistemological weakness of ML trading systems where human understanding is important: their lack of sensitivity to politics. “When politics and markets collide, politics usually wins” (Head of Trading). Yet, with its emphasis on order book data, OPUS is not geared to take stock of politically complex situations:
One of the things OPUS, or machine learning, can’t do is see into the future, but you know as a human. You can tell when there’s potentially bad news in the newspaper. OPUS doesn’t read newspapers, and machines don’t read newspapers very well. They will in the future no doubt, or they probably do but not on a trading level just yet, and they can’t always work out the paths of where the market may go. So, humans tend to have a better idea of the fight and flight. (Head of Trading)
From an explainability perspective, four aspects of adding the human traders’ experiences are important. First, these experiences provide an understanding of the market ecology in which the DNN operates. Second, this ecological perspective is used in explanations of the DNN’s actions and, third, to identify its knowledge limits (the market situations that the system may not have learned to navigate). Fourth, insights into these knowledge limits can then fuel attempts to address the DNN’s blind spots. By emphasizing the ecological perspective and the context in which the DNN trades, the firm shifts the explainability focus away from the opacity of the DNN system and toward dimensions where human experience is believed to trump ML, and where features that humans understand well are then integrated into the DNN system. “So, from an Explainable AI perspective,” the CTO said:
it’s like the reverse view. […] It’s about actually making up something that we can make OPUS see and then control, and then it’s already been explained because it was understood before OPUS understood it, right? […] If you’ve conceived of it and taught it to OPUS, you understand what it means.
This returns us to the point about adding senses that enable OPUS to see and evaluate things in the market that humans can see and evaluate. At Tyler Capital, this aspect of explainability-oriented human–machine interaction is not merely fostered strategically at the organizational level; echoing Vertesi’s analysis of NASA, it also generates unanticipated organizational effects. “Once the [explainability] tools are there,” the CTO said, “they establish an interaction pattern in the human space. We became very aware that the frame of the tool was conditioning the human behaviors.” Specifically, the firm realized that the explainability tools facilitated productive discussions between members of the Research Engineering and Trading teams: the tools enable both parties to gain insights into OPUS’s trading decisions and provide a context for interhuman conversations, across specializations and forms of expertise, about why it engaged in specific behaviors. More generally, Tyler Capital’s approach to its DNN system has led to a more fundamental reformatting of interaction patterns in the organization. This was expressed in one of our interviews with the CTO:
Interviewer: What has surprised you most about OPUS?
CTO: The way it’s changed the people space is my impulse response. […] There’s an AlphaGo’ism that reminds me of how we feel about OPUS in this regard. The language they use is that AlphaGo is changing the way they view the game. It’s teaching them something. We feel very much the same circle with OPUS, which is unexpected, right? [OPUS] seems like a technical thing, where you give it some data, it trades, and mmah. But the truth is, it’s a continuous discovery process and the more deeply we understand it, the more we change our ways of working with it. […] And this virtuous cycle, I didn’t expect that at all.
Interviewer: So, initially, you had the idea that once OPUS was developed, you could just let it do its work and then there would be no need to engage with it?
CTO: I would have never said it that way. However, I perceive that was my latent expectation [laughs] and especially now I have a hard time identifying with that, because it seems silly.
So, according to this view, entertaining a dynamic human–machine relationship with the ML system in order to understand and explain it will affect both parties and make them adjust their behaviors as the relationship deepens, much as in other social relationships. Of course, attempts to teach OPUS things, even when based on human experience, may be unsuccessful, as the system can ultimately disregard any human suggestion. Still, entertaining a caring relationship with the DNN, assuming its role, and trying to elicit its experiences and put them in ecological perspective are not just means of explaining its inner workings and decision-making logics; they are also ways of increasing the success of human cues. In the words of the CTO, “To influence OPUS, you have to know him.” If the DNN system is poorly understood, and its independent trading policies are not explainable, then the likelihood of being able to influence it in any preferred direction is slim.
Human–machine companionship
How can we make sense of Tyler Capital’s attempt to explain the actionable predictions of its DNN automated trading system? We have argued that this form of human–machine interaction is one that has both anthropomorphic and technomorphic elements, where on the one hand, the ML system is endowed with human-like abilities to navigate in the markets and, on the other, humans are putting themselves in its place to understand how it arrives at its actionable predictions. Although anthropomorphic conceptions of AI and ML systems have faced critique (e.g. Salles et al., 2020; Watson, 2019), we follow Haraway (2003) and Vertesi (2012) in arguing that, in this specific case, the key question is not whether the human–machine interactions are based in part on anthropomorphizing language. Rather, what is important is that Tyler Capital’s explainability efforts espouse not just a form of postsocial relationship, but even a kind of
Conclusion
Given the widespread use of opaque DNN-based systems in contemporary society, research on how to explain their inner workings and decision-making logics is critically important. In fact, explainability research is too important to be left to the computer/data scientists and ML engineers populating the field of XAI research. We therefore not only welcome attempts to embed XAI research in broader humanities and social science discussions, but also call for sociological studies of ML that empirically and theoretically discuss practical attempts to attain DNN explainability, both within academic XAI research and beyond. To this end, we presented an empirical analysis of a DNN-based automated trading firm and its quest for explainability. We argued that the firm’s particular approach to human–machine interaction merges anthropomorphic and technomorphic components in a companionship model, in which human staff seeks to explain the ML system’s actionable predictions by engaging in a caring relationship with it.
That firms such as Tyler Capital promote a companionship model does not entail that every human–machine companionship in fact enhances explainability. Just as interhuman forms of sociality can assume different degrees of depth and intensity, some manifestations of cross-species sociality among humans and ML systems may be shallower than others, contributing less to ML explainability than might be hoped. According to the CTO of Tyler Capital, the firm’s endeavors have contributed positively to the goal of achieving explainability. Comparing OPUS to automated trading systems based on human-defined rules, he said, “I would assert that we’ve been generally better able to explain our system than many of those sorts of systems.” Unfortunately, this is difficult for us to validate with the data available to us. However, while it would be important to know, from an XAI perspective, whether particular forms of human–machine companionship could be seen as, say, a necessary or sufficient condition for explainability, this question is of only secondary importance to the research agenda for a sociology of ML explainability that we have proposed in this paper. As this research agenda is empirically open and committed to exploring the myriad practical ways in which experts and users seek to attain ML explainability, it might turn out that human–machine companionships constitute just one type of explainability approach among others, some of which might be superior when it comes to attaining actual ML explainability.
Regardless of the actual level of explainability achieved, we maintain that practical efforts to explain ML systems are an important object of sociological study. Not only is ML explainability important in its own right, but, as our analysis of Tyler Capital suggests, the quest for explainability might also lead to new ways of working with ML systems. In other words, it is conceivable that, for both experts and users, attempts to explain ML systems will change the ways they subsequently engage with these systems. Given this, we see the sociology of ML explainability proposed here as an important subset of computational social science: one that, instead of deploying seemingly proven templates of computational methods, including ML techniques, without explaining the actual underlying analytical processes, examines sociologically how the design and deployment of ML systems, and the human–machine interactions they involve, form part of, and are dynamically changed by, more or less rigorous quests for explainability.
Going forward, we see two types of inquiry as particularly important to a sociology of ML explainability. First, as other firms and institutions might approach explainability in alternative ways, an important comparative exercise lies ahead in shedding light on how ML experts and users seek to attain explainability in different domains. Second, whereas we focused on the role of human–machine interactions in post hoc explainability efforts, explainability may involve other types of concerns along the chain from the design of the ML architecture to data cleaning, training, and so on, and these too call for empirical analysis. Both types of inquiry would help situate the companionship model presented here in the broader landscape of existing and emerging ML explainability efforts.
