Introduction
As machine learning techniques are taken up in an ever-wider array of sectors for decision-making and decision-support, many have pointed to harms that might result from their careless or malicious implementation. Some harms concern fairness, as it proves difficult to build systems that do not exhibit bias, whether indirectly or within subsets of data (Hajian, 2013; Kamiran et al., 2012; Barocas and Selbst, 2016). These are nested within a range of linked concerns, including algorithmic transparency and accountability (Burrell, 2016; Keats Citron and Pasquale, 2014; Kroll et al., 2016; Nissenbaum, 1996); in-the-wild reliability (Žliobaitė et al., 2016); security against adversaries (Huang et al., 2011; McDaniel et al., 2016); entrenchment of inequality (boyd and Crawford, 2012; Harcourt, 2006); risks to privacy and due process (Hildebrandt and Gutwirth, 2008); and the enablement of ambient, ubiquitous surveillance systems (Hildebrandt, 2015; Kitchin and Dodge, 2011). These concerns have mobilised a wide array of researchers and practitioners to consider how these technologies can be utilised whilst minimising the pitfalls and risks that might accompany them.
This paper focuses on how fairness and discrimination in machine learning systems can be mitigated within practical institutional constraints. Machine learning systems, which identify and utilise patterns in data, are designed to discriminate: we use these systems to distinguish data points from each other on the basis of certain predictive characteristics. Some forms of discrimination, however, are considered unacceptable (Hellman, 2008). Legally ‘protected characteristics’, usually including disability, race, sexuality, gender and pregnancy, among others, are broadly illegal to use in most decision-making. These categories are not set in stone. For example, while single-sex sports clubs, toilets, or specific types of job advert (e.g. modelling) are usually not illegal, acceptance of them is changing. Discrimination usually also requires cases to be otherwise comparable. In some situations, the use of sex might not be considered discriminatory where decisions hinge on differences in statistical life expectancies (Berendt and Preibusch, 2014).
Other bars for measuring fairness are less universal. Judging based on appearance; on events that occurred some time ago; on limited data; on actions an individual has already been sanctioned for; or in conditions of high uncertainty and rapid change, is sometimes acceptable, sometimes not. Judging based on arbitrary characteristics, like favouring those who access online forms with custom web browsers (Pinsker, 2015), might also seem unfair, perhaps because of the opportunistic, short-lived nature of such correlations, as well as the ways it might discriminate against those accessing forms from schools or from libraries.
Some sources of unfair machine learning systems
There are several interacting ways that deployment of machine learning can potentially lead to unfair or discriminatory outcomes.
Unfairness in data, their collection and their processing
Many of the fairness issues in machine learning are primarily thought to arise from data. Some think, falling for what could be called the ‘neutrality fallacy’, that machine learning will provide a more even and objective treatment of individuals (Sandvig, 2015). As Latour (1999) indicates, we are often more than happy to declare value-laden issues matters of fact, and let machines settle them for us. This is rarely appropriate.
The high demand for labelled data in the context of supervised machine learning – the focus of this paper – can usually only be met by using data from previous decision-making. If these historical data reflect existing, unwanted discrimination in society, the model that is learned from it – essentially a similarity engine – will likely encode these same patterns, risking reproduction of past disparities. Machine learning algorithms are supposed to discriminate between data points – that is why we use them – yet some logics of discrimination, even if predictively valid, are not societally acceptable.
Furthermore, if some sub-groups are historically undersampled, or exhibit more complicated, nuanced or under-evidenced patterns than others, models might exhibit differential performance. It is not practically possible to hold data on all individuals, quantifying or classifying every factor important to some social phenomenon. People, or aspects of their lives, are always missing. These skews quickly make their way into data-driven systems.
Data are often also cleaned and transformed before use, in subjective ways. ‘Feature engineering’, where input variables are transformed to make them more amenable to modelling, has crucial downstream impact on the behaviour of machine learning systems. Feature engineering emphasises aspects of certain variables through augmentation, aggregation and summarisation of characteristics whilst downplaying others. For instance, aggregating those who subscribe to different branches of a religious doctrine (e.g. Catholic, Protestant; Shia, Sunni) within a single overarching doctrine (Christian, Muslim) might collapse distinctions which are highly relevant to questions of fairness and discrimination within certain contexts. Including a standard deviation of a characteristic as an input variable will make it easier for a machine learning model to emphasise divergence from a constructed average. As with many issues in machine learning, the political nature of this classifying and sorting has long been recognised (Bowker and Star, 1999). Categorisation does not just label people, it can create groups and alter future outcomes (Hacking, 1995; Harcourt, 2006), just as feature engineering can in machine learning (Rouvroy, 2011).
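To make the irreversibility of such aggregation concrete, the religious-doctrine example above can be sketched as a one-line mapping; the labels and function names here are purely illustrative:

```python
# Coarsening denominations into parent doctrines is trivial to implement,
# but it irreversibly discards within-doctrine distinctions that a later
# fairness audit might need. Labels are illustrative only.
PARENT_DOCTRINE = {
    "Catholic": "Christian", "Protestant": "Christian",
    "Shia": "Muslim", "Sunni": "Muslim",
}

def coarsen(denominations):
    """Map each denomination to its parent doctrine, leaving unknowns as-is."""
    return [PARENT_DOCTRINE.get(d, d) for d in denominations]
```

Once the coarsened column replaces the original, no downstream analysis can recover which branch an individual belonged to, however relevant that distinction might be to discrimination in a given context.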
Unfairness from selecting and specifying a machine learning system
Humans carry their worldviews and make value-laden choices, with both foreseeable and unforeseeable consequences, throughout the modelling process. While machine learning is often portrayed as automated, a great deal of subjective human labour is involved in system design and deployment. Model choice itself can be political. Neural networks or random forests are more amenable to capturing synergies between variables than linear regression; use of regression might therefore omit important contextual variance. Within a model family, further hyperparameters must be specified. Higher regularisation parameters penalise complexity in a model, which might help it generalise, but at the cost of certain complicated or rare patterns not being retained. Different evaluation mechanisms for models emphasise different aspects of performance (Japkowicz and Shah, 2011). Unfortunately, ‘neutral’ choices in machine learning systems do not exist – candidates for these, such as software defaults, are best thought of as arbitrary.
Finally, once a model has been built, there are various ways it can be deployed in practice which may introduce additional fairness issues. The extent to which a model may have different impacts on different groups may only become evident once that model is put into a decision-making system; for instance, the setting of thresholds for positive and negative outcomes could have significant consequences for different groups which may not be evident by merely studying the model itself. The introduction of an algorithmic system may also provide spurious justification for decisions which would otherwise have been more open to challenge under a purely human decision-making process (Skitka et al., 1999).
As with any sociotechnical, value-laden problem, we cannot expect to find simple or universal panaceas. We are stuck with layered, messy techniques to define, resolve and manage these complex challenges. This paper zooms in to examine one piece of this challenge – how potentially unfair patterns in datasets that make their way into modelling and decision-making processes might be remedied in practical rather than theoretical machine learning situations. We emphasise situations where actors designing and deploying such systems wish to avoid bias themselves, for regulatory and reputation-related reasons, rather than adversarial situations where external investigators wish to discover bias against the will of the organisations undertaking analysis. Legislative discussion within a European context of the ability to investigate algorithmic systems can be found in Edwards and Veale (2017).
Can we statistically ‘debias’ data and algorithms?
Computational techniques to prevent machine learning methods from perpetuating these forms of bias have been proposed in recent years by research communities such as discrimination-aware data mining (DADM) and fairness, accountability and transparency in machine learning (FATML). They involve altering usual data science processes in order to correct these forms of bias. They can operate at several stages, including pre-processing, in-processing and post-processing (Hajian and Domingo-Ferrer, 2013). In each case, the aim is to induce patterns that do not lead to discriminatory decisions despite the possibility of biases in the training data.
Anti-discrimination law has particularly motivated DADM and FATML communities, who have attempted to formalise these requirements for mathematical implementation. For instance, heuristics such as the US Equal Employment Opportunity Commission’s ‘80% rule’, which provides a suggested level of permissible disparity between protected groups and the general population, have been used to set parameters for fairness-aware models (Feldman et al., 2015). Within European contexts, non-discrimination and data protection are rights enshrined in the EU Charter of Fundamental Rights, and both potentially relate to the risks of unfairness inherent in machine learning applications (Gellert et al., 2013). Recital 71 of the EU General Data Protection Regulation (GDPR) refers in particular to fairness-aware data mining technologies and organisational measures.
There are multiple ways to define fairness formally in machine learning contexts. Most measures focus on differences in treatment between protected and non-protected groups, but there are multiple ways to measure differences in outcomes. These include: ‘disparate impact’ or ‘statistical/demographic parity’, which considers classification rates between groups; 1 ‘accuracy equity’, which considers the overall accuracy of a predictive model for each group (Angwin et al., 2016; Dieterich et al., 2016); ‘conditional accuracy equity’, which considers the accuracy of a predictive model for each group, conditional on their predicted class (Dieterich et al., 2016); ‘equality of opportunity’, which considers whether each group is equally likely to be predicted a desirable outcome given the actual base rates for that group (Hardt et al., 2016); and ‘disparate mistreatment’, a corollary which considers differences in false positive rates between groups (Zafar et al., 2016). Other measures focus not just on actual outcomes and their relation to true/false positives/negatives, but on counterfactual scenarios wherein members of the protected groups are instead members of the non-protected group (i.e. a woman classified by the system should get the same classification she would have done had she been a man) (Kusner et al., 2017).
Each of these measures of fairness is an arguably reasonable way to measure fairness. One might therefore hope that a fair system would satisfy all of these constraints. Unfortunately, recent work has formally proven that it is impossible for a model to satisfy several of these constraints at the same time, except in exceptional cases which are unlikely to hold in the real world (Berk et al., 2017; Chouldechova, 2017; Kleinberg et al., 2016). As a result, choices between the different measures will have to be made. In some cases it may be more important to focus on differences between positive classifications (e.g. loan applications), and therefore an ‘equality of opportunity’ measure might be preferable; in others, the cost of a false negative might be higher (e.g. the risk a violent criminal might pose to the public). The choice of a particular fairness measure therefore ought to be sensitive to the context.
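To illustrate how three of these measures can diverge on the same classifier, the following sketch computes the between-group gaps in classification rate (demographic parity), true positive rate (equality of opportunity) and false positive rate (disparate mistreatment); the function and variable names are our own, not drawn from any particular library:

```python
import numpy as np

def fairness_gaps(y_true, y_pred, group):
    """Absolute between-group gaps for three group-fairness measures.

    y_true, y_pred: binary arrays of actual and predicted outcomes;
    group: binary array marking protected-group membership (0 or 1).
    A model satisfying a given measure would show a gap near zero."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in (0, 1):
        m = group == g
        pos = y_pred[m].mean()                  # P(pred = 1 | group)
        tpr = y_pred[m & (y_true == 1)].mean()  # true positive rate
        fpr = y_pred[m & (y_true == 0)].mean()  # false positive rate
        rates[g] = (pos, tpr, fpr)
    names = ("demographic_parity", "equal_opportunity", "disparate_mistreatment")
    return {n: abs(rates[0][i] - rates[1][i]) for i, n in enumerate(names)}
```

A model can show a zero demographic parity gap while simultaneously exhibiting large equal opportunity and disparate mistreatment gaps, which is the kind of tension the impossibility results formalise.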
Setting aside these definitional problems, fairness-aware machine learning techniques are increasingly seen as desirable, viable and even in some cases legally recommended or required. However, an important challenge remains. To be successful, these techniques depend on knowledge about the potential correlations between features in the training data and protected characteristics that are the subject of anti-discrimination and data protection law. In practice, this is a condition that is either not always met, or not always desirable to meet.
Why knowledge of protected characteristics is both necessary and problematic
To see why knowledge of protected characteristics is necessary, it is helpful to consider why certain naïve approaches to removing bias from modelling are inadequate. One could simply delete any sensitive variables related to discrimination, e.g. age, gender, race, or religion, from the training data. Unfortunately, this does not guarantee non-discrimination in the models trained on these data, as apparently non-sensitive attributes might remain which, in some conditions, are closely correlated with the sensitive ones. Where geography serves as a sensitive proxy, this phenomenon is termed ‘redlining’. More broadly, it can be seen as an issue of redundant encoding.
In order to discover redlining in training data, one needs to be able to find out whether sensitive attributes might be encoded by other, apparently benign ones. For instance, to discover whether ZIP codes in a dataset are correlated with, e.g. race, it will be necessary to either have race as an attribute in the dataset, or to have background knowledge about the demographics of the areas in question (for instance, from census records). Proposed approaches to non-discriminatory machine learning assume that whoever is implementing the technique has access to the sensitive attributes which might be encoded (e.g. Hajian and Domingo-Ferrer, 2013; Hardt et al., 2016). Such access is necessary for assurance of computationally non-discriminatory models (Žliobaitė and Custers, 2016).
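One simple way to probe for redundant encodings, given access to the sensitive attribute, is to correlate each candidate feature with it. The sketch below is our own illustration, using a Pearson (point-biserial) correlation against a binary sensitive attribute; values near ±1 flag a potential proxy, though non-linear encodings would need a stronger test:

```python
import numpy as np

def redundancy_scores(features, sensitive):
    """Correlate each numeric feature column with a binary sensitive
    attribute. Scores near +/-1 flag a possible redundant encoding
    ('redlining'); scores near 0 suggest no simple linear proxy."""
    X = np.asarray(features, dtype=float)
    s = np.asarray(sensitive, dtype=float)
    X_c = X - X.mean(axis=0)   # centre each feature column
    s_c = s - s.mean()         # centre the sensitive attribute
    return (X_c * s_c[:, None]).sum(axis=0) / (
        len(s) * X_c.std(axis=0) * s_c.std())
```

Crucially, running such a check requires the sensitive attribute (or background knowledge standing in for it) to be available in the first place, which is precisely the condition at issue in this section.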
Despite this, in many cases organisations deploying machine learning will lack this necessary access, often for legitimate reasons.
First, the collection of personal data inevitably creates privacy risks. Many organisations have internalised the dictum of regulators and privacy advocates to collect only data that are necessary for their purposes. The concepts of data minimisation and purpose limitation within the GDPR are intended to prevent collection and processing of data for unspecified or disproportionate ends. Furthermore, the kinds of protected characteristics involved in cases of discrimination raise higher privacy and data protection risks than other kinds of data, and are given special protection under both the GDPR and other laws (Edwards and Veale, 2017). The proposition that organisations ought to collect a wide range of sensitive data that is not directly necessary for their primary purposes contradicts this general dictum. Yet fairness-aware machine learning seems to require organisations to do exactly that to adequately inspect and modify their models. 2
It is not our aim here to analyse the extent to which privacy and data protection law and best practice are substantively in conflict with the collection and processing of sensitive attributes for the purposes of fairness-aware machine learning. 3 It may be that collection and processing for such purposes is legitimate; however, it may still not be desirable. It would require data subjects to share sensitive attributes along with non-sensitive ones every time their data were to be used to train a model. The general result would be much more sensitive data in the hands of data controllers – a security risk even if it is intended to be used for the legitimate purposes of avoiding discriminatory outcomes. Even if organisations are permitted to collect and process such data, requiring consumers to provide it might make their service less competitive, or less trusted. For purposes of building a model that serves some narrowly prescribed goal, they may not see the need to collect sensitive data. In the context of data minimisation, the data controller must argue that it is proportionate to collect and process sensitive categories of data, and they may not be sufficiently incentivised to do so. Where individuals fear they are being treated unfairly, the collection of sensitive data by the organisation in question, even to explicitly remedy fairness issues, might not alleviate that perception-based fear. It could even make it worse.
Some approaches have been proposed to transform training data with anonymisation procedures to protect the sensitive attributes. This can be performed in tandem with pre-processing techniques to prevent discrimination (Hajian et al., 2014; Hajian and Domingo-Ferrer, 2012). While promising, this still mandates the comprehensive collection of sensitive attributes from individuals in training data for each form of discrimination for which mitigation is desired. Despite meaningful privacy protections, the concerns raised above are still likely to apply. Individuals are unlikely to be happy providing a comprehensive range of sensitive personal data to the very organisations who are in a position to discriminate, no matter how technically robust their anonymisation process is.
Three approaches for appraising and improving fairness with limited data
Organisations developing learning systems need strategies to mitigate discrimination concerns in the absence of sensitive data. The challenge is to implement the techniques, such as those outlined above, without having to take on the additional burden and risk of collecting detailed sensitive data on the training sample.
We present three alternative approaches to overcome this challenge. The first is based on a multi-party data governance model, suited to contexts where little background knowledge about discrimination exists and a comprehensive assessment of potential forms of discrimination is needed. The second involves a collaborative knowledge sharing approach in which organisations can learn from each other’s experiences in similar contexts as well as relevant sociological and demographic correlations. The third involves exploratory analysis to build hypotheses of potential unfair characteristics of the data or system, which can be more formally tested as part of a due diligence process. Figure 1 pictographically illustrates these three distinct approaches.
Figure 1. Three approaches to fairness-aware machine learning without holding sensitive characteristics.
We do not argue that these three methods are perfect, nor that they provide complete solutions or assurances to the multitude of challenges surrounding machine learning systems. We argue instead that these are avenues that are important to explore to make fairer machine learning a practical reality in the multitude of settings in which automated and semi-automated decisions will occur in our society in the coming years and decades.
Trusted third parties holding protected characteristics
Various proposals have been made for the involvement of external parties in the evaluation and auditing of algorithmic systems (Mantelero, 2016; Pasquale, 2010; Sandvig et al., 2014; Tutt, 2016). Some of these are reflected in law. Article 35 of the GDPR obliges organisations to undertake ‘data protection impact assessments’ wherever ‘profiling’ is used to automatically make decisions which have legal or significant effects on data subjects. In some cases these assessments may be audited by a data protection authority (Recital 84). In most governance approaches, external auditors are given access to an organisation’s policies, personnel, data collection procedures, training data, models, proprietary code, and other relevant aspects, in order to assess the ethical dimensions and legal compliance of a particular algorithmic system (see Binns, 2017).
This model assumes that the relevant information required to perform an audit will lie in the hands of the organisation being audited. As argued above, this might not be the case, rendering the external audit process incapable of ensuring the kinds of algorithmic fairness that DADM and FATML techniques aim for.
This might be different, were trusted third parties enlisted to work alongside organisations from when data collection begins. This proposal could be achieved with a variety of different institutional and technical arrangements. Below, we illustrate several possible implementations.
The first party (the organisation implementing the algorithmic decision-making system) has access to historical data relevant to the classification or prediction task for which they are building a model. However, the first party does not and should not have access to any of the protected characteristics associated with the population used to train the model.
As discussed above, in order to statistically test the model for potential discrimination, the protected characteristics need to be linked somehow to the records used in the training data. To achieve this, a trusted third party is enlisted to collect data on the protected characteristics of those individuals whose data are used to train the model. For each individual, protected characteristics like race, gender, religious beliefs or health status are collected by the third party in parallel to the collection of the non-protected characteristics by the first party. The channel for communicating this information from the individual to the third party may depend on the platform (e.g. online, telephone, or in-person). It could be collected as part of a separate process, although this might prove unwieldy, or be encrypted simultaneously and seamlessly at the point of collection (e.g. locally through JavaScript in a web browser 4 ) with the public key of the third party, and transmitted to the organisation in question. 5
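The split at the point of collection can be sketched as follows, with the sensitive share destined for the third party (in practice encrypted client-side under the third party's public key, as above); the field names and function are illustrative, not a prescribed schema:

```python
def split_record(record, sensitive_keys, identifier_key="id"):
    """Split one collected record into the share the first party retains
    (identifier plus non-sensitive fields) and the share forwarded to the
    trusted third party (identifier plus sensitive fields only). In a
    real deployment the third party's share would be encrypted at the
    point of collection rather than handled in the clear."""
    first_party = {k: v for k, v in record.items()
                   if k not in sensitive_keys}
    third_party = {k: v for k, v in record.items()
                   if k in sensitive_keys or k == identifier_key}
    return first_party, third_party
```

The shared identifier is what later allows model outputs and protected characteristics to be linked for testing, without either party holding the full record.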
Consider the following illustrative example: An insurer wishes to use a machine learning model to help determine customers’ premiums. They have access to historical customer data, and use it to train a model to predict the amount of compensation a customer will claim over the term of their cover given certain attributes (e.g. postcode, occupation, qualifications). The estimated size of a potential claim – the output of the model – is used to automatically set premiums.
Based on this multi-party data governance model, there are multiple ways to proceed, depending on whether the goal is merely to detect bias or to both detect and prevent it, and what prevention techniques will be used (e.g. pre-processing, in-processing, or post-processing). We outline a set of possible variations here, and discuss their relative advantages and drawbacks.
Variation 1: Third party as ex post disparate impact detector
In cases where the third party’s only role is to detect discrimination (but not prevent it), the third party need only collect protected characteristics from each individual featured in the dataset used to train (and test) the model, along with an identifier. The records held by the first party for the purposes of model training could be linked by this identifier to the records held by the third party which contain the protected characteristics. The third party would be given access to the model developed by the first party (either directly or via an application programming interface (API)). By testing the outputs of the model on each of the individuals in their sensitive attribute dataset (using the individual’s identifier), the third party could detect disparate impacts.
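Under this variation, the third party's check could look like the following sketch, where `predict` stands in for the first party's model or API, returning a 0/1 decision for a given identifier; the names and the ratio-based summary are our own illustrative choices:

```python
from collections import defaultdict

def disparate_impact_ratio(predict, sensitive_records):
    """Query the model for each individual the third party holds, and
    compare positive-outcome rates across protected groups.

    predict: callable mapping an identifier to a 0/1 decision (e.g. a
    wrapper around the first party's API); sensitive_records: iterable
    of (identifier, group) pairs held only by the third party. Returns
    the ratio of the lowest to the highest group rate; under the EEOC
    '80% rule' heuristic, a value below 0.8 would flag a potential
    disparate impact."""
    counts = defaultdict(lambda: [0, 0])   # group -> [positives, total]
    for identifier, group in sensitive_records:
        counts[group][0] += predict(identifier)
        counts[group][1] += 1
    rates = [pos / total for pos, total in counts.values()]
    return min(rates) / max(rates) if max(rates) > 0 else 1.0
```

Note that the third party never needs the non-sensitive features to run this check, which is what keeps its data holdings minimal in this variation.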
An advantage of this variation is that the third party only accesses the sensitive attributes, not the non-sensitive ones. Since each record contains only sensitive attributes and an identifier, this represents a lesser privacy risk; while the data themselves are sensitive, it would be harder to re-identify an individual without other data types. This may also be beneficial from the perspective of a first party concerned about keeping their proprietary model secret, as it has been shown that unlimited access to a query interface for a prediction model can allow an attacker to extract and reconstruct the model (Tramèr et al., 2016). In this case, while the third party would have unrestricted ability to query the model by individual identifiers, and thus learn the distributions of outputs for each protected characteristic, they would not be able to reverse-engineer the model without access to the other, non-protected characteristics.
The disadvantage of this variation is that it only provides the first party with evidence of the disparate impact of their model. Disparate impact is a blunt measure of discrimination, because some disparities may be ‘explicable’, in the sense that they might be accounted for by reference to attributes which are legitimate grounds for differential treatment (Zafar et al., 2016; Žliobaitė et al., 2011). Furthermore, measures of disparate impact may not be sufficient for the first party to actually change their model to prevent discrimination. For instance, to remove bias from the training data, the first party would have to know which data points to relabel, massage or re-weight – i.e. the protected characteristics of the specific individuals – which they would lack. More generally, without the ability to check for redundant encoding of protected characteristics by non-protected attributes, it will be difficult for the first party to revise their model.
Nevertheless, the mere ability to detect disparate impact may be valuable in allowing third parties to flag up problems, which can then be dealt with by allowing the first party access to the necessary additional data to investigate and transform their model accordingly. Separating out detection of disparate impact and prevention could thus prevent unnecessary sharing of sensitive attributes and enable the third party to perform continuous monitoring.
Variation 2: Third party as ex ante discrimination mitigator
Alternatively, the third party could collect both the protected attributes and the other features used to train the model. This would enable the third party to play a more significant role: not only detecting disparate impact in model outputs, but also helping to determine whether disparities amount to disparate mistreatment (i.e. that they are not explainable), and helping to ensure that the model can be made bias-free.
Third party as redlining detector
In this approach, the third party has both the sensitive and the non-sensitive characteristics, and puts them through a common framework to produce summary information that aims to flag obvious issues that might occur during model building. Upon acquisition of a cleaned dataset, the third party calculates and returns a set of redundant encodings and their strengths. The returned document might note that ‘race is correlated to zip code by 0.8’; ‘gender is correlated with aspects of profession by 0.2’, and so on. The first party could use this knowledge to make trade-offs in the model – removing certain features, or engaging in further discussions with the third party about potential procedures to scrub unwanted correlations from a model.
Naturally, such a framework could suffer from flaws which make it unsuitable for some types of data or problems, particularly highly contextual ones. Yet this approach would create a focal point for the improvement of discrimination detection methods for certain contexts and data types, which would foster active discussion and debate about best practices and processes that could be translated into on-the-ground practice with relative ease.
Third party as data pre-processor
Another approach would see the third party pre-process the training data in such a way as to preserve anonymity and remove bias, before handing it over to the first party. This could be achieved by modifying the data to preserve degrees of anonymity (using techniques such as statistical disclosure control (Hundepool et al., 2012; Willenborg and de Waal, 2012) and privacy-preserving data mining (Agrawal and Srikant, 2000), which allow the statistical properties of the data to be maintained), followed by applying one of a range of anti-biasing techniques described in the DADM/FATML literatures (e.g. Feldman et al., 2015; Hajian and Domingo-Ferrer, 2013; Kamiran et al., 2012). 6
It would even be possible, if it were desired, to introduce
The advantage of this variation is that the first party can develop whatever kind of model they like, without the risk of it learning biases from the training data. It also limits the involvement of the third party to a single step, after which the data could be deleted. Finally, it encourages the development of expertise on the part of the specialist third party and does not require the first party to have in-house knowledge about fairness-aware machine learning. The disadvantage of this approach is that the anonymisation techniques only provide a degree of (quantifiable) anonymity. There is a clear trade-off between degrees of anonymity and utility of the dataset (Loukides and Shao, 2008), such that useful datasets will still likely carry re-identification risks. To the extent that such risks persist, the first party could learn more about individuals’ sensitive characteristics in this variation than it could in the other variations.
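The kind of anti-biasing step the third party might apply can be illustrated with a ‘reweighing’ sketch in the spirit of the pre-processing technique of Kamiran et al. (2012): each training record receives a weight such that, under the weighted distribution, the protected attribute and the class label are statistically independent. This is a simplified illustration, not a full implementation of that method:

```python
from collections import Counter

def reweigh(groups, labels):
    """Weight each (group, label) record by
    P(group) * P(label) / P(group, label), so that the weighted joint
    distribution factorises, i.e. the protected attribute and the class
    label become independent in the weighted training data."""
    n = len(labels)
    p_group = Counter(groups)               # counts per protected group
    p_label = Counter(labels)               # counts per class label
    p_joint = Counter(zip(groups, labels))  # counts per (group, label) pair
    return [
        (p_group[g] / n) * (p_label[y] / n) / (p_joint[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]
```

Over-represented (group, label) combinations receive weights below one and under-represented ones weights above one, so a downstream learner that accepts sample weights sees a de-biased distribution without the data themselves being altered.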
Who could act as a third party?
We have thus far assumed the existence of a suitable trusted third party, but it is worth considering what kinds of organisations might fulfil this role. This will likely depend on which of the variations are adopted. Each might pose different requirements of trustworthiness, technical expertise and incentivisation. In the case of a third party whose role is merely to detect disparate impact, relatively little technical expertise would be required, making it suitable for organisations with fewer resources and technical skills. The fact that disparate impact is already the focus of many civil society groups’ research activities may make them well situated to take on this role. Many potentially affected minority groups already have active representatives who could benefit from more formal auditing roles. Depending on the application context, it may be appropriate to involve different organisations; for instance, trade unions might be more equipped to address the fairness of algorithmic models deployed in human resources decisions.
If the third party is expected to be an ex ante discrimination mitigator, they will require more data collection and particular expertise in fairness-aware techniques. It may therefore need to be a specialist organisation, potentially working in collaboration with appropriate civil society organisations. It could be anticipated that consultancy or accountancy firms might provide these services to corporate clients, as they do with other forms of social auditing. 7
Another option might be statutory or chartered bodies whose remit includes monitoring discrimination, promoting equality, or enforcing the law. For instance, the Equality and Human Rights Commission in the UK and the Equal Employment Opportunity Commission in the US are statutory bodies responsible for enforcing equalities laws. While traditionally involved in reviewing individual cases for litigation, providing legal assistance and intervening in proceedings, these bodies could also take on more ongoing monitoring of data-driven discrimination. Bodies more linked to data governance might help here too, such as the
Knowledge bases about fairness in data and models
Experiential knowledge concerning the construction, or attempted construction, of ethical algorithmic systems has been largely neglected in the DADM and FATML communities. This has created a significant knowledge gap that we believe has problematic consequences on the ground. This neglect is surprising for several reasons.
As data governance tools move increasingly towards ex ante prevention and anticipation of harms, particularly through data protection and privacy impact assessments (Binns, 2017; Wright and de Hert, 2012), relying solely on in-data analysis of unfairness appears not just in tension with on-the-ground regulatory needs – it could even be described as paradoxical. It certainly seems problematic to have to link the data and train a system before you can decide whether you should even be doing either of those things. Many organisations cannot legally or practically proceed with any data work, even basic data access, cleaning, linking or exploration, until this stage is passed. Yet DADM and FATML approaches often implicitly assume that all the ingredients are on the table to build the tool, and that the only decision to be made is whether to deploy or not.
Machine learning is a generic technology with sector-specific applications. High profile, consequential domains have included anticipating the geospatial distribution of crime (Azavea, 2015; Perry et al., 2013; Wetenschappelijke Raad voor het Regeringsbeleid (WRR), 2016), the need for child protection (Vaithianathan et al., 2013) and the detection of tax fraud (Khwaja et al., 2011; Sharma and Kumar Panigrahi, 2012). Some ethical issues are sector- or even location-specific, but others are likely to be shared. Highly problematic issues might appear only rarely, making them difficult to capture with in-data analysis.
Limited implementation and education surrounding DADM and FATML technologies threatens our ability to cope with pressing issues in today’s machine learning systems. Even though this research field has some history (Andrews et al., 1995; Custers et al., 2013; Hajian, 2013; Pedreshi et al., 2008; Vedder, 1999), usable software libraries remain largely unavailable, and little training exists. Given the current lack of practical ethics education in computer science curricula, rapid change seems unlikely (Goldweber et al., 2011, 2013; Spradling et al., 2008). A stopgap is sorely needed.
Diagnosing and addressing social and ethical issues in machine learning systems can be a high-capacity task, and one difficult to plan and execute alone or from scratch. Ethical challenges, or appropriate methods to tackle them, might lurk within aspects of envisaged systems that are easily overlooked, such as hyperparameters, model structure, or quirks in data formatting or cleaning. Some issues might also have their origins not in the models or the data, but in the surrounding social, cultural and institutional contexts. Issues such as automation bias (Skitka et al., 1999), where individuals place either too much or too little trust in decision support systems, might be a synergistic result of both the model and the user interface. Other issues might have their origins in a model but likely solutions elsewhere. For example, for fairness grievances which are particularly difficult to detect or anticipate, better systems for decision subjects to provide feedback to decision-makers might be required. These issues might not have one-size-fits-all answers, but they are also unlikely to need to be treated as fresh each and every time they arise.
Issues of changing data populations and correlations are both currently under-emphasised in DADM/FATML work and appear difficult to fully address with in-data analysis. Concept drift or dataset shift refers to either real or virtual (differently sampled) changes in the conditional distributions of model inputs and outputs (Quiñonero-Candela et al., 2009) – for example, how changes in law might qualitatively affect the prison population or the strategies of fraudsters. Fairness and transparency are not static but moving targets, and ensuring their reliability is important. But anticipating change is technically difficult. Knowledge around rates and causes of change can be tacit, obliging us to carefully consider how best to use expert input (Gama et al., 2013). In particular, these phenomena can be hard to examine when changes are nuanced, or even result from the actions of previous machine learning supported decisions themselves. A key role for domain experts going forward is to explain and record how and why certain types of concept drift occur, rather than just help in their detection (Žliobaitė et al., 2016).
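To make the notion of drift monitoring concrete, the following minimal sketch compares the empirical distribution of a single model input across a reference window and a current window using a two-sample Kolmogorov–Smirnov statistic. The window sizes, threshold and synthetic data are purely illustrative assumptions; this is not a production drift detector, and, as argued above, real deployments would still need expert input on which changes matter and why.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs."""
    values = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), values, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), values, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def drifted(reference, current, threshold=0.2):
    """Flag drift when the KS statistic between a reference window and
    the current window exceeds an (illustrative) threshold."""
    return ks_statistic(np.asarray(reference, float),
                        np.asarray(current, float)) > threshold

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 500)  # distribution at training time
stable = rng.normal(0.0, 1.0, 500)     # same distribution: no drift
shifted = rng.normal(1.5, 1.0, 500)    # the mean has moved: drift

print(drifted(reference, stable))   # expect False
print(drifted(reference, shifted))  # expect True
```

Note that such a monitor detects only that a distribution has changed, not whether the change is fairness-relevant; that judgement remains with domain experts.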
Practical aspects of a knowledge base for fairness
Given the above factors, we propose that a structured, community-driven data resource containing practical experiences of fair machine learning and modelling could be valuable both in the direct absence of sensitive data, and more broadly in its own right. Such a resource, held online, would allow modellers to record experiences with problematic correlations and redundant encoding while modelling certain phenomena, as well as sociotechnical ethical issues more broadly (such as interpretability, reliability and automation bias), and detail the kinds of solutions and approaches they used or sought to remedy them. It could operate on a relatively open, trust-based model, such as Wikipedia, or have third-party gatekeepers, such as NGOs or sectoral regulators verifying contributions and attempting to instil anonymity where possible or desired. It would create a stepping-stone to enable practical, albeit rudimentary, fairness evaluations to be carried out today.
Linked data technologies have already seen significant adoption in sectors where cross-organisational collaboration around data is necessary (Bizer et al., 2009). This does not necessarily mean an industry-wide, comprehensive, rigid ontology for the purposes of addressing the ethical challenges of machine learning has to be adopted. Rather, a minimal adoption of common practices would enable different organisations to collaboratively annotate and describe the resource.
Several challenges would need to be addressed before such a database could be implemented. Similar variables and entities would need to be aligned in order to make such a dataset structured and navigable. Higher level common identifiers might be needed to group variables even if the levels of such variables were different. Some categorisations might have given individuals the chance to specify non-binary gender identities, or to opt out from this question – but this is unlikely to make any correlations or lessons found completely irrelevant or non-transferable in practice. Database ontologies should incorporate broader parts of the modelling process, such as cleaning or user interfaces, but the best format to do this is unclear. Arriving at it will likely be a result of trial-and-error.
Metadata should also be standardised. What kind of discrimination discovery methods were being utilised? How could effect strength or statistical significance be captured across these? It is likely that a descriptive vignette would also be useful, particularly concerning social processes and organisational context, but should or could this take a standardised format whilst remaining effective?
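One hypothetical shape such an entry might take is sketched below. The field names and values are our own illustrative assumptions rather than a proposed standard, but they indicate how aligned variable identifiers, detection method, effect strength, mitigation and a contextual vignette could sit alongside provenance metadata.

```python
import json

# A hypothetical knowledge-base entry recording a problematic
# correlation observed during modelling, plus its context.
# All field names and values are illustrative assumptions.
entry = {
    "domain": "recidivism-risk",              # sector-level identifier
    "variables": ["postcode", "prior_arrests"],
    "protected_proxy": "race",                # characteristic the variables encoded
    "detection_method": "mutual information screen",
    "effect_strength": 0.31,                  # method-specific; units must be declared
    "mitigation": "dropped postcode; retrained after relabelling",
    "vignette": ("Postcode acted as a redundant encoding of race in our "
                 "jurisdiction; the effect weakened after 2015 sentencing "
                 "reforms (concept drift)."),
    "provenance": {"contributor": "anonymised-org-17", "verified_by": "NGO"},
}

# Entries serialise to a linked-data-friendly interchange format.
print(json.dumps(entry, indent=2)[:60], "...")
```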
Such a dataset might benefit from discussion and input from different viewpoints, both within the organisations submitting the information and externally. Open annotation or discussion technologies might contribute questions and context to the methods and content of dataset entries (Pellissier Tanon et al., 2016; Simperl and Luczak-Rösch, 2014; Vrandečić and Krötzsch, 2014). Technologies such as StackExchange, a question and answer network initially aimed at developers, but recently with wider adoption, have proved to be popular and practical technical and social tools for solving issues around software. Such a database could take inspiration from the factors that make knowledge communities run effectively in these virtual environments. Allowing organisations to trace the sources of the data in such collaborative knowledge bases would also be key; in this respect, much could be learned from proposed solutions to similar challenges in scientific data collaboration (Missier et al., 2010).
Most data scientists are already used to working collaboratively online, through leading technologies in this space such as Git, MediaWiki, or StackExchange. Yet data scientists form only one part of the puzzle. As discussed, fairness issues can concern different parts of the modelling process, and as such viewpoints from others such as user interface developers, project managers and decision subjects would likely be valid and useful. The technologies chosen should be clear and accessible to those who are not used to working in these virtual spaces, whilst incorporating the features and extensibility that more developed solutions bring. If they are not, they are likely to become exclusionary and not see the widespread adoption that would make them most useful.
It is not just modellers who can contribute information to this knowledge base. Quantitative and qualitative findings in the research literature that might be relevant to particular fields or data sources could be added. For example, considerable amounts of research exist on areas such as financial literacy, recidivism or child protection which are carried out with the aims of improving their fields, but not directly to make or inform decision support or decision-making. These forms of evidence could be used to directly inform model structure, or to inform in-data analysis and search for ethical issues and concerns. Many of these pieces of evidence are currently hard to locate – they are published across disciplines, behind paywalls, or with research questions that do not make clear the correlations that the research also unearths. In the medium term, text mining and natural-language processing might help populate such a database semi-automatically.
DADM/FATML methods, given their own technical opacity to laypersons, come with their own issues of transparency and legitimacy. Individuals are, under the GDPR, entitled to know when automated processing of their personal data is occurring, and for what purposes, although there are practical caveats regarding these rights (Edwards and Veale, 2017). Yet for them to understand the potential harms that could accrue to them by consenting is much trickier. Both they and trusted independent third parties usually lack the source data for investigative purposes. Even if they had it, it is unclear that it would be hugely useful or revealing, given the rapidly changing nature of these datasets and the patterns within them, and the ample possibilities for data linkage that usually exist. Yet what they are (usually) interested in is not the data themselves, but the potentially problematic patterns the data support. An evidence base might help individuals or organisations understand what insights are held in different forms of data.
Potentially confounding issues
The proposal is largely grounded on the idea that organisations would be willing to spend time and money on cooperating to create a common resource. Primarily, this is a collective action problem, as there are great incentives to free ride and let others provide the information, which could result in non-provision (Olson, 1971). This is compounded by intellectual property concerns. If insights from data are viewed through an IP or a trade secrets lens, this could make organisations reticent to share.
Yet sharing of data for ethical purposes between firms is far from unheard of, particularly in other sectors facing similarly tricky societal challenges. Social and environmental issues in the global clothing sector are pervasive due to uncertainties around the environmental impact of processes, materials and chemicals, and uncertainties in the on-the-ground production systems characterised by multi-layered subcontracting. The Sustainable Apparel Coalition (SAC) emerged as a data-sharing body in 2010, now with over 180 members representing well over a third of all clothing and footwear sold on the planet. Together with the US Environmental Protection Agency (EPA), and with several large data donations and collection projects involving members, they have been developing the open-source Higg Index to give designers tools to better and more rigorously anticipate potential products’ sustainability further upstream. In some ways, withholding data about ethical concerns and potentially salient social issues could itself be seen as controversial, and a reputational risk.
Furthermore, the institutional field of the technology sector does not seem unamenable to this form of cooperation. Institutional fields create like-minded communities of practice through three main mechanisms – coercive pressure, where influence from actors or actants enforces homogeneity; mimetic pressures, which stem from standard, imitative responses to uncertainty; and normative pressures, which stem from how a field coalesces and becomes professionalised (DiMaggio and Powell, 1983). Some promising normative pressures can be seen across the machine learning modelling field that give hope for this – communities of voluntary support on question–answer networks such as Cross Validated 8 (which themselves support mimetic pressures); pro-bono data science for non-profits on the weekends through growing organisations like DataKind; virtual discussions and events from field leaders on /r/MachineLearning and Quora; expectations of contributions to open source software, to name a few. Proposed coercive pressures, such as professional bodies, charters or certification for data scientists might also play a role here in the future.
Identifying and creating databases of ‘good’ or ‘best’ practices is a common but also a problematic policy approach to complex socio-technical challenges. This approach can mislead, as practices are usually assumed to lead to good outcomes rather than being treated as hypotheses subject to serious monitoring and evaluation. Even where evidence suggests good practices work in one context, they may fail elsewhere (Cartwright and Hardie, 2012). Instead of prescribing ‘good practice’, a database of experiences would serve a more exploratory function. Several organisations are well positioned to start or collaborate on such initiatives: private think-tanks such as Data and Society in the United States, proposed bodies such as the national data stewardship body described in a recent report by the Royal Society and the British Academy (2017), or one of the many interdisciplinary collaborations melding computer science and social science in universities across the world. It might also connect individuals facing similar challenges across the globe, creating creative, discussion-enabling support networks that help like-minded individuals share advice, strategies and even code to tackle the trickiest challenges together.
Exploratory fairness analysis (EFA)
The situations above assume that information on protected characteristics is either possible to obtain, or available in parallel cases. Yet there may be situations where such data is prohibitively difficult to obtain at all. Ambient computing, for example, judges people based on rather disembodied and abstracted features that environmental sensors can pick up, rather than through a data-entry method. Yet these systems might also exhibit fairness concerns, concerns which might be particularly tricky to deal with.
These situations, where the protected data are not known, pose a difficult challenge for computational fairness tools. Yet we propose that there are concrete methods for these issues that, while imperfect, could prove useful practices to both explore and develop in the future.
Building ex ante unfairness hypotheses with unsupervised learning methods
Before building the model, data can be examined for patterns that might lead to bias. Exploratory data analysis is a core part of data analysis, but teaching, research and practice into it have been historically marginalised (Behrens, 1997; Tukey, 1980). Results of previous research, such as DCUBE-GUI or D-Explorer, have shown how visual tools might help with the understanding of potentially discriminatory patterns in datasets (Gao, 2015; Gao and Berendt, 2011), even for novice users (Berendt and Preibusch, 2014). Still, as with other methods, these tools broadly come with the assumption that the sensitive characteristics are available in the dataset, which we have argued is often unrealistic.
If we assume that immediately sensitive data are unavailable, simply understanding the correlations in the dataset is of less use. Instead, the exploratory challenge can be seen primarily as an unsupervised learning problem. Unsupervised learning attempts to draw out and formalise hidden structure in datasets. Through unsupervised learning, we can hope to build an idea of the structure of correlations within data. As we do not have the sensitive characteristics, confirmatory analysis is difficult. This does not mean there is nothing to be done. Exploratory data analysis has much to contribute in the building of hypotheses and the directing of future data and evidence collection as part of a broader process of due diligence.
A relevant subset of unsupervised learning methods we zoom in on here attempt to understand dataset structure through estimating latent variables that appear to be present. Some methods, such as principal component analysis (PCA), try to create a lower dimensional version of the data that captures as much variance as possible with a smaller number of variables. Some social science methods such as Q-methodology (McKeown and Thomas, 2013) use this approach to try and pick up latent dimensions such as subjective viewpoints. Other methods, such as Gaussian mixture models, assume that datasets are generated from several different Gaussian distributions, and attempt to locate and model these clusters.
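As an illustration of how such latent-structure methods might feed exploratory fairness analysis, the following sketch runs PCA (via a singular value decomposition) on synthetic data in which a hidden group, absent from the dataset, shifts two observed features. The loadings of the first component suggest which features move together, and are therefore candidates for a qualitative unfairness hypothesis. The data and feature interpretations are fabricated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
# Synthetic data: a hidden binary group shifts two observed features.
group = rng.integers(0, 2, n)            # latent, *not* in the dataset
x1 = rng.normal(0, 1, n) + 2.0 * group   # e.g. sensed activity level
x2 = rng.normal(0, 1, n) + 2.0 * group   # e.g. dwell time
x3 = rng.normal(0, 1, n)                 # unrelated noise feature
X = np.column_stack([x1, x2, x3])

# PCA via SVD on centred data; components are the rows of Vt,
# ordered by the variance they explain.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)

# The first component's loadings show which observed features move
# together: a starting point for a qualitative unfairness hypothesis
# about a latent subgroup, to be probed with further evidence.
print("explained variance ratios:", np.round(explained, 2))
print("PC1 loadings:", np.round(Vt[0], 2))
```

Here the first component loads heavily on the two group-shifted features and only weakly on the noise feature, hinting at a latent dimension worth qualitative investigation.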
These forms of analysis can be used to build hypotheses about fairness in datasets. For example, upon clustering or identifying subgroups within a dataset (which may or may not be related to any protected characteristics), these groups can be qualitatively examined, described and characterised. Experimental and sampling techniques might be used to gain more contextual information about the individuals in these clusters – for example, if their sensed or captured behaviour correlates with any sociodemographic attributes. These clusters can be used before or during the model building process to understand performance on different subgroups present in the data.
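A minimal sketch of this subgroup comparison follows, using a naive one-dimensional two-means clustering as a stand-in for a full mixture model. The gap in positive-decision rates across the discovered clusters is a hypothesis to investigate further, not evidence of discrimination, and all data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
# Scores from some decision model, and a feature with two latent clusters.
feature = np.concatenate([rng.normal(-1, 0.5, 300), rng.normal(1, 0.5, 300)])
scores = 1 / (1 + np.exp(-feature))  # the model favours the second cluster

# Naive two-means clustering in one dimension (stand-in for a
# Gaussian mixture model or similar).
centres = np.array([feature.min(), feature.max()])
for _ in range(20):
    labels = np.abs(feature[:, None] - centres).argmin(axis=1)
    centres = np.array([feature[labels == k].mean() for k in (0, 1)])

# Compare positive-decision rates across the discovered clusters; a
# large gap flags a subgroup whose contextual meaning (and possible
# relation to protected characteristics) merits qualitative follow-up.
positive = scores > 0.5
rates = [positive[labels == k].mean() for k in (0, 1)]
print("positive rate per cluster:", np.round(rates, 2))
```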
Building ex post unfairness hypotheses with interpretable models
A second approach to in-data analysis without access to protected characteristics examines trained models, rather than the input data alone. Once models have been trained, even complex models, there are several methods that are available for trying to understand their core logics in human-interpretable ways.
The literature on understanding models such as neural networks has traditionally distinguished between decompositional interpretation and pedagogical 9 interpretation (Andrews et al., 1995; Tickle et al., 1998). Decompositional approaches focus on how to represent patterns in data in a way that optimises predictive performance whilst the internal logics remain semantically understandable to designers. Proponents of pedagogical systems, on the other hand, noted the difficulty of extracting a semantically interpretable logic from models such as neural networks, although some try (Jin et al., 2006). The tactic they have adopted, which is broadly the domain of most current research in interpreting complex systems, is to treat the interpretation as a separate optimisation problem to be considered.
The concept of pedagogically interpretable models is relatively simple to explain. The basic idea is to wrap a complex model with a simpler one, which through querying the more complex model like an oracle, can estimate its core logics. Candidates include logistic regression or decision trees. Increasingly, proposals for the analysis of more complex models acknowledge that the gap between the logics that can be represented by the simpler model and the logics latent in a more complex model is too vast to translate appropriately. Image recognition is a case in point. Instead, proposals in this area have tried to estimate the logics that locally surround a given input vector – such as an image – to understand why it was classified as it was (Ribeiro et al., 2016b). 10
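The oracle-querying idea can be sketched briefly. Below, a deliberately opaque scoring function is probed across its input space, and a single-split decision stump, a hypothetical stand-in for richer surrogates such as trees or local linear models, is distilled to mimic it. The model and data are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def opaque_model(X):
    """Stand-in for a complex model we can only query, not inspect."""
    return (0.2 * X[:, 0] + 3.0 * X[:, 1] - 0.5 > 0).astype(int)

# Query the oracle on a probe sample drawn from the input space.
X = rng.uniform(-1, 1, size=(2000, 2))
y = opaque_model(X)

# Distill a decision stump: the single (feature, threshold) split
# that best reproduces the oracle's answers over the probe sample.
best = (0, 0.0, 0.0)  # (feature index, threshold, agreement)
for j in range(X.shape[1]):
    for t in np.linspace(-1, 1, 41):
        agreement = np.mean((X[:, j] > t).astype(int) == y)
        if agreement > best[2]:
            best = (j, t, agreement)

j, t, agreement = best
print(f"surrogate: feature {j} > {t:.2f}, agrees with oracle {agreement:.0%}")
```

The stump recovers the dominant feature in the hidden rule; for realistic models the surrogate's fidelity would need to be reported alongside any interpretation drawn from it.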
Exploratory fairness analysts might manually examine the mechanisms behind a model’s core logics and ask if they make sense. Specifically, analysts might wish to consider whether they would be happy publishing such information about a model, or whether the public might take issue with the way decisions are being made and the reasons behind them. Some recent research highlighting gender bias in word embedding systems, which place words in relation to each other in high-dimensional spaces to attempt to map different dimensions of their meaning, has gathered attention, and the methods of bias identification in this area are related to what we discuss here (Bolukbasi et al., 2017; Caliskan et al., 2017). Future research should tangibly explore whether meaningful and relevant information about datasets or models known to be somehow biased can be discerned through this type of analysis.
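The projection idea behind this line of work can be illustrated with fabricated toy vectors; real embeddings are learned from large corpora, and the four-dimensional vectors below are assumptions for demonstration only.

```python
import numpy as np

# Toy 4-d "embeddings", fabricated so that occupation words lean
# towards one end of a gender axis; real embeddings are learned
# from text corpora, not written by hand.
vectors = {
    "he":       np.array([ 1.0, 0.1, 0.0, 0.2]),
    "she":      np.array([-1.0, 0.1, 0.0, 0.2]),
    "engineer": np.array([ 0.6, 0.8, 0.1, 0.0]),
    "nurse":    np.array([-0.7, 0.7, 0.2, 0.0]),
}

# A bias direction estimated from a definitional pair, in the spirit
# of the word-embedding bias literature cited above.
direction = vectors["he"] - vectors["she"]
direction = direction / np.linalg.norm(direction)

# Projecting neutral words onto this axis surfaces candidate biases
# for an analyst to scrutinise qualitatively.
for word in ("engineer", "nurse"):
    proj = float(vectors[word] @ direction)
    print(f"{word:>9s}: projection on he-she axis = {proj:+.2f}")
```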
Discussions and directions
Three approaches, three purposes
The three distinct approaches we have outlined in this paper point to three possible avenues for exploration in the research and practice of fairer machine learning. Each of them is suited for different purposes.
The third-party approach, where another organisation holds sensitive characteristics that they use to detect and potentially mitigate discrimination from data and models, is primarily useful where trust in the organisation interested in model building is low, or potential reputational risk is high. Insurance or hiring seem like prime cases here, particularly as they are areas historically associated with bias over protected variables. A challenge with this approach is that it is not easy to set up in low-resourced situations, or unilaterally.
The collaborative knowledge base approach, in which linked databases record fairness issues noted and experienced by researchers and practitioners around the world, could be useful in a broad array of situations. It might provide benefit where general uncertainty is acute, risk assessment must be undertaken pre-emptively, or risks are complex, changing and sociotechnical. Yet this requires a change of mindset. Organisations involved in modelling would need to overcome a reluctance to openly discuss their models, and will need to dedicate time and money to give to, as well as take from, such a shared resource. Anonymous contributions could work as a model, but issues of who verifies the provenance of the information given, and how easy it is to re-identify organisations based on modelling purpose, would abound.
The exploratory approach requires the least organisational set-up, as it can be undertaken unilaterally on data where sensitive characteristics are not held. Yet while this approach enables the construction of questions and the probing of certain types of anomalous or potentially problematic patterns in the data, on its own it provides by far the least assurance that fairness issues have been comprehensively identified, assessed and mitigated. Further work should seek to formalise methods of exploring data for these kinds of patterns, and test modellers and processes for their efficacy in identifying a range of synthetically induced issues.
There are, unsurprisingly, limits to the effectiveness of technological or managerial fixes to contested concepts such as fairness. Evaluating fairness in unsupervised learning is particularly challenging, given that the groups discovered are latent, although some recent work has begun to explore this space (Chierichetti et al., 2017). Understanding fairness by demographic will also be hard to grasp when those demographics are latent – such as treating individuals holding particular political views similarly with regard to moderating content online (Binns et al., 2017). More importantly, even though the three approaches we outline deal with different levels of formality and different ways of understanding or conceiving fairness, they all remain broadly centred on the software artefacts themselves. We do not suggest that either these approaches or the broad mindsets that underpin them are sufficient for understanding equity or mitigating discrimination in a digital age. We do, however, tentatively suggest that where these software artefacts are used to make and support decisions, tackling technical aspects of these issues is likely a necessary piece of the puzzle – neither more nor less important than others, such as organisational culture, social methods of oversight, or decisions about the intention or direction of deployment. We also would draw attention to larger challenges with predictive systems: that they might not achieve social or policy goals at all by their nature (Harcourt, 2006), or that fairness might not be the most relevant issue as much as ideas of stigmatisation, over-surveillance, or the devaluing of particular cultural notions, such as family units (Blank et al., 2015).
Where there are inherent conflicting interests between organisations deploying such systems and those affected by them, co-operation may not be feasible or desirable; affected groups may instead be drawn (understandably) to more adversarial forms of resistance and political action (Brunton and Nissenbaum, 2015; Danaher, 2016; Lyon, 2007).
Directions for empirical research
These three proposals illustrate how alternative institutional set-ups and ways of knowing might help in the governance of fairness in the context of machine learning. This paper has focused on one identified practical constraint: the absence of sensitive data. Each approach introduces limitations and caveats, and provides few guarantees of performance. This might irritate researchers in this space, yet it reflects the messy reality of many contemporary on-the-ground situations.
We believe there are opportunities amidst the constraints. The practical limitations of fairness-improving approaches, including these three, will only become apparent upon their introduction and reflexive study within real-world settings. In particular, our second and third suggestions, concerning knowledge bases and exploratory analyses, are not amenable to the sort of mathematical guarantees that the DADM literatures may find comforting. In these situations, an empirical, sociotechnical dimension of evaluation becomes all the more important.
Without this dimension, designed tools are likely to stumble in surprising and even mundane ways, which will affect their ability to deal with unfairness and discrimination in the wild. It seems unlikely that statistical guarantees of fairness will translate smoothly to individuals feeling that decisions about them were made fairly – something as much a result of process as of outcome. Researchers working in this space should trial their proposed solutions, monitoring their implementation using rich and rigorous qualitative methods such as ethnography and action research, and feed findings from this back into tool revision and rethinking. To adequately address fairness in the context of machine learning, researchers and practitioners working towards ‘fairer’ machine learning need to recognise that this is not just an abstract constrained optimisation problem. It is a messy, contextually-embedded and necessarily sociotechnical problem, and needs to be treated as such. This requires technical scholars to better grasp the social challenges and contexts; but also for social scholars to grapple more rigorously with the technical proposals placed on the table, and to ensure that critiques with operational implications reach the ears of the computing community.
