Abstract
The rapid introduction of machine learning in social science brings together researchers with different ways of thinking about and doing science. This brings new ambiguities and potential clashes. Hofman et al. (2021) published recommendations to help scholars integrate machine-driven prediction practices with explicit goals of testing generalizability and developing new methods, categorizing model types and granularities, and adopting open science practices from the computer sciences. As the Hofman team’s recommendations were published in
Before proceeding, I draw the reader’s attention to the current revolution taking place in social science where the use of machine learning is one of the fastest growing trends. For obvious reasons, this journal’s focus on the intersection of computers and social science places it in the center of this revolution. Figure 1 displays the number of publications mentioning some form of machine learning in comparison to all publications in social science in general and in this journal (“

Figure 1. Key trends in computational social science, 1986–2020.
Understanding and Categorizing Computational Social Science Now
The Hofman team proposes a four-category scheme to distinguish modeling approaches:
Descriptive and Explanatory Modeling (a.k.a. Social Science)
As Hofman et al. (2021) point out (their Figure 1), these models try to explain how changes in conditions affect outcomes in a given situation and are developed predominantly through logic and experimental design. They tend to have goals of causal inference, theory development, and constructing and testing formal models (as in mathematical sociology). Effective use of explanatory models requires careful consideration of the data-generating process: whether there is random assignment, and whether all confounding and colliding pathways are accounted for, before developing tests or drawing conclusions. This control knowledge can only come from theory and prior experience with the subject of study.
Whether a model is descriptive or explanatory depends mostly on researchers’ prior expectations. Without assumptions or specific hypotheses to test, the work is descriptive; that is, it aims to uncover whether there is an association of
The advantages of explanatory modeling are primarily scientific. Such models advance the categorization and description of human societies, behaviors, structures, and processes. Ideally, they better educate students and the general public about how and why things are the way they are, and provide information to assist policy-making. Because these models tend to represent specific theories of a narrow range of social or behavioral processes, they are tested on data reflecting unique times, places, contexts, and especially sources. Thus, their explanatory “power” tends to be low, for example, regression coefficients and
A major drawback of explanatory modeling is haphazard deployment. Scholars rely on null-hypothesis significance testing (NHST) and often selectively report coefficients that have asterisks (
Fortunately, the open science movement and shifts toward meta-science are helping bring these issues to light. Also, perhaps driven by some influence from computer scientists and their predictive modeling approaches, explanatory modelers are increasingly running many models, testing robustness, and considering replication or meta-analysis to ensure that a theory (explanation of something) passes the scrutiny of many data sets and specifications and that a reported “effect” should be judged on other criteria such as relevance rather than simply being non-zero (Freese & Peterson, 2018; King, 1995; Stahel, 2021).
Predictive Modeling
This category covers essentially all forms of machine learning, sometimes also known as “algorithmic modeling.” The approach is generally a-theoretical and pays little attention to causal mechanisms or to explaining anything. It is widely applied in computer science and in the private sector, for example, to predict online behaviors, sell products, or improve investment decisions. Its use in social science has nonetheless grown exponentially (see Figure 1). These models seek to exploit all known information from a given source of data, including meta-data and contextual data, to predict an outcome. A model is trained on one subset of the available data, and the preferred algorithm is then tested on a different, held-out subset. If the predictive power is high, the model is deemed acceptable. This makes for easy judging criteria, unlike with explanatory models, where theoretical discussions, causal logic, consideration of previous literature, and various statistical tests and fit statistics are used simultaneously to decide whether a model is useful.
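The train-then-test workflow just described can be sketched in a few lines. This is a minimal toy illustration, not any specific study’s pipeline: the data, the 80/20 split, and the threshold “model” are all invented for demonstration.

```python
# Minimal sketch of the predictive-modeling workflow: fit on one subset of
# the data, then judge the model solely by its accuracy on a held-out subset.
# All data and modeling choices here are hypothetical toy choices.
import random

random.seed(0)

# Toy data: y = 1 when x > 0.5, with 10% label noise.
data = []
for _ in range(1000):
    x = random.random()
    y = int(x > 0.5)
    if random.random() < 0.1:
        y = 1 - y
    data.append((x, y))

random.shuffle(data)
train, test = data[:800], data[800:]  # 80/20 train/test split

def accuracy(threshold, rows):
    """Share of rows correctly classified by the rule 'predict 1 if x > threshold'."""
    return sum(int(x > threshold) == y for x, y in rows) / len(rows)

# "Training": pick the threshold that maximizes accuracy on the training subset.
train_acc, threshold = max((accuracy(t / 100, train), t / 100) for t in range(101))

# "Testing": the model is judged only by accuracy on the held-out subset.
test_acc = accuracy(threshold, test)
print(f"threshold={threshold:.2f}  train={train_acc:.2f}  test={test_acc:.2f}")
```

With 10% label noise, held-out accuracy lands near 0.9, illustrating the single judging criterion the text describes: if out-of-sample accuracy is high, the model is accepted.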
In social science, being able to predict an outcome is of little use unless it serves goals of classification or theory development. Thus, predictive modeling has entered the social sciences mostly in service of explanatory modeling. It can accomplish tasks that humans cannot, for example, qualitative coding of topics or events that would require too many human coders, or coding data far faster than humans can. The advantages can be monumental: scholars could track the spread of the SARS-CoV-2 virus and public sentiment across the world daily thanks to predictive modeling. 1 This demonstrates how predictive modeling could contribute to an active social science with real-time data and results.
The major drawback of predictive modeling is that the factors driving predictive accuracy are more or less a black box. Another drawback is data availability. Human behaviors and outcomes can be predicted with accuracy, but only when large data sets with thousands of variables are available, and there is rarely so much information outside of specific surveys at specific moments. Thus, having a powerful and accurate machine algorithm is useless most of the time, as large-scale surveys are rare and expensive and sensitive public information is not freely available. Other drawbacks are general replication issues; some of these are similar to those already well known in explanatory modeling (Breznau, 2021a; Campion et al., 2020; Hendriks et al., 2020; Janz, 2015; Open Science Collaboration, 2015), but some are unique to predictive modeling (Kapoor & Narayanan, 2021). For example, certain steps in the process are completely out of the researchers’ hands, so that identical starting code and routines produce different results in the presence of different software choice layers or graphics cards (GPUs) inherent to the software or computer being used (Vijayakumar & Cheung, 2019; Villa & Zimmerman, 2018).
Still more concerns relate to the environmental impact of computer energy consumption in ever-larger predictive models (Bender et al., 2021) and evidence that humans can often predict outcomes just as well as machine learning algorithms in sociological and psychological studies (Christodoulou et al., 2019; Dressel & Farid, 2018; Salganik et al., 2020; Saveski et al., 2021). One poignant example demonstrated that a human-specified model and a machine algorithm were roughly identical at predicting unemployment spells, but the machine algorithm relied on 10,000 variables while the human’s logistic regression needed only four (McKay, 2019). If models generated by trained experts can perform just as well, then they are preferable because they use fewer degrees of freedom, require less computing power, contribute less to climate change, and are more cost-effective in data requirements (e.g., compare the cost of a survey with four versus 10,000 questions!).
Natural language processing in machine learning raises serious critical-race and inequality issues. When machines code things in lieu of humans, they can reproduce existing social biases and further disadvantage already disadvantaged groups. The technical language used to categorize people can be coded with negative affect; for example, “Black” can be identified with negative sentiment contra “White,” and this can lead to racial biases and harms from machine algorithms (Gebru, 2019). Thus, when policy makers or law enforcement use biased algorithms, they reinforce bias (Janssen et al., 2020). The same has been shown for phrases that describe persons with disabilities. Hutchinson et al. (2020) demonstrated that such phrases are coded by a (well-trained) machine as having high levels of “toxicity” (a negative-affect sentiment); for example, “I am a person with mental illness,” “I am a deaf person,” and even “I will fight for people who are deaf” would all receive a high degree of toxicity in machine language processing. If used to monitor or censor social media, such algorithms could disadvantage mentally ill persons and mental-illness support or advocacy groups.
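The mechanism behind this failure mode can be made concrete with a toy lexicon-based scorer. The “toxicity” lexicon below is entirely invented for illustration; it only mimics the bias Hutchinson et al. (2020) document, in which identity terms inherit negative scores from biased training data, so that a supportive sentence about deaf people scores as more “toxic” than an otherwise identical sentence.

```python
# Toy illustration of how lexicon-based scoring can encode social bias.
# The lexicon is invented for demonstration only; real systems learn such
# scores from biased training data rather than from a hand-written table.
biased_lexicon = {
    "fight": 0.6,    # genuinely negative-leaning word
    "deaf": 0.7,     # identity term wrongly scored as toxic
    "mental": 0.5,   # identity-related terms wrongly scored as toxic
    "illness": 0.5,
}

def toxicity(sentence: str) -> float:
    """Mean lexicon score over the words of a sentence (0.0 if none match)."""
    words = sentence.lower().replace(".", "").split()
    return sum(biased_lexicon.get(w, 0.0) for w in words) / len(words)

supportive = "I will fight for people who are deaf."
neutral = "I will fight for people who are tall."
print(toxicity(supportive))  # higher, despite being a supportive statement
print(toxicity(neutral))
```

The two sentences differ by one identity term, yet the supportive one scores higher, which is exactly how an algorithm deployed for content moderation could disproportionately flag disability-advocacy speech.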
Integrative Modeling
The Hofman team foreshadows this approach as a potential new trend in social science. Integrative models would combine explanatory and predictive approaches in a single study. The single study might involve many smaller modeling steps, but these would all contribute collectively to an integrative model. Hofman et al. (2021) define an integrative outcome as one that “[t]ests a claim both for causality and predictive accuracy” (p. 185) and could “help to formulate predictively accurate causal explanations” (p. 184). The Hofman team provides two examples. One is from Athey et al. (2011), who develop an explanatory model of bidding behavior in an auction and use it to predict outcomes that are then tested against the actual outcomes. The other comes from coordinate ascent algorithms that iteratively alternate between predictive and explanatory models; in particular, this involves manipulating some aspect of the subjects while under study to help better explain the outcomes (Agrawal et al., 2020). In principle, such models should provide benefits greater than explanatory or predictive models done in isolation because they can predict the “magnitude and direction of individual outcomes under changes or interventions” (Hofman et al., 2021, table 2).
Because of the technical barriers to predictive modeling and the risks of inappropriate usage of explanatory and predictive modeling in isolation, it is possible that integration will bring even less reliable outcomes. As Lazer et al. (2009) point out, most social science methods were developed to handle snapshots of data. This means that methodological developments are needed to keep pace with machine learning approaches and larger data sets with ongoing sampling. It is already a monumental achievement to analyze networked data with 10,000 nodes (and a potential 50 million network ties); it is another thing altogether to do this with 10,000 nodes over 10,000 days (a potential 500 billion transactions across those daily ties). The technical skills and computing power needed to achieve integrative modeling are a serious concern and should be weighed against the potential benefits and the new enthusiasm of social scientists to jump on the artificial intelligence bandwagon.
Another barrier is that social scientists are unlikely to have integrative goals. Studying a time- and place-specific phenomenon may mean that predictive accuracy on out-of-sample data is irrelevant because the interest is only in that particular moment. Moreover, when bringing in new data, it is very likely that the data-generating model has changed, and this would require rethinking the theory rather than trying to maximize predictive accuracy. Again, a lack of data also precludes many integrative goals. For example, Altaweel (2021) developed a predictive natural language processing algorithm to classify cultural objects advertised on
Currently, all articles published in
There are exceptions in the broader literature, and these exceptions will likely grow as a function of knowledge and discussion of best practices regarding machine learning, especially if social scientists heed the recommendations of the Hofman team. When deployed with high technical skill, integrative modeling could identify explanatory and causal mechanisms that researchers simply cannot see under normal circumstances. In random forest algorithms, machines might help to identify combinations of variables that stand out as predictors of an outcome, or reveal an otherwise suppressed relationship to an outcome after testing all other possible combinations, thus ruling out “luck” or the random chance that a scholar arrived at such a result (Molina & Garip, 2019). Such a combination of variables might constitute a meaningful subgroup in a given society (Brand et al., 2021).
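A minimal sketch of this use of random forests follows. The data are simulated rather than drawn from any cited study: the outcome depends only on an interaction of two variables, mimicking a meaningful “subgroup,” while two further variables are pure noise. The forest’s importance scores then surface the interacting pair as candidates for theoretical attention.

```python
# Sketch of using a random forest to surface which variables (and their
# combinations) predict an outcome, in the spirit of Molina & Garip (2019).
# The data are simulated: y depends on an interaction of x0 and x1
# (a subgroup-style effect); x2 and x3 are pure noise.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((2000, 4))
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.5)).astype(int)  # interaction-defined subgroup

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_

# The forest ranks the interacting variables far above the noise variables,
# flagging x0 and x1 as a candidate subgroup for explanatory follow-up.
for name, imp in zip(["x0", "x1", "x2", "x3"], importances):
    print(f"{name}: {imp:.3f}")
```

The importance scores are only a screening device: identifying *why* the flagged combination matters remains an explanatory, theory-driven task.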
Currently, the social science I am familiar with has goals of description and explanation. Studies use machine learning in one stage to define a variable to use in their main explanatory model. They use
Integrative Lessons
Overall, the Hofman team demonstrates that social scientists (explanatory modelers) and computer scientists (predictive modelers) can learn from each other’s procedural differences. For example, the shift to open science leads social scientists to embrace methods that insulate against analytical flexibility (Nosek et al., 2018), while computer scientists use crowdsourcing, such as the “common task framework,” to achieve larger modeling goals (Breznau, 2021b). 2 Cross-integration of these practices could help both types of science become more reliable, hack-proof, reproducible, and generalizable in scope. Social science gains are already emerging in “many analysts” studies, which mimic the crowdsourcing competitions of computer scientists but pursue goals of explanation, not just developing a better (meta-)algorithm (Botvinik-Nezer et al., 2020; Breznau et al., 2021; Silberzahn et al., 2018). At the same time, if predictive models were preregistered and peer reviewed, it could improve their efficiency, for example, by avoiding redundant testing of models on the same data subsets, which introduces bias loops and possibly overstates predictive accuracy. This would in turn benefit modelers who try to use prediction to serve explanatory goals but may not be as skilled as computer scientists at predictive modeling. Peer review of preregistrations could greatly reduce shoddy machine learning research practices.
The Hofman team’s recommendations come at a critical moment when more and more researchers are employed to do computer science in service of social science goals. These researchers will struggle if they pursue only predictive modeling. In the end, social science is about explanation, and this requires theory. In fact, it is social scientists who can teach computer scientists that prediction itself requires basic assumptions, and assumptions are the building blocks of theory. For example, knowledge and assumptions about human sentiments are necessary before supervising a machine to arrive at usefully coded sentiments (Watanabe & Zhou, 2020). Goals of theoretical explanation can help resolve the reproducibility crisis currently facing social science (Gervais, 2021), if not the ethical crises facing computer science. Social scientists often try to maximize
It was my intention in this communication to raise awareness among computational social scientists about the risk-reward trade-offs of integrating predictive modeling. As such, I would argue that the Hofman team’s “Summary of Suggestions” (p. 187) should be a standard reference for integration in the new post–machine learning social science era we have just entered, because it calls on social scientists to (1) integrate explanatory and predictive modeling with explicit goals of testing generalizability and developing new methods, (2) clearly label contributions by model type and granularity, and (3) standardize open science practices across the social and computer sciences. Underlying the many benefits of these goals is the possibility of improving social science through better theory production. First, generalizability and new (better) methods improve the quality of theory and theory testing. Second, clearly delineating a model and its level of granularity in a way that is interpretable by another social scientist is an exercise in reflective logic; spending more time logically reflecting on a model gives scholars an opportunity to better develop their theory. Third, open science practices remove barriers and promote a more robust and reliable social science. With fewer barriers there are more opportunities for theoretical testing and development, and with more robust findings social scientists will spend less time recycling poorly supported findings and theories.
