Abstract
Social science survey data sets are often wider than they are long. Resource limitations demand that surveys ask many questions of the minimum number of respondents needed for statistical analyses. Moreover, social scientists are often interested in hard-to-reach populations, accentuating the need to ask many questions of few respondents. These difficulties characterize the Fragile Families and Child Wellbeing Study (FFCWS), which follows a cohort of nearly 5,000 children born in large U.S. cities between 1998 and 2000, roughly three quarters of them to unmarried parents (Reichman et al. 2001). The study collects a wealth of information about this disadvantaged group, including children’s physical and mental health, cognitive function, schooling, and living and family conditions. Overall, the FFCWS data set contains nearly 13,000 variables.
The breadth of the variables contained in the FFCWS data set presents opportunities for a prediction task such as the Fragile Families Challenge (FFC). The FFC asked participants to use a data set containing variables collected from the child’s birth until year 9, and some training data from year 15, to predict six outcomes in the year 15 data: grade point average (GPA) and grit of the child, material hardship and eviction of the family, layoff of the primary caregiver, and whether the primary caregiver participated in a job skills program. Although there is considerable information on each child, there are few children in the data set. As a result, new problems arise. Specifically, the high ratio of variables to observations increases the possibility of overfitting, that is, of fitting a complex model to statistical noise in a way that yields less useful out-of-sample predictions. In this article, we explore whether human-informed variable selection and parameter tuning can help solve this problem.
Machine-learning (ML) methods have been increasingly applied to data with a high ratio of variables to observations precisely to help with such variable selection (in ML parlance, feature selection). They provide ways to use effectively the vast amounts of information contained in high-dimensional data sets (Donoho 2017). Whereas social scientists usually draw on knowledge about the underlying data-generating process linking variables to outcomes, ML methods are less concerned with theoretical informativeness and favor data-driven predictive performance.
Increasingly, a number of applications in computer science have sought to incorporate human knowledge into ML methods (e.g., Branson et al. 2010). However, applications of these “human-in-the-loop” approaches are rare in the social sciences. In this article, we implement a human-in-the-loop approach to the FFC’s prediction tasks. We surveyed a scholarly community of social scientists as well as an anonymous community of laypeople to elicit their beliefs about which variables in the FFCWS data set would best predict each of the six outcomes. We used the information from these surveys in different ways. First, we subsetted the FFCWS data set preemptively, using either the variables identified by these surveys or a preexisting set of variables identified by the Fragile Families team. Second, we used information on scores assigned to particular variables to assign weights in the ML method. In effect, our ML approach was more likely to use variables with higher scores. We contrasted these human-in-the-loop approaches to a data-driven ML approach making use of the full data set of nearly 13,000 variables.
The article proceeds as follows. First we outline how we elicited scholarly expertise and lay judgments. To use the extensive collection of variables in the FFCWS for our modeling approaches, we needed to address the issue of missing values in the data set. Next we describe how we addressed missingness. Thereafter we describe the models used, present results, and conclude.
Using Expert and Crowd-Sourced Knowledge
There are several ways one might collect knowledge about the predictors of the outcomes in the FFC. One could screen publications or conduct interviews with individuals familiar with the FFCWS. We instead leveraged computational tools to retrieve insights from scholars. First, we used Amazon Mechanical Turk (MTurk) to retrieve the contact information of every author who had published using the FFCWS (786 authors). Then, we administered online surveys to each author to identify relevant predictors of each outcome. Expert surveys have been used for a variety of predictive or forecasting tasks, from projections of fertility, mortality, and immigration (Billari, Graziani, and Melilli 2012; Bijak and Wiśniowski 2010) to measuring the quality of democracy (Pemstein et al. 2015) and to school planning (Raftery et al. 2012). Experienced researchers carry a wealth of knowledge about the relationships between variables and outcomes in these data, not all of which is published. By surveying researchers, we hoped to recover otherwise inaccessible knowledge at relatively low cost and in no more than a few days. We also fielded the same survey to a comparison sample of laypeople that we crowd-sourced using MTurk.
To elicit expert and lay beliefs, we used a wiki survey, a format we chose to maximize accessibility, efficiency, and openness to new knowledge (Salganik and Levy 2015). We asked participants to choose which of two randomly selected predictors was likely to best predict a given outcome. These predictors were initially drawn from a list of 27 suggested by a group of researchers familiar with the FFC, but participants could add candidate predictors to the list (which would then be voted on by subsequent participants). As we explain in the Appendix, these predictors were higher-level concepts rather than specific variables. We used the data from the online surveys to generate an ordered list of candidate predictors, scoring each predictor as the number of times it was voted for divided by the number of times it appeared in a pair. Further details about the surveys are included in Appendix A.
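The scoring rule above can be sketched in a few lines. The predictor names and votes below are hypothetical; in the actual wiki survey, votes came from pairwise comparisons shown to participants.

```python
from collections import defaultdict

def score_predictors(pairwise_votes):
    """Score each candidate predictor as wins / appearances.

    `pairwise_votes` is a list of (winner, loser) tuples, one per
    pairwise comparison a participant answered.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in pairwise_votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {p: wins[p] / appearances[p] for p in appearances}

# Hypothetical votes over three candidate predictors
votes = [
    ("mother's education", "household income"),
    ("mother's education", "child's birth weight"),
    ("household income", "child's birth weight"),
]
scores = score_predictors(votes)
# "mother's education" won both of its appearances -> score 1.0
```

Because each predictor appears in a random subset of pairs, dividing by appearances rather than ranking raw vote counts keeps the scores comparable across predictors that were shown different numbers of times.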
Overall, 104 of 786 sampled experts participated, generating 2,651 votes. Seven hundred laypeople participated in our MTurk surveys, generating 27,221 votes. We used the variables identified through the expert and MTurk surveys in two different ways for our predictions. First, we used them to subset the data. Together, the expert and MTurk surveys yielded 68 higher-level concepts, which we associated with 271 variables from the FFCWS data set. We took these 271 variables as a single, wiki survey–generated subset. Second, we used the rankings generated by the expert and MTurk surveys directly, as information passed to an ML algorithm. In this case, this yielded two approaches rather than one: one using expert scores and one using lay scores. Details are provided in the section “Models.”
Imputation
Because most ML approaches require a numeric and complete data set, processing the FFCWS data to handle missingness was a crucial step in preparing variables for modeling. To appreciate the extent of this problem, note that all observations had some missingness on some variables, which implies that there would have been no observations left with listwise deletion. Data were missing for different reasons, including unwillingness to respond, “don’t know” responses, logical skips, panel attrition, anonymization of sensitive information, and error. Roughly 74 percent of the data were missing in a way that posed problems for prediction (Figure 1). In a complex study such as this, the problems posed by missingness are particularly acute. We thus explored different imputation approaches with trade-offs in terms of efficiency and effectiveness (Appendix B). Because our different imputation strategies make different assumptions, we produced five distinctly imputed data sets on the basis of three unique approaches.
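As a minimal illustration of the simplest strategy we used, mean imputation, the sketch below fills missing entries with column means. The assumption that missing values are coded as negative numbers is illustrative only; the FFCWS files distinguish several kinds of missingness with different codes.

```python
import numpy as np

def mean_impute(X):
    """Replace missing entries (here assumed to be coded as negative
    values) with the mean of the observed values in each column."""
    X = X.astype(float).copy()
    X[X < 0] = np.nan                     # recode missing-value codes as NaN
    col_means = np.nanmean(X, axis=0)     # per-column mean over observed values
    rows, cols = np.where(np.isnan(X))    # locations of missing entries
    X[rows, cols] = col_means[cols]       # fill each with its column mean
    return X

X = np.array([[ 1.0, -9.0],
              [ 3.0,  4.0],
              [-1.0,  6.0]])
X_imp = mean_impute(X)
# column 0 mean over observed {1, 3} is 2; column 1 mean over {4, 6} is 5
```

Mean imputation ignores relationships among variables, which is exactly the information that regression-based and multiple imputation try to exploit; the trade-off is computational cost, as discussed below.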

Figure 1. Missing data.
Models
We modeled the six outcomes with regularized regression. Regularization is an ML technique that can improve prediction on new data by avoiding overfitting on training data (James et al. 2017). Regularized models can be fit with large numbers of variables and relatively few observations. Regularized regression biases or shrinks model coefficients toward zero, relative to their maximum likelihood estimators, by applying a penalty to the likelihood function. Each nonzero coefficient has an associated cost.
Absent other information, this cost is the same for every variable. If outside information warrants, however, the penalty can be relaxed for specific variables. The human knowledge of variable rankings captured through the scores from our survey is precisely this kind of information, and we drew on these scores to relax the penalties for the associated variables to differing degrees. For each scored variable, the global shrinkage parameter λ, which determines the overall degree of regularization, was multiplied by a local, variable-specific penalty factor, with higher-scoring variables receiving smaller factors and thus weaker shrinkage.
We fit linear regressions for the continuous outcomes (GPA, grit, and material hardship) and logistic regressions for the binary outcomes (eviction, layoff, and job training). We used the implementation of regularized regression, with an “elasticnet” penalty, from the glmnet R package (Friedman, Hastie, and Tibshirani 2010). Appendix C describes the statistical and mathematical details of our models.
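A sketch of the score-based penalty relaxation: glmnet accepts variable-specific factors directly through its penalty.factor argument; scikit-learn's Lasso applies a uniform penalty, but for the lasso case the same effect can be obtained by dividing each column by its factor and rescaling the fitted coefficient back. The data, scores, and mapping from scores to factors here are all illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)               # a high-scoring predictor (drives y)
x2 = rng.normal(size=n)               # a low-scoring predictor (pure noise)
y = 2.0 * x1 + 0.01 * rng.normal(size=n)

# Survey-derived penalty factors: higher score -> smaller factor -> weaker
# shrinkage. The specific values are an assumption for illustration.
factors = np.array([1.0, 4.0])
X = np.column_stack([x1, x2])

# Penalizing factors[j] * |b_j| is equivalent to fitting a uniform-penalty
# lasso on X[:, j] / factors[j] and dividing the fitted coefficient by
# factors[j] afterward.
model = Lasso(alpha=0.1).fit(X / factors, y)
coefs = model.coef_ / factors
```

The rescaling identity holds exactly only for the L1 (lasso) part of the penalty; with an elastic net, as in our glmnet models, the L2 term is rescaled as well, so glmnet's built-in penalty factors are the cleaner route.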
Results
In all, we explored 25 different approaches to prediction, distinguished by choices made at the following stages: (1) how we imputed missing observations, (2) whether we subsetted the data set prior to prediction and in what way, and (3) whether we incorporated outside knowledge into our modeling and in what way. As discussed, we considered five types of imputed data sets, three approaches to subsetting (no subsetting, subsetting to the variables identified by our wiki survey, and subsetting to the constructed variables identified by the Fragile Families team), and three approaches to incorporating scores (expert scores, MTurk scores, and no scores). There were thus 45 possible permutations across these methods; of these, we focused on 25. Limitations of time and other resources narrowed the models we could run. For instance, the multiple imputation (MI) method we chose could not be run on the full data set of 13,000 variables using available computational resources.
These 25 approaches can be compared in terms of mean squared error (MSE) (Figure 2). However, because we did not fill the permutation space, it is difficult to rank the performance of choices at any given stage: in an unfilled permutation space, an unrestricted comparison of any set of choices does not hold all other strategies constant, and because the analytic choices we made affect our predictions, such a comparison is invalid. For example, the fact that we used mean imputation with six subsetting and scoring approaches, but MI with only three, skews any comparison of the five imputation choices. Therefore, when considering the best strategy in any given dimension, we restrict ourselves to that part of the permutation space in which we can compare across the relevant choices (Figure 3). We identify the best approach as the choice that minimizes the average or median MSE across all other approaches and outcomes (Figure 4). This illustrates the relative rankings of these approaches, but the differences in performance also vary in magnitude. Therefore, we also report the improvement made by any given approach, which we calculate as the average percentage improvement in MSE relative to the outcome-specific baseline MSE (Figure 5).
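The improvement metric is simple arithmetic: the percentage reduction in MSE relative to an outcome-specific baseline, averaged across outcomes. The numbers below are hypothetical.

```python
def pct_improvement(mse, baseline_mse):
    """Percentage reduction in MSE relative to an outcome-specific baseline."""
    return 100.0 * (baseline_mse - mse) / baseline_mse

# Two hypothetical outcomes: MSE 0.36 against a baseline of 0.40 is a
# 10 percent improvement; 0.57 against 0.60 is a 5 percent improvement.
improvements = [pct_improvement(m, b) for m, b in [(0.36, 0.40), (0.57, 0.60)]]
avg_improvement = sum(improvements) / len(improvements)
```

Normalizing by each outcome's own baseline keeps outcomes on different scales (e.g., GPA versus a binary eviction indicator) comparable before averaging.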

Figure 2. MSEs from approaches relevant to human-in-the-loop rankings.

Figure 3. Permutation space of possible and relevant approaches.

Figure 4. Rankings by lowest average and median MSE.

Figure 5. Average percentage reduction in MSE.
In what follows we consider what our results suggest for four different questions: (1) how to impute, (2) whether to subset, (3) whether to incorporate scores, and finally (4) whether it makes sense to include humans in the loop at all (whether by informed subsetting, or scoring, or both).
Imputation
How should researchers approach issues of missingness? Overall, our results suggest that MI is best. If researchers have the computational power to pursue this approach, they should. Note, though, that by the metric of average MSE, the next best strategy is simple mean imputation and that the dividends to MI are not obviously enormous (Figure 4a). MI results in a 4.94 percent reduction in MSE relative to baseline, on average, whereas mean imputation results in a 4.61 percent reduction (Figure 5a). So, where resource constraints are an issue, mean imputation may be a viable alternative. Also, regression-based imputation methods do not clearly outperform simple mean imputation, which is noteworthy given their additional computational costs.
Subsetting
Does it make sense to preemptively subset the data before modeling? Most social science researchers who use these data no doubt do, because it is impossible for humans to make much sense of thousands of variables. It is thus tempting to do the same in a prediction exercise of this kind. Yet our results suggest that human-informed subsetting does not pay off: whether judged by average or median MSE, approaches that retained the full data set outperformed those that subsetted it first.
Interestingly, the two strategies that involve subsetting are not clearly distinguishable in terms of their predictive performance. By average MSE, it seems preferable to subset to the variables from our wiki survey, but by median MSE, the constructed variables fare better. In one sense, this is as encouraging as it is surprising. The constructed variables represent the considered judgment of people with experience in the field and with the FFCWS, whereas the wiki survey variables were selected in a few days and at low cost by an anonymous community of experts and laypeople. Of course, the wiki survey was fielded within the context of the FFC with the clearly assigned task of identifying predictors for the outcomes, whereas the constructed variables were not generated explicitly for this prediction task. Nevertheless, we find there is not much to distinguish them, and if anything, the wiki survey variables perform better (Figure 5b).
Scoring
Is it useful to incorporate human knowledge into the modeling process, as described earlier? Not really, according to either of the metrics we use to rank approaches. Whether measured by average or median MSE, approaches that ignore scores altogether outperform approaches that use expert or lay scores. For advocates of an approach that marries the powers of machines to human wisdom, this is disheartening. However, there are at least two caveats. First, the differences in performance are very small. On average, as Figure 5c shows, approaches that do not use scores reduce MSE relative to baseline by about 5.39 percent, compared with 5.29 percent and 5.25 percent for experts and MTurk users, respectively. Second, as we argue below, our approach to knowledge incorporation was ad hoc. As long as it is possible to imagine better ways of incorporating human knowledge into the loop, future research should consider them.
Humans in the Loop?
Does all this suggest that there is no role for humans in the loop? Not entirely. By average MSE, the best approach overall is one that does not subset and does not incorporate outside knowledge. Yet, again, the differences between this and the next-best (and, indeed, the third-best) approaches are slight: a reduction of 7.67 percent versus 7.59 percent. Furthermore, if ranked by lowest median MSE, our best-performing approach does enlist humans: one that incorporates expert scores while not subsetting the data set (Figure 4d). The discrepancy between the average and median MSE rankings is explained by the very poor performance of the no-subsetting, no-scores approach in predicting the layoff of a child’s primary caregiver. This may suggest that outside information is useful for some outcomes but not others. One possible interpretation of this result is that strategies using expert scores are more robust to bad performance on a single outcome.
What is clear from our results is that if humans are to enter the loop, it ought not to be by preemptively subsetting the data but rather by incorporating their wisdom into an approach that still leverages ML to extract information from the full data set. Making use of the full data set may not always be possible, as exemplified by the computational constraints we faced in generating a fully imputed data set with MI. However, when it is possible, it can usefully augment prediction. Our approach incorporating scores on the basis of expert surveys fared better as a human-in-the-loop strategy. Although neither our approach to generating scores from the wiki surveys nor our way of incorporating them in the models is dispositive, we believe that such approaches, with further refinement, may hold promise for human-in-the-loop strategies. In short, although there is obviously important information that only machines pick up, strategies that incorporate human knowledge to tune parameters in a model merit further exploration.
Conclusions
In this article, we considered different ways of tackling a difficulty faced by researchers seeking to use survey data sets for prediction, namely, that the large ratio of variables to observations makes informed variable selection difficult. To tackle this problem, we proposed a low-cost way to mine a scholarly community for insights. We considered ways to use this information to subset a data set preemptively or at the modeling stage (or both together).
What did we find? First, our results do not recommend preemptively subsetting the data. This is common practice in social science research, which is understandable, because social scientists are often more concerned with description and explanation than with prediction, and humans cannot make much theoretical sense of thousands of variables. But for prediction purposes, this approach discards useful information: approaches that relied on it fared worse than approaches that did not. Second, we find some evidence that human insight can help when it enters not by discarding variables but as scores that tune the model’s penalties; by median MSE, our best-performing approach used expert scores.
What, then, is the future of humans in the loop? We believe that future research should consider at least two types of improvements to our approach. First, the response rate of our expert survey was low: improving this would make it much easier to compare the dividends of surveying experts rather than laypeople. We expect that experts bring knowledge that laypeople do not, but our results do not clearly demonstrate this. Second, future work should consider alternative ways to incorporate human knowledge into ML models. We did so in an ad hoc way, but better formalization of our intuition and better use of the scores in modeling will surely help in deciding the place of humans in the loop, going forward.
In closing, this project considered whether approaches from the tradition of informative, human-centered modeling can be usefully combined with ML techniques. We found that their combination is not always profitable but also that their judicious combination may yet be useful.
Supplemental Material
Supplemental material (SRD-17-0125) for “Humans in the Loop: Incorporating Expert and Crowd-sourced Knowledge for Predictions Using Survey Data” by Anna Filippova, Connor Gilroy, Ridhi Kashyap, Antje Kirchner, Allison C. Morgan, Kivan Polimis, Adaner Usmani, and Tong Wang, published in Socius.
