Abstract

Machine learning (ML) methods for causal inference have gained popularity because they can flexibly model both the outcome and the propensity score. In this article, we propose a within-group approach for ML-based causal inference methods that robustly estimates average treatment effects (ATEs) in multilevel studies in the presence of cluster-level unmeasured confounding. We focus on one ML-based causal inference method: targeted maximum likelihood estimation (TMLE) with an ensemble learner called SuperLearner. In our simulation studies, we observe that training TMLE within groups of similar clusters helps remove bias from cluster-level unmeasured confounders. In addition, using within-group propensity scores estimated from fixed effects logistic regression increases the robustness of the proposed within-group TMLE method. Even if the propensity scores are partially misspecified, the within-group TMLE still produces robust ATE estimates because of its double robustness and flexible modeling, unlike parametric inverse propensity weighting methods. We demonstrate the proposed methods by evaluating the effect of taking an eighth-grade algebra course on math achievement in the Early Childhood Longitudinal Study, and we conduct sensitivity analyses with respect to the number of groups and individual-level unmeasured confounding.
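The intuition behind the within-group idea can be illustrated with a small simulation. The sketch below is not the article's within-group TMLE or SuperLearner procedure; it is a simplified linear analogue in which each cluster is its own group and a fixed effects (cluster-demeaned) estimator stands in for the ML-based estimator. The data-generating process, the function names (`simulate`, `naive_estimate`, `within_cluster_estimate`), and all parameter values are assumptions made for illustration only.

```python
import math
import random
from statistics import mean


def simulate(n_clusters=200, n_per=30, tau=2.0, seed=0):
    """Simulate clustered data with an unmeasured cluster-level confounder.

    Hypothetical data-generating process (an assumption, not from the article):
    u_j ~ N(0, 1) raises both the chance of treatment and the outcome.
    Returns a list of (cluster_id, treatment, outcome) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_clusters):
        u = rng.gauss(0.0, 1.0)                   # unmeasured cluster-level confounder
        p = 1.0 / (1.0 + math.exp(-1.5 * u))      # treatment probability depends on u
        for _ in range(n_per):
            a = 1 if rng.random() < p else 0
            y = tau * a + 3.0 * u + rng.gauss(0.0, 1.0)
            rows.append((j, a, y))
    return rows


def naive_estimate(rows):
    """Pooled difference in means; ignores clustering, so u confounds it."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return mean(treated) - mean(control)


def within_cluster_estimate(rows):
    """Fixed effects estimate: regress cluster-demeaned y on cluster-demeaned a.

    Demeaning removes anything constant within a cluster, including u.
    """
    by_cluster = {}
    for j, a, y in rows:
        by_cluster.setdefault(j, []).append((a, y))
    num = 0.0
    den = 0.0
    for obs in by_cluster.values():
        a_bar = mean([a for a, _ in obs])
        y_bar = mean([y for _, y in obs])
        for a, y in obs:
            num += (a - a_bar) * (y - y_bar)
            den += (a - a_bar) ** 2
    return num / den


if __name__ == "__main__":
    data = simulate()
    print("naive estimate:         ", round(naive_estimate(data), 3))
    print("within-cluster estimate:", round(within_cluster_estimate(data), 3))
```

In this toy setting the pooled comparison absorbs the effect of the unmeasured cluster-level variable, while the within-cluster comparison stays close to the simulated effect `tau`. The article's approach differs in that it groups similar clusters and fits flexible TMLE models within those groups, which this sketch does not attempt.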

Keywords

causal inference; machine learning methods; unmeasured variables; omitted variable bias; cluster-level unmeasured confounders; fixed effects models; targeted maximum likelihood estimation

A Within-Group Approach to Ensemble Machine Learning Methods for Causal Inference in Multilevel Studies