Guarding against false positive selections is important in many applications. We discuss methods based on subsampling and sample splitting for controlling the expected number of false positives and for assigning p-values. These methods are generic and especially useful in high-dimensional settings. We review encouraging results for regression, and we discuss new adaptations and remaining challenges for selecting, from observational data, the variables that have a causal or interventional effect on a response of interest.
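The subsampling idea can be illustrated with a minimal sketch in the spirit of stability selection: fit a sparse estimator (here the Lasso) on many random subsamples of half the data, record how often each variable is selected, and keep only variables whose selection frequency exceeds a threshold. The simulated data, the regularization strength `alpha = 0.3`, the number of subsamples `B = 100`, and the frequency threshold `0.6` are illustrative assumptions, not the exact procedure or tuning of the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Simulated data: n = 100 samples, p = 200 variables, first 5 truly active.
n, p = 100, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.standard_normal(n)

B = 100        # number of random subsamples (assumed)
alpha = 0.3    # Lasso regularization strength (assumed, not tuned)
sel_freq = np.zeros(p)

for _ in range(B):
    # Draw a subsample of size n/2 without replacement.
    idx = rng.choice(n, size=n // 2, replace=False)
    fit = Lasso(alpha=alpha, max_iter=5000).fit(X[idx], y[idx])
    sel_freq += (fit.coef_ != 0)

sel_freq /= B
# Keep variables selected in a large fraction of the subsamples.
stable = np.where(sel_freq >= 0.6)[0]
print("stably selected variables:", stable)
```

Because a noise variable must be picked up in most of the B subsamples to cross the threshold, spurious selections are heavily penalized; the stability-selection theory makes this precise by bounding the expected number of false positives in terms of the threshold.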