Random forests are ensemble learning methods introduced by Breiman [Mach. Learn. 45 (2001) 5–32] that operate by averaging several decision trees built on a randomly selected subspace of the data set. Despite their widespread use in practice, the respective roles of the different mechanisms at work in Breiman's forests are not yet fully understood, nor is the tuning of the corresponding parameters. In this paper, we study the influence of two parameters, namely the subsampling rate and the tree depth, on the performance of Breiman's forests. More precisely, we prove that quantile forests (a specific type of random forest) based on subsampling and quantile forests whose tree construction is stopped early achieve similar performance, provided their respective parameters (subsampling rate and tree depth) are well chosen. Moreover, experiments show that a proper tuning of these parameters improves on Breiman's original forests in terms of mean squared error in most cases.
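To make the two parameters concrete, the sketch below contrasts a fully grown forest, a depth-limited forest, and a subsampled forest on simulated data. This is a minimal sketch using scikit-learn's RandomForestRegressor, not the quantile forests analyzed in the paper; the data set (make_friedman1) and the values max_depth=6 and max_samples=0.5 are illustrative assumptions, not choices taken from the article.

    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Simulated regression data (illustrative choice, not from the article).
    X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    forests = {
        # Breiman's default: trees grown to maximal depth on bootstrap samples.
        "fully grown": RandomForestRegressor(n_estimators=200, random_state=0),
        # Early-stopped trees: construction terminated at a fixed depth.
        "early-stopped (max_depth=6)": RandomForestRegressor(
            n_estimators=200, max_depth=6, random_state=0),
        # Reduced subsampling rate: each tree sees half of the training set.
        "subsampled (max_samples=0.5)": RandomForestRegressor(
            n_estimators=200, max_samples=0.5, random_state=0),
    }
    for name, rf in forests.items():
        rf.fit(X_train, y_train)
        mse = mean_squared_error(y_test, rf.predict(X_test))
        print(f"{name}: test MSE = {mse:.3f}")

Note that scikit-learn's max_samples draws bootstrap samples with replacement, whereas the paper's analysis covers subsampling without replacement, so this comparison is only a rough analogue of the theoretical setting.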
Keywords: Random forests, randomization, parameter tuning, subsampling, tree depth
@article{PS_2018__22__96_0,
  author    = {Duroux, Roxane and Scornet, Erwan},
  title     = {Impact of subsampling and tree depth on random forests},
  journal   = {ESAIM: Probability and Statistics},
  pages     = {96--128},
  publisher = {EDP-Sciences},
  volume    = {22},
  year      = {2018},
  doi       = {10.1051/ps/2018008},
  mrnumber  = {3891755},
  zbl       = {1409.62072},
  language  = {en},
  url       = {http://archive.numdam.org/articles/10.1051/ps/2018008/}
}
TY  - JOUR
AU  - Duroux, Roxane
AU  - Scornet, Erwan
TI  - Impact of subsampling and tree depth on random forests
JO  - ESAIM: Probability and Statistics
PY  - 2018
SP  - 96
EP  - 128
VL  - 22
PB  - EDP-Sciences
UR  - http://archive.numdam.org/articles/10.1051/ps/2018008/
DO  - 10.1051/ps/2018008
LA  - en
ID  - PS_2018__22__96_0
ER  -
Duroux, Roxane; Scornet, Erwan. Impact of subsampling and tree depth on random forests. ESAIM: Probability and Statistics, Volume 22 (2018), pp. 96–128. doi: 10.1051/ps/2018008. http://archive.numdam.org/articles/10.1051/ps/2018008/
S. Arlot and R. Genuer, Analysis of purely random forests bias. Preprint (2014). | arXiv
G. Biau, Analysis of a random forests model. J. Mach. Learn. Res. 13 (2012) 1063–1095. | MR | Zbl
G. Biau and L. Devroye, Cellular tree classifiers, in Algorithmic Learning Theory. Springer, Cham (2014) 8–17. | MR | Zbl
G. Biau, L. Devroye and G. Lugosi, Consistency of random forests and other averaging classifiers. J. Mach. Learn. Res. 9 (2008) 2015–2033. | MR | Zbl
L. Breiman, Random forests. Mach. Learn. 45 (2001) 5–32. | DOI | Zbl
L. Breiman, J. Friedman, R.A. Olshen and C.J. Stone, Classification and Regression Trees. Chapman & Hall, CRC, Boca Raton (1984). | Zbl
P. Bühlmann, Bagging, boosting and ensemble methods, in Handbook of Computational Statistics. Springer, Berlin, Heidelberg (2012) 985–1022. | DOI | MR
M. Denil, D. Matheson and N. de Freitas, Consistency of online random forests, in Vol. 28 of Proceedings of the 30th International Conference on Machine Learning (ICML'13), Atlanta, GA, USA, June 16–21 (2013) 1256–1264.
M. Denil, D. Matheson and N. de Freitas, Narrowing the gap: random forests in theory and in practice, in International Conference on Machine Learning (ICML) (2014).
L. Devroye, L. Györfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition. Springer, New York (1996). | DOI | MR | Zbl
R. Díaz-Uriarte and S. Alvarez de Andrés, Gene selection and classification of microarray data using random forest. BMC Bioinform. 7 (2006) 1–13. | DOI
M. Fernández-Delgado, E. Cernadas, S. Barro and D. Amorim, Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15 (2014) 3133–3181. | MR | Zbl
R. Genuer, Variance reduction in purely random forests. J. Nonparametric Stat. 24 (2012) 543–562. | DOI | MR | Zbl
R. Genuer, J.-M. Poggi and C. Tuleau-Malot, Variable selection using random forests. Pattern Recognit. Lett. 31 (2010) 2225–2236. | DOI
H. Ishwaran and U.B. Kogalur, Consistency of random survival forests. Stat. Probab. Lett. 80 (2010) 1056–1064. | DOI | MR | Zbl
L. Meier, S. van de Geer and P. Bühlmann, High-dimensional additive modeling. Ann. Stat. 37 (2009) 3779–3821. | DOI | MR | Zbl
L. Mentch and G. Hooker, Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J. Mach. Learn. Res. 17 (2015) 841–881. | MR
Y. Qi, Random forest for bioinformatics, in Ensemble Machine Learning. Springer, Boston, MA (2012) 307–323.
G. Rogez, J. Rihan, S. Ramalingam, C. Orrite and P.H.S. Torr, Randomized trees for human pose detection, in IEEE Conference on Computer Vision and Pattern Recognition (2008) 1–8.
M. Sabzevari, G. Martínez-Muñoz and A. Suárez, Improving the robustness of bagging with reduced sampling size. Université catholique de Louvain (2014).
E. Scornet, On the asymptotics of random forests. J. Multivar. Anal. 146 (2016) 72–83. | DOI | MR | Zbl
E. Scornet, G. Biau and J.-P. Vert, Consistency of random forests. Ann. Stat. 43 (2015) 1716–1741. | DOI | MR | Zbl
C.J. Stone, Optimal rates of convergence for nonparametric estimators. Ann. Stat. 8 (1980) 1348–1360. | DOI | MR | Zbl
C.J. Stone, Optimal global rates of convergence for nonparametric regression. Ann. Stat. 10 (1982) 1040–1053. | DOI | MR | Zbl
M.J. van der Laan, E.C. Polley and A.E. Hubbard, Super learner. Stat. Appl. Genet. Mol. Biol. 6 (2007). | DOI | MR | Zbl
S. Wager, Asymptotic theory for random forests. Preprint (2014). | arXiv
S. Wager and S. Athey, Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. (2018) 1–15. | MR | Zbl
S. Wager and G. Walther, Adaptive concentration of regression trees, with application to random forests. Preprint (2015).
F. Zaman and H. Hirose, Effect of subsampling rate on subbagging and related ensembles of stable classifiers, in International Conference on Pattern Recognition and Machine Intelligence. Springer (2009) 44–49. | DOI
and ,Cité par Sources :