Yingying Fan, University of Southern California

Asymptotic Properties of High-Dimensional Random Forests
Date
Apr 19, 2022, 4:30 pm - 5:30 pm
Location
101 - Sherrerd Hall
Event Description

As a flexible nonparametric learning tool, the random forests algorithm has been widely applied to various real applications with appealing empirical performance, even in the presence of a high-dimensional feature space. Unveiling the underlying mechanisms has led to some important recent theoretical results on the consistency of the random forests algorithm and its variants. However, to our knowledge, all existing works concerning random forests consistency in the high-dimensional setting were established for various modified random forests models where the splitting rules are independent of the response. In light of this, in this paper we derive the consistency rates for the random forests algorithm associated with the sample CART splitting criterion, which is the one used in the original version of the algorithm (Breiman, 2001), in a general high-dimensional nonparametric regression setting through a bias-variance decomposition analysis. Our new theoretical results show that random forests can indeed adapt to high dimensionality and allow for discontinuous regression functions. Our bias analysis characterizes explicitly how the random forests bias depends on the sample size, tree height, and column subsampling parameter. Some limitations of our current results are also discussed.


Short bio: Yingying Fan is the Centennial Chair in Business Administration and a Professor in the Data Sciences and Operations Department of the Marshall School of Business at the University of Southern California. She received her Ph.D. in Operations Research and Financial Engineering from Princeton University in 2007. She was a Lecturer in the Department of Statistics at Harvard University from 2007 to 2008 and Dean's Associate Professor in Business Administration at USC from 2018 to 2021. Her research interests include statistics, data science, machine learning, economics, big data, and business applications. Her latest work has focused on statistical inference for networks and on AI models empowered by some of the most recent developments in random matrix theory and statistical learning theory.