Decision trees and their ensembles are endowed with a rich set of diagnostic tools for ranking and screening variables in a predictive model. Despite the widespread use of tree-based variable importance measures, pinning down their theoretical properties has proven challenging, and they consequently remain largely unexplored. To address this gap between theory and practice, we derive finite-sample performance guarantees for variable selection in nonparametric models using a single-level CART decision tree (a decision stump). Surprisingly, we find that even though decision stumps are highly inaccurate for estimation, they can still be used to perform consistent model selection.
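To make the idea concrete, here is a minimal illustrative sketch (not the paper's method or proofs) of variable screening with a decision stump: for each feature, we compute the largest reduction in squared error achievable by a single CART split, then select the feature with the best score. The toy data-generating model, with one relevant feature out of five, is an assumption chosen for illustration.

```python
import numpy as np

def stump_scores(X, y):
    """For each feature, the largest reduction in sum of squared errors
    achievable by one split -- the quantity a CART decision stump optimizes."""
    n, p = X.shape
    base = np.sum((y - y.mean()) ** 2)
    scores = np.empty(p)
    for j in range(p):
        ys = y[np.argsort(X[:, j])]          # responses ordered by feature j
        csum = np.cumsum(ys)                 # prefix sums give every split
        csq = np.cumsum(ys ** 2)             # point's SSE in O(n) total
        k = np.arange(1, n)                  # left-child sizes
        left_sse = csq[:-1] - csum[:-1] ** 2 / k
        right_sse = (csq[-1] - csq[:-1]) - (csum[-1] - csum[:-1]) ** 2 / (n - k)
        scores[j] = base - np.min(left_sse + right_sse)
    return scores

# Toy nonparametric model: only feature 2 is relevant (an assumption for the demo).
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = np.sin(X[:, 2]) + 0.1 * rng.normal(size=n)

scores = stump_scores(X, y)
print(int(np.argmax(scores)))  # the stump screens out the four irrelevant features
```

Even though a single split is a poor estimator of sin, the split-gain ranking still recovers the relevant variable, which is the phenomenon the abstract highlights.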
Bio: Jason Klusowski received his Ph.D. in Statistics and Data Science from Yale University. He was previously an Assistant Professor of Statistics at Rutgers University, New Brunswick, for two years. Jason is broadly interested in statistical learning theory and methodology, information theory, and probability.