Jelena Bradic, University of California, San Diego

Support Recovery Via Randomized Maximum-Contrast Subagging
Dec 10, 2014, 12:30 pm - 1:30 pm
101 - Sherrerd Hall


Event Description

We study subsample aggregating (subagging) for variable selection in sparse, large-scale regression settings, where both the number of parameters and the number of samples can be extremely large. We develop theory that identifies the subsample settings under which traditional subagging fails to recover the sparsity set of interest. For such settings, we introduce a randomized and smoothed alternative that successfully recovers the sparsity set. The proposed method runs many randomized estimators on subsamples of the data, each consisting of only a small portion of the original data, and aggregates the results with a novel multiple voting scheme. Our theoretical results show that, in addition to the computational speedup, statistical optimality is retained: the proposed method achieves minimax rates for approximate recovery over all estimators using the full set of samples. Furthermore, our results allow the number of subsamples to grow with the subsample size. Experiments on simulated data show that our method outperforms traditional subagging whenever the subsample size is of smaller order than the original sample size.
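For readers unfamiliar with the general idea, the following is a minimal sketch of generic subagging for support recovery, not the speaker's method. It assumes a weighted lasso fitted by coordinate descent as the base estimator, random penalty weights as the source of randomization, and a plain selection-frequency vote in place of the multiple voting scheme described in the abstract. All function names, parameters, and default values are hypothetical and chosen only for illustration.

```python
import numpy as np


def lasso_cd(X, y, lam, weights, n_iter=50):
    """Weighted lasso via cyclic coordinate descent (no intercept).

    Minimizes (1/2n) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j.
            thr = lam * weights[j]
            new = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[j]
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta


def subagged_support(X, y, n_subsamples=20, subsample_size=100,
                     lam=0.3, vote_threshold=0.5, seed=0):
    """Estimate the support by voting over randomized lasso fits on subsamples.

    Assumption: a variable is kept when it is selected in at least
    `vote_threshold` of the subsample fits (a simple frequency vote).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    votes = np.zeros(p)
    for _ in range(n_subsamples):
        # Draw a subsample without replacement.
        idx = rng.choice(n, size=subsample_size, replace=False)
        # Randomize the estimator by perturbing the per-variable penalties.
        weights = rng.uniform(1.0, 2.0, size=p)
        beta = lasso_cd(X[idx], y[idx], lam, weights)
        votes += (beta != 0)
    freq = votes / n_subsamples
    return np.flatnonzero(freq >= vote_threshold), freq


if __name__ == "__main__":
    # Simulated sparse regression: only the first three coefficients are nonzero.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 50))
    beta_true = np.zeros(50)
    beta_true[:3] = 2.0
    y = X @ beta_true + rng.standard_normal(1000)
    support, freq = subagged_support(X, y)
    print("estimated support:", support)
```

Each fit sees only `subsample_size` rows, so the fits are cheap and independent of one another. The vote threshold controls how many subsample fits must agree before a variable is retained.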

Event Category
S. S. Wilks Memorial Seminar in Statistics