A proliferation of emerging data science applications requires efficient extraction of information from complex data. The number of relevant features, however, often far exceeds the number of available samples, which dramatically complicates statistical inference and decision making. In this talk, we present two vignettes on how to improve sample efficiency in high-dimensional statistical problems.

The first vignette explores inference based on the Lasso estimator in the regime where the sample size is of the same order as the sparsity level and the covariates may be correlated. Classical asymptotic statistics fail here for two fundamental reasons: (1) the regularized risk is non-smooth; and (2) the discrepancy between the estimator and the true parameter vector cannot be neglected. We pin down precisely the distribution of the Lasso, as well as that of its debiased version, under a broad class of correlated Gaussian designs. Our findings suggest that a careful degrees-of-freedom correction is crucial for computing valid confidence intervals in this challenging regime.
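The abstract gives no formulas, so the following is only a rough illustration. One form of degrees-of-freedom-adjusted debiased Lasso studied for Gaussian designs with known covariance Σ replaces the classical 1/n scaling of the debiasing step with 1/(n − ‖θ̂‖₀), where ‖θ̂‖₀ is the number of nonzero Lasso coordinates: θ̂ᵈ = θ̂ + Σ⁻¹Xᵀ(y − Xθ̂)/(n − ‖θ̂‖₀). The minimal sketch below assumes Σ is known, fits the Lasso with a plain coordinate-descent solver, and uses hypothetical function names (`lasso_cd`, `debiased_lasso`); it is not the speaker's code.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S_t(z) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for min_b (1/(2n)) ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.array(y, dtype=float)        # maintained as y - X @ beta
    col_sq = (X ** 2).sum(axis=0) / n       # (1/n) ||X_j||^2 for each column
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            # Correlation of column j with the partial residual that excludes coordinate j.
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def debiased_lasso(X, y, lam, Sigma, dof_adjust=True):
    """Hypothetical helper: debiased Lasso with a known design covariance Sigma.

    With dof_adjust=True the correction is divided by n - ||beta||_0
    (degrees-of-freedom adjustment); otherwise by n (classical scaling).
    """
    n, _ = X.shape
    beta = lasso_cd(X, y, lam)
    df = int(np.count_nonzero(beta))        # support size, used as the degrees of freedom
    denom = n - df if dof_adjust else n
    if denom <= 0:
        raise ValueError("support size must be smaller than n")
    correction = np.linalg.solve(Sigma, X.T @ (y - X @ beta)) / denom
    return beta + correction, beta, df
```

Calling `debiased_lasso` twice on the same data, once with `dof_adjust=False`, shows the two corrections differ only by the factor n/(n − ‖θ̂‖₀), which matters most when the Lasso support is a non-negligible fraction of n.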

In the second vignette, we turn to reinforcement learning with a generative model. Despite a number of prior works tackling this problem, existing results suffer from a sample size barrier, in the sense that their statistical guarantees hold only when the sample size exceeds a certain threshold. We overcome this barrier by certifying the minimax optimality of model-based reinforcement learning (a perturbed model-based planning algorithm). This provides the first algorithm that accommodates the entire range of sample sizes for which finding a meaningful policy is information-theoretically possible.
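The abstract names the algorithm but not its details. As a minimal sketch of the general model-based recipe, assuming a finite discounted MDP with known rewards: query the generative model N times for every state-action pair, form the empirical transition kernel P̂, add a small random perturbation to the rewards, and run value iteration on the resulting empirical MDP. The uniform perturbation on [0, xi], the value-iteration planner, and all function names are illustrative assumptions, not the speaker's implementation.

```python
import numpy as np


def empirical_model(sample_next_state, S, A, N, rng):
    """Call the generative model N times per (s, a); return P_hat with shape (S, A, S)."""
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(N):
                P_hat[s, a, sample_next_state(s, a, rng)] += 1.0
    return P_hat / N


def value_iteration(P, r, gamma, tol=1e-10, max_iter=100_000):
    """Solve the MDP (P, r, gamma) by value iteration; return a greedy policy and V."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        V_new = (r + gamma * P @ V).max(axis=1)   # Bellman optimality update
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = r + gamma * P @ V                          # Q-values, shape (S, A)
    return Q.argmax(axis=1), V


def perturbed_model_based_planning(sample_next_state, r, gamma, N, xi, rng):
    """Hypothetical sketch: plan in the empirical MDP with randomly perturbed rewards."""
    S, A = r.shape
    P_hat = empirical_model(sample_next_state, S, A, N, rng)
    # Small random reward perturbation; uniform on [0, xi] is an illustrative choice.
    r_perturbed = r + rng.uniform(0.0, xi, size=r.shape)
    policy, _ = value_iteration(P_hat, r_perturbed, gamma)
    return policy, P_hat
```

A reward perturbation of this kind is typically motivated as a way to break ties among near-optimal actions in the empirical MDP, so that its optimal policy is unique; xi should be small relative to the rewards so that it barely changes the values.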

Bio: Yuting Wei is currently an assistant professor in the Department of Statistics and Data Science at Carnegie Mellon University. Prior to that, she was a Stein Fellow at Stanford University, and she received her Ph.D. in statistics from the University of California, Berkeley, in 2018, under the supervision of Martin Wainwright and Aditya Guntuboyina. She was the recipient of the 2018 Erich L. Lehmann Citation from the Berkeley statistics department for her Ph.D. dissertation in theoretical statistics. Her research interests include high-dimensional and non-parametric statistics, statistical machine learning, and reinforcement learning.