Stein's formula states that a random variable of the form z′f(z) − div f(z) is mean-zero for all functions f with integrable gradient. Here, div f is the divergence of the function f and z is a standard normal vector. We develop Second-Order Stein formulae for statistical inference with high-dimensional data. In its simplest form, the Second-Order Stein formula characterizes the variance of z′f(z) − div f(z). It also implies bounds on the variance of a function f(z) of a standard normal vector; these bounds are of a different nature than the classical Poincaré or log-Sobolev inequalities.
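The first-order identity E[z′f(z) − div f(z)] = 0 can be checked numerically. Below is a minimal Monte Carlo sketch (not from the talk) using the illustrative choice f(z) = tanh(z) applied componentwise, for which div f(z) = Σᵢ (1 − tanh(zᵢ)²); the sample sizes and seed are arbitrary.

```python
import numpy as np

# Monte Carlo check of Stein's identity E[z'f(z) - div f(z)] = 0,
# with the illustrative choice f(z) = tanh(z) componentwise,
# whose divergence is sum_i (1 - tanh(z_i)^2).
rng = np.random.default_rng(0)
n, reps = 5, 200_000
z = rng.standard_normal((reps, n))          # reps i.i.d. standard normal vectors in R^n
f = np.tanh(z)                              # f(z), componentwise
stat = (z * f).sum(axis=1) - (1 - f**2).sum(axis=1)  # z'f(z) - div f(z)
print(stat.mean())                          # close to 0 by Stein's formula
```

The Second-Order Stein formula of the abstract then describes the variance of the statistic `stat`, not just its mean.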

The motivation that led to the above probabilistic results was the study of degrees-of-freedom adjustments in high-dimensional inference problems. A well-understood degrees-of-freedom adjustment appears in Stein's Unbiased Risk Estimate (SURE), which constructs an unbiased estimate of the mean squared risk of almost any estimator μ̂; here the divergence of μ̂ plays the role of the degrees-of-freedom of the estimator. A first application of the Second-Order Stein formula is an unbiased risk estimate of the risk of SURE itself (SURE for SURE): a simple unbiased estimate provides information about the squared distance between SURE and the squared estimation error of μ̂.
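The unbiasedness of SURE can be illustrated with soft-thresholding, for which the divergence has a closed form. The sketch below (assumed example, not from the talk) takes y = μ + ε with ε ~ N(0, σ²I) and μ̂(y)ᵢ = sign(yᵢ)·max(|yᵢ| − λ, 0), whose divergence is the number of coordinates with |yᵢ| > λ; SURE = −nσ² + ‖y − μ̂‖² + 2σ²·div μ̂ then matches the true risk ‖μ̂ − μ‖² on average.

```python
import numpy as np

# SURE for soft-thresholding: y = mu + eps, eps ~ N(0, sigma^2 I).
# div mu_hat = #{i : |y_i| > lam}, so SURE has a closed form.
rng = np.random.default_rng(1)
n, reps, sigma, lam = 50, 20_000, 1.0, 1.0
mu = np.concatenate([np.full(10, 3.0), np.zeros(n - 10)])  # a sparse mean (assumed)
y = mu + sigma * rng.standard_normal((reps, n))
mu_hat = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)     # soft-thresholding
div = (np.abs(y) > lam).sum(axis=1)                        # degrees-of-freedom term
sure = -n * sigma**2 + ((y - mu_hat) ** 2).sum(axis=1) + 2 * sigma**2 * div
risk = ((mu_hat - mu) ** 2).sum(axis=1)                    # true squared error
print(sure.mean(), risk.mean())  # the two averages agree up to Monte Carlo error
```

The "SURE for SURE" result of the abstract goes one level further: it gives an unbiased estimate of the squared distance between `sure` and `risk` for a single realization of y.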

A novel analysis reveals that degrees-of-freedom adjustments play a major role in de-biasing methodologies used to construct confidence intervals in high dimensions. We will see that in sparse linear regression with the Lasso and Gaussian designs, existing de-biasing schemes need to be modified with an adjustment that accounts for the degrees-of-freedom of the Lasso. This degrees-of-freedom adjustment is necessary for statistical efficiency in the regime s ≫ n^{2/3}.

Joint work with Cun-Hui Zhang (Rutgers).