Ziwei Zhu, University of Cambridge

High-Dimensional Principal Component Analysis with Missing Values
May 1, 2019, 4:30 pm - 5:30 pm
101 - Sherrerd Hall
Event Description

Abstract: In this talk, I will focus on the effect of missing data in Principal Component Analysis (PCA). Under a homogeneous missingness mechanism, the leading eigenspaces of a Hadamard-reweighted sample covariance matrix are shown to achieve the minimax optimal rate up to logarithmic factors. A new phase transition phenomenon is identified. When the true leading eigenspaces satisfy the incoherence assumption, we can accommodate much more flexible missingness mechanisms. We derive the statistical rate of the Hadamard-reweighting-based estimator under an arbitrary deterministic observation regime. We then feed this estimator to a new tuning-free iterative algorithm called primePCA to refine its statistical performance. We show that in the noiseless setting, primePCA achieves exact recovery of the true leading eigenspaces with geometric convergence, provided that the initializer is close to the truth. A simulation study shows that primePCA performs similarly to softImpute with oracle tuning under different heterogeneity levels of observation probabilities.
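
For readers unfamiliar with the Hadamard-reweighted estimator mentioned in the abstract, the following is a minimal sketch of one common form of the idea, not the speaker's implementation. It assumes mean-zero data and entries observed independently with a single probability p (the homogeneous case). Under those assumptions, zero-filling the missing entries shrinks the off-diagonal entries of the sample covariance by a factor of p^2 and the diagonal entries by a factor of p, so an entrywise (Hadamard) rescaling undoes the bias. The function name, its arguments, and the plug-in estimate of p are illustrative choices.

```python
import numpy as np

def hadamard_reweighted_eigenspace(Y, mask, K):
    """Estimate the top-K eigenspace from partially observed data.

    Assumptions (illustrative, not from the talk): rows of Y are mean-zero,
    and each entry is observed independently with the same probability p.

    Y    : (n, d) array; values at unobserved positions are ignored
    mask : (n, d) boolean array, True where the entry is observed
    K    : dimension of the eigenspace to estimate
    Returns a (d, K) matrix with orthonormal columns.
    """
    n, d = Y.shape
    p_hat = mask.mean()                   # plug-in estimate of p
    Y0 = np.where(mask, Y, 0.0)           # zero-fill missing entries
    G = Y0.T @ Y0 / n                     # sample covariance of zero-filled data

    # Weight matrix: 1/p^2 off the diagonal, 1/p on the diagonal.
    W = np.full((d, d), 1.0 / p_hat**2)
    np.fill_diagonal(W, 1.0 / p_hat)
    G_rw = W * G                          # Hadamard (entrywise) product

    # eigh returns eigenvalues in ascending order; keep the K largest.
    _, eigvecs = np.linalg.eigh(G_rw)
    return eigvecs[:, ::-1][:, :K]
```

The returned basis is the kind of initializer that an iterative refinement procedure could start from; primePCA itself is not sketched here, since the abstract does not describe its update steps.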

Short Bio: Ziwei Zhu is currently a post-doc researcher at the Statistical Laboratory at the University of Cambridge, hosted by Professor Richard Samworth. He received his Ph.D. in operations research and financial engineering from Princeton University, advised by Professor Jianqing Fan. His research focuses on distributed statistical inference, robust statistics and low-rank matrix estimation.