High-Dimensional Principal Component Analysis with Heterogeneous Missingness

Posted by: Ji Jie
Date posted: 2020-05-14

Speaker: Zhu Ziwei, University of Michigan
Host: Li Xudong, School of Data Science, Fudan University
Time: 9:30-11:30, May 18, 2020

Zoom Meeting ID: 98285590674
Zoom Meeting Code: 509721
Abstract

In this talk, I will focus on the effect of missing data in Principal Component Analysis (PCA). In simple, homogeneous missingness settings with a noise level of constant order, we show that an existing inverse-probability weighted (IPW) estimator of the leading principal components can (nearly) attain the minimax optimal rate of convergence, and discover a new phase transition phenomenon along the way. However, deeper investigation reveals both that, particularly in more realistic settings where the missingness mechanism is heterogeneous, the empirical performance of the IPW estimator can be unsatisfactory, and moreover that, in the noiseless case, it fails to provide exact recovery of the principal components. Our main contribution, then, is to introduce a new method for high-dimensional PCA, called "primePCA", that is designed to cope with situations where observations may be missing in a heterogeneous manner. Our numerical studies on both simulated and real data reveal that "primePCA" exhibits very encouraging performance across a wide range of scenarios.
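For readers unfamiliar with the IPW baseline mentioned in the abstract, the following is a minimal sketch of one common form of it, not the speaker's code and not primePCA. It assumes mean-zero data, entries missing completely at random with a single observation probability, and missing entries filled with zeros. Under those assumptions, the diagonal of the zero-filled second-moment matrix is shrunk by a factor p and the off-diagonal by p squared, so the estimator rescales each part accordingly. The function names, the spiked simulation in `demo`, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def ipw_covariance(Y, mask):
    """Inverse-probability weighted second-moment estimate.

    Y[i][j] is the observed value, or 0.0 where mask[i][j] is False.
    Assumes mean-zero rows and one common observation probability.
    """
    n, d = len(Y), len(Y[0])
    p_hat = sum(sum(row) for row in mask) / (n * d)  # observed fraction
    G = [[sum(Y[i][j] * Y[i][k] for i in range(n)) / n for k in range(d)]
         for j in range(d)]
    # Diagonal entries are seen with probability p, off-diagonal with p^2.
    return [[G[j][k] / (p_hat if j == k else p_hat ** 2) for k in range(d)]
            for j in range(d)]


def leading_eigenvector(S, iters=200):
    """Power iteration; assumes the start vector is not orthogonal to the
    leading eigenvector and that the top eigenvalue dominates in magnitude."""
    d = len(S)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(S[j][k] * v[k] for k in range(d)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def demo(n=4000, p=0.7, seed=0):
    """Simulate a one-spike model with homogeneous missingness and return
    the absolute cosine between the true and estimated leading directions."""
    rng = random.Random(seed)
    u = [0.75, 0.5, 0.25, 0.25, 0.25]  # unit-norm true direction (illustrative)
    Y, mask = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = [2.0 * z * uj + rng.gauss(0.0, 1.0) for uj in u]  # signal + noise
        m = [rng.random() < p for _ in u]                     # True = observed
        Y.append([xj if mj else 0.0 for xj, mj in zip(x, m)])
        mask.append(m)
    v = leading_eigenvector(ipw_covariance(Y, mask))
    return abs(sum(a * b for a, b in zip(u, v)))


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns a value close to 1 when the estimated direction aligns well with the true one. The single pooled `p_hat` is exactly what makes this sketch a homogeneous-missingness estimator; when observation probabilities differ across rows or columns, that one rescaling factor no longer matches the data, which is the setting the abstract identifies as problematic.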

Bio

Ziwei Zhu is currently an Assistant Professor at the Department of Statistics at the University of Michigan, Ann Arbor. His research interests include distributed statistical inference, robust statistics, low-rank matrix estimation and missing data. Prior to UMich, he was a research associate at the Statistical Laboratory at the University of Cambridge, hosted by Professor Richard Samworth. He obtained his Ph.D. degree at the Department of Operations Research and Financial Engineering (ORFE) at Princeton University, advised by Professor Jianqing Fan.