Xi Chen, NYU

Statistical Estimation and Sequential Analysis for Crowdsourcing
Date
Nov 20, 2015, 12:30 pm to 1:30 pm
Location
213 - Sherrerd Hall
Event Description

Abstract: Crowdsourcing is a popular paradigm for collecting labels effectively at low cost. In this talk, we discuss two important statistical problems in crowdsourcing for categorical labeling tasks: (1) estimating the true labels and the workers’ quality from static noisy labels provided by non-expert crowdsourcing workers; and (2) optimal stopping and worker selection in a sequential labeling process, which can improve labeling accuracy while reducing labeling cost. The MLE-based Dawid-Skene estimator has been widely used for the first problem. However, its performance is hard to justify theoretically because the log-likelihood function is non-convex. We propose a two-stage algorithm: the first stage uses a spectral method to obtain an initial estimate, and the second stage refines that estimate via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. For the second problem, sequential labeling, we propose an adaptive sequential probability ratio test (Ada-SPRT) that obtains the optimal worker selection rule, stopping time, and final decision rule within a single Bayesian decision framework.
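To make the second stage of the estimation problem concrete, here is a minimal sketch of the classical Dawid-Skene EM iteration, assuming the noisy labels arrive as 0-indexed (item, worker, label) triples. It is not the speaker's two-stage estimator: the spectral first stage is replaced here by a simple majority-vote initialization, and the function name `dawid_skene_em`, the `smoothing` pseudo-count, and the data layout are illustrative assumptions. Each worker is modeled by a confusion matrix whose entry (k, l) is the probability of reporting label l when the true label is k.

```python
import numpy as np


def dawid_skene_em(labels, n_items, n_workers, n_classes, n_iter=50, smoothing=0.01):
    """Sketch of Dawid-Skene EM (hypothetical helper, not the talk's code).

    labels: list of (item, worker, label) triples, all 0-indexed.
    Returns (posterior over true labels, per-worker confusion matrices).
    """
    items = np.array([t[0] for t in labels])
    workers = np.array([t[1] for t in labels])
    obs = np.array([t[2] for t in labels])

    # Initialization: smoothed majority vote (a stand-in for the spectral stage).
    post = np.full((n_items, n_classes), smoothing)
    np.add.at(post, (items, obs), 1.0)
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: smoothed class priors.
        prior = (post.sum(axis=0) + smoothing) / (n_items + n_classes * smoothing)
        # M-step: conf[j, k, l] accumulates P(true label of item i is k)
        # over every triple (i, j, l), then each row is normalized.
        conf = np.full((n_workers, n_classes, n_classes), smoothing)
        for k in range(n_classes):
            np.add.at(conf[:, k, :], (workers, obs), post[items, k])
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: log P(true = k) = log prior_k + sum of log conf[j, k, l]
        # over the labels item i received.
        log_post = np.tile(np.log(prior), (n_items, 1))
        np.add.at(log_post, items, np.log(conf[workers, :, obs]))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post, conf


# Usage on toy data: workers 0 and 1 are always right, worker 2 always flips.
true_labels = [0, 1, 0, 1, 0, 1]
toy = []
for i, t in enumerate(true_labels):
    toy += [(i, 0, t), (i, 1, t), (i, 2, 1 - t)]
post, conf = dawid_skene_em(toy, n_items=6, n_workers=3, n_classes=2)
print(post.argmax(axis=1), conf[2].round(2))
```

Because the log-likelihood is non-convex, EM of this kind only converges to a local optimum that depends on the starting point; the abstract's point is that a better-founded (spectral) initialization is what makes the refined estimate provably accurate.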

Bio: Xi Chen is an assistant professor in the Department of Information, Operations, and Management Sciences at the Stern School of Business, New York University. Before that, he was a postdoctoral researcher in the group of Prof. Michael Jordan at UC Berkeley. He obtained his Ph.D. from the Machine Learning Department at Carnegie Mellon University (CMU) and his master's degree in Industrial Administration and Operations Research from the Tepper School of Business at CMU.

He studies machine learning, high-dimensional statistics, and operations research. He develops parametric and non-parametric statistical methods, as well as optimization algorithms, to address challenges in high-dimensional data analysis. He investigates machine learning foundations and sequential analysis for crowdsourcing. He also studies operations research and management problems, such as optimal network design in process flexibility and data-driven revenue management. He has received a Simons-Berkeley Research Fellowship and a Google Faculty Research Award.

Event Category
S. S. Wilks Memorial Seminar in Statistics