Cong Ma *20, University of Chicago

Batched Nonparametric Contextual Bandits
Apr 8, 2024, 12:25 pm1:25 pm


Event Description

In this talk, we consider nonparametric contextual bandits with batch constraints, where the expected reward for each action is modeled as a smooth function of covariates, and the policy updates are made at the end of each batch of observations. This setting finds numerous applications in clinical trials, online recommender systems, and crowdsourcing.

We establish a minimax regret lower bound for this setting and propose Batched Successive Elimination with Dynamic Binning (BaSEDB) that achieves optimal regret (up to logarithmic factors). In essence, BaSEDB dynamically splits the covariate space into smaller bins, carefully aligning their widths with the batch size. We also show the suboptimality of static binning under batch constraints, highlighting the necessity of dynamic binning. Additionally, our results suggest that a nearly constant number of policy updates can attain optimal regret in the fully online setting.

Event Category
S. S. Wilks Memorial Seminar in Statistics