Details
Large language models (LLMs) such as GPT-4 sometimes appear to be creative, solving novel tasks with only a few demonstrations in the prompt. These tasks require the pre-trained models to generalize to distributions that differ from the training distribution, a capability known as out-of-distribution (OOD) generalization. For example, in symbolized language reasoning, names/labels are replaced by arbitrary symbols, yet the model can infer the names/labels without any finetuning.
In this talk, I will offer some new angles for understanding the emergent phenomena in LLMs, which I hope will provide empirical foundations for a statistical theory of LLMs. Focusing on induction heads, a pervasive type of component within LLMs, I will show that learning the right compositional structure is key to OOD generalization, and that this learning process exhibits sharp transitions in the training dynamics. Further, I propose the "common bridge representation hypothesis" as a compositional mechanism in Transformers, in which a latent subspace of the embedding space acts as a bridge to align multiple attention heads across early and later layers.
Short Bio: Yiqiao Zhong is currently an assistant professor in the Department of Statistics at the University of Wisconsin–Madison. Prior to joining UW–Madison, Yiqiao was a postdoc at Stanford University, advised by Prof. Andrea Montanari and Prof. David Donoho. His research interests include the analysis of large language models, deep learning theory, and high-dimensional statistics. He obtained his Ph.D. in 2019 from Princeton University, where he was advised by Prof. Jianqing Fan.