The success of machine learning relies heavily on data. Machine learning data is all around us: we generate unprecedented amounts of it on our personal devices, in smart homes and cities, and within organizations such as hospitals and financial institutions. However, this data is often siloed—residing in the phones, sensors, or organizations that generated it. Federated learning aims to enable efficient and trustworthy access to siloed data through decentralized training. In this talk, I discuss our work developing foundational tools for federated learning, including techniques that improve the accuracy and efficiency of learning across data silos; mitigate risk and protect data privacy and ownership; and incorporate social and economic principles that incentivize data sharing and provide trustworthy cooperative learning schemes. I explore applications that have been successfully powered by our work—such as language modeling on mobile phones, smart home anomaly detection, and pandemic forecasting—and conclude with an outlook on the missing pieces needed to enable the next generation of collaborative learning systems.
Virginia Smith is an assistant professor in the Machine Learning Department at Carnegie Mellon University. Her research spans machine learning, optimization, and distributed systems. Her current work addresses challenges related to optimization, privacy, and robustness in distributed settings to enable trustworthy federated learning at scale. Her work has been recognized by an NSF CAREER Award, an MIT TR35 Innovator Award, an Intel Rising Star Award, and faculty awards from Google, Apple, and Meta. Prior to CMU, Virginia was a postdoc at Stanford University and received a Ph.D. in Computer Science from UC Berkeley.