Data scientists often face the challenge of understanding a high-dimensional data set organized as a table. These tables may have columns of different (sometimes non-numeric) types, and often have many missing entries. This talk surveys methods based on low rank models for analyzing these big, messy data sets. We show that low rank models perform well, indeed suspiciously well, across a wide range of data science applications, including social science, medicine, and machine learning. This good performance demands (and this talk provides) a simple mathematical explanation, one that identifies when low rank models perform well and when to look beyond low rank.
Short Bio: Madeleine Udell is Assistant Professor of Operations Research and Information Engineering and Richard and Sybil Smith Sesquicentennial Fellow at Cornell University. She studies optimization and machine learning for large-scale data analysis and control, with applications in marketing, demographic modeling, medical informatics, engineering system design, and automated machine learning. Her awards include an Alfred P. Sloan Research Fellowship (2021), a National Science Foundation CAREER award (2020), an Office of Naval Research (ONR) Young Investigator Award (2020), a Cornell Engineering Research Excellence Award (2020), an INFORMS Optimization Society Best Student Paper Award (as advisor) (2019), and INFORMS Doing Good with Good OR (2018). Her work is supported by grants from the NSF, ONR, DARPA, the Canadian Institutes of Health, and Capital One.