Details
Determining an explainable model is crucial for exploring mitigation strategies and understanding natural selection. Such model determination depends on three key actions, namely the selection of features, of variable transformations, and of models, in addition to the quality of the data. However, most existing strategies focus on one action while making assumptions about the others. This presentation introduces our proof-of-concept work on a triathlon learning approach for explainable models. The approach can handle many features or variables and incorporate complex transformation needs with some sufficiency guarantee, and it uses an ensemble criterion of prediction accuracy, stability, and conformal inference at the final step. We demonstrate its successful application in understanding the social, physiological, and genetic contributions to the reproductive success of Tibetan women. Additionally, we will illustrate key concepts in dealing with missing data and its relation to selection biases or AI fairness as part of data curation and real data analysis strategies. We will conclude with some challenges to this approach and possible future work. This is joint work with Shenghao Ye, Mary Meyer, and Cynthia Beall.
Bio: Jiayang Sun is a Professor, Bernard Dunn Eminent Scholar, and Chair in the Department of Statistics at George Mason University. She has published in top statistical and computational journals, including AOS, JASA, AOP, Biometrika, Statistica Sinica, Biometrical Journal, Statistics in Medicine, JCGS, and SIAM J Sci. & Stat. Comp, as well as other statistical and scientific journals. Her statistical research has included simultaneous inference or multiple comparisons, biased sampling, measurement errors, mixtures, machine learning, causal inference, crowdsourcing, EHR, computing, high-dimensional and big data, and random fields. Her interdisciplinary work is broad. She is an elected Fellow of the ASA and the IMS and an elected member of the ISI. Her work has been supported by awards from the NSF, NIH, NSA, DOD, DOE, VA, ASA, and INOVA.