  • Munther Dahleh, MIT
    Title: A Marketplace for Data: An Algorithmic Solution

    Machine Learning and Data Science (ML) is starting to take the place in industry that "Information Technology" held in the late 1990s: businesses of all sizes and in all sectors are recognizing how necessary it has become to develop predictive capabilities for the continued profitability of their core competencies. To be effective, ML algorithms rely on high-quality training data: not just any data, but data that is specific to the business problem to which ML is applied. Obtaining relevant training data can be very difficult for firms to do themselves, especially those early in their path towards incorporating ML into their operations. This problem is exacerbated as businesses increasingly need to solve these prediction problems in real time (e.g. a ride-share company setting prices, retailers/restaurants sending targeted coupons to clear inventory), which means that data gets “stale” quickly. Therefore, real-time market structures for buying and selling ML training data are imperative. Further, it is insufficient to view ML performance metrics (e.g. RMSE) in isolation from real-world applications; for example, a 10% increase in prediction accuracy means very different things for a hedge fund maximizing profits vs. a retailer decreasing inventory costs vs. a hospital trying to save lives. Hence the value of a dataset will necessarily have to account for more than simply the prediction accuracy it provides. Domain knowledge will be just as essential, if not more so, if we aim to view data as an asset and create a rigorous method to define its value.

    In this work, we aim to create a data marketplace: a robust matching mechanism to efficiently buy and sell data while optimizing social welfare and maximizing revenue. While the monetization of data and pre-trained models is an essential focus for many industries and vendors today, there does not exist a market mechanism that can price data and match buyers to vendors while still addressing the (computational and other) complexity associated with creating a market platform. The challenge in creating such a marketplace stems from the very nature of data as an asset: (i) it can be replicated at zero marginal cost; (ii) its value to a firm is inherently combinatorial (i.e. the value of a particular dataset depends on what other, potentially correlated, datasets are available); (iii) its value to a firm depends on which other firms get access to the same data; (iv) prediction tasks and the value of an increase in prediction accuracy vary widely between firms, so it is not obvious how to set prices for a collection of datasets with correlated signals; (v) finally, the authenticity and truthfulness of data are difficult to verify a priori without first applying it to a prediction task. Our proposed marketplace will take a holistic view of this problem and provide an algorithmic solution combining concepts from statistical machine learning, economics of data with respect to various application domains, algorithmic market design, and mathematical optimization under uncertainty. We will discuss some examples motivating this work.

    This is joint work with Anish Agarwal, Tuhin Sarkar, and Devavrat Shah.

    Bio: Munther A. Dahleh received his Ph.D. degree in Electrical and Computer Engineering from Rice University, Houston, TX, in 1987. Since then, he has been with the Department of Electrical Engineering and Computer Science (EECS), MIT, Cambridge, MA, where he is now the William A. Coolidge Professor of EECS. He is also a faculty affiliate of the Sloan School of Management. He is the founding director of the newly formed MIT Institute for Data, Systems, and Society (IDSS). Previously, he held the positions of Associate Department Head of EECS, Acting Director of the Engineering Systems Division, and Acting Director of the Laboratory for Information and Decision Systems. He was a visiting Professor at the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA, for the Spring of 1993. He has consulted for various national research laboratories and companies. Dr. Dahleh is interested in networked systems with applications to social and economic networks, financial networks, transportation networks, neural networks, and the power grid. Specifically, he focuses on the development of foundational theory necessary to understand, monitor, and control systemic risk in interconnected systems. His work draws from various fields including game theory, optimal control, distributed optimization, information theory, and distributed learning. His collaborations include faculty from all five schools at MIT. Dr. Dahleh is the co-author (with Ignacio Diaz-Bobillo) of the book Control of Uncertain Systems: A Linear Programming Approach, published by Prentice-Hall, and the co-author (with Nicola Elia) of the book Computational Methods for Controller Design, published by Springer. He is a four-time recipient of the George S. Axelby Outstanding Paper Award for best paper in IEEE Transactions on Automatic Control. He is also the recipient of the Donald P. Eckman Award from the American Automatic Control Council in 1993 for the best control engineer under 35. He is a fellow of IEEE and IFAC. He has given many keynote lectures at major conferences.