# Invited Speakers

In this talk I will present one of the most versatile tools for efficient, distributed computing: factor graphs. Factor graphs nicely combine three different steps in solving a real-world task: (1) *(Statistical) Modeling* of the relationships between (unobserved) parameters and (observed) data, (2) *Efficient Inference & Optimization* for computing the parameter marginals by conditioning on the data, and (3) *Distributed Systems and Storage*, by having explicit representations of storage and compute. After a short tutorial introduction to factor graphs and the Sum-Product Algorithm, I will present a number of nontrivial applications, including inference in graphical models, distributed computations such as MapReduce, and large-scale linear algebra (e.g. matrix transposition).
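To make the sum-product idea concrete, here is a minimal sketch of exact marginal computation on a chain-structured factor graph over binary variables, via forward and backward messages. The function name, the toy factor values, and the chain restriction are illustrative assumptions, not material from the talk itself:

```python
def sum_product_chain(unaries, pairwise):
    """Exact marginals on a chain x0 - x1 - ... - x_{n-1}.

    unaries[i][v]      : unary factor value for x_i = v
    pairwise[i][u][v]  : pairwise factor between x_i = u and x_{i+1} = v
    """
    n, k = len(unaries), len(unaries[0])

    # Forward messages: fwd[i][v] sums out everything left of x_i.
    fwd = [list(unaries[0])] + [[0.0] * k for _ in range(n - 1)]
    for i in range(1, n):
        for v in range(k):
            fwd[i][v] = unaries[i][v] * sum(
                fwd[i - 1][u] * pairwise[i - 1][u][v] for u in range(k))

    # Backward messages: bwd[i][u] sums out everything right of x_i.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for u in range(k):
            bwd[i][u] = sum(
                pairwise[i][u][v] * unaries[i + 1][v] * bwd[i + 1][v]
                for v in range(k))

    # Marginal of x_i is proportional to fwd[i] * bwd[i]; normalize.
    marginals = []
    for i in range(n):
        raw = [fwd[i][v] * bwd[i][v] for v in range(k)]
        z = sum(raw)
        marginals.append([r / z for r in raw])
    return marginals

# Toy example: a 3-variable binary chain with made-up factor values.
unaries = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
pairwise = [[[0.8, 0.2], [0.2, 0.8]], [[0.8, 0.2], [0.2, 0.8]]]
marginals = sum_product_chain(unaries, pairwise)
```

On trees (of which chains are the simplest case) this message passing is exact; on graphs with cycles the same updates give the approximate "loopy" belief propagation.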

Ralf is Director of Machine Learning at Amazon in Berlin, Germany. In 2011, he worked at Facebook, leading the Unified Ranking and Allocation team. From 2009 to 2011, he was Director of Microsoft's Future Social Experiences (FUSE) Lab UK, working on the development of computational intelligence technologies for large online data collections. He holds a diploma degree in Computer Science (1997) and a Ph.D. in Statistics (2000), both from TU Berlin. Ralf's research interests include Bayesian inference and decision making, computer games, kernel methods and statistical learning theory. Ralf is one of the inventors of the Drivatars system in the Forza Motorsport series as well as the TrueSkill ranking and matchmaking system in Xbox 360 Live. He also co-invented the adPredictor click-prediction technology launched in 2009 in Bing's online advertising system.

One of the frustrations of machine learning theory is that many of the underlying algorithmic problems are provably intractable (e.g., NP-hard or worse) or presumed to be intractable (e.g., the many open problems in Valiant's model). This talk will suggest that this seeming intractability may arise because many models used in machine learning are more general than they need to be. Careful reformulation, as well as a willingness to consider new models, may allow progress. We will use examples from recent work: nonnegative matrix factorization, learning topic models, ICA with noise, etc.
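As background for the nonnegative matrix factorization (NMF) example: the general problem asks for nonnegative W and H with V ≈ WH, and is NP-hard in general; the standard heuristic is multiplicative updates (Lee & Seung), sketched below in plain Python. This is an illustration of the generic problem, not of the reformulated, provably tractable algorithms discussed in the talk:

```python
import random

def matmul(A, B):
    """Naive matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_err(V, W, H):
    """Frobenius norm of the reconstruction error V - WH."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0]))) ** 0.5

def nmf(V, r, iters=200, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~ W H, all entries >= 0."""
    random.seed(0)  # fixed seed so the sketch is reproducible
    n, m = len(V), len(V[0])
    W = [[random.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[random.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        num = matmul(transpose(W), V)
        den = matmul(matmul(transpose(W), W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps)
              for j in range(m)] for a in range(r)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        num = matmul(V, transpose(H))
        den = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps)
              for a in range(r)] for i in range(n)]
    return W, H
```

These updates monotonically decrease the reconstruction error but come with no global guarantee; the talk's point is that under natural extra structure (e.g. separability), reformulated versions of such problems become provably solvable.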

(Based on joint work with Rong Ge, Ravi Kannan, Ankur Moitra, and Sushant Sachdeva, as well as papers by other authors.)

Sanjeev Arora is best known for his work on probabilistically checkable proofs and, in particular, the PCP theorem. He is currently the Charles C. Fitzmorris Professor of Computer Science at Princeton University, and his research interests include computational complexity theory, uses of randomness in computation, probabilistically checkable proofs, computing approximate solutions to NP-hard problems, geometric embeddings of metric spaces and machine learning.

His Ph.D. thesis on probabilistically checkable proofs received the ACM Doctoral Dissertation Award in 1995. He was awarded the Gödel Prize for his work on the PCP theorem in 2001, and again in 2010 for the discovery (concurrently with Joseph S. B. Mitchell) of a polynomial-time approximation scheme for the Euclidean travelling salesman problem. In 2008 he was inducted as a Fellow of the Association for Computing Machinery. In 2011 he was awarded the ACM Infosys Foundation Award, given to mid-career researchers in Computer Science. Arora was awarded the 2012 Fulkerson Prize for his work on improving the approximation ratio for graph separators and related problems (jointly with Satish Rao and Umesh Vazirani).

He is a coauthor (with Boaz Barak) of the book "Computational Complexity: A Modern Approach" and is a founder, and on the Executive Board, of Princeton's Center for Computational Intractability. *(Source: Wikipedia)*

Perceptual tasks such as vision and audition require the construction of good features, or good internal representations of the input. Deep Learning designates a set of supervised and unsupervised methods to construct feature hierarchies automatically by training systems composed of multiple stages of trainable modules.
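The basic building blocks of such stages can be sketched very simply. Below is a toy, pure-Python version of the two operations at the heart of a convolutional net layer: a 2D convolution (cross-correlation, "valid" mode) that extracts local features, followed by max pooling that reduces spatial resolution. The function names and the tiny example are illustrative only; real systems learn the kernel values by gradient descent:

```python
def conv2d(img, kernel):
    """Valid-mode 2D cross-correlation of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)] for i in range(oh)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    oh, ow = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + di][j * size + dj]
                 for di in range(size) for dj in range(size))
             for j in range(ow)] for i in range(oh)]

# Toy 5x5 "image" and a diagonal 2x2 kernel (made-up values).
img = [[i * 5 + j for j in range(5)] for i in range(5)]
fmap = conv2d(img, [[1, 0], [0, 1]])  # 4x4 feature map
pooled = max_pool(fmap)               # 2x2 after pooling
```

Stacking several such convolution + nonlinearity + pooling stages, each with trainable kernels, is what produces the feature hierarchies the abstract describes.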

The recent history of OCR, speech recognition, and image analysis indicates that deep learning systems yield higher accuracy than systems that rely on hand-crafted features or "shallow" architectures whenever more training data and more computational resources become available. Deep learning systems, particularly convolutional nets, hold the performance record in a wide variety of benchmarks and competitions, including object recognition in images, semantic image labeling (2D and 3D), acoustic modeling for speech recognition, drug design, handwriting recognition, pedestrian detection, road sign recognition, etc. The most recent speech recognition and image analysis systems deployed by Google, IBM, Microsoft, Baidu, NEC and others all use deep learning, and many use convolutional nets.

While the practical successes of deep learning are numerous, so are the theoretical questions that surround it. What can circuit complexity theory tell us about deep architectures with their multiple sequential steps of computation, compared to, say, kernel machines with simple kernels that have only two steps? What can learning theory tell us about unsupervised feature learning? What can theory tell us about the properties of deep architectures composed of layers that expand the dimension of their input (e.g. sparse coding), followed by layers that reduce it (e.g. pooling)? What can theory tell us about the properties of the non-convex objective functions that arise in deep learning? Why is it that the best-performing deep learning systems happen to be ridiculously over-parameterized, with extremely aggressive regularization?

Yann LeCun is the Silver Professor of Computer Science, Neural Science, and Electrical and Computer Engineering at the Courant Institute of Mathematical Sciences, New York University. He has made contributions in machine learning, computer vision, mobile robotics and computational neuroscience. He is well known for his work on optical character recognition and computer vision using convolutional neural networks. He is also one of the main creators of the DjVu image compression technology (together with Léon Bottou and Patrick Haffner). He co-developed the Lush programming language with Léon Bottou.

Yann LeCun is the general chair and organizer of the "Learning Workshop", held every year since 1986 in Snowbird, Utah. He is a member of the Science Advisory Board of the Institute for Pure and Applied Mathematics at UCLA, and a scientific adviser to KXEN Inc. and Vidient Systems. *(Source: Wikipedia)*