Yann LeCun is the Silver Professor of Computer Science, Neural Science, and Electrical and Computer Engineering at the Courant Institute of Mathematical Sciences, New York University. He has made contributions to machine learning, computer vision, mobile robotics, and computational neuroscience. He is well known for his work on optical character recognition and computer vision using convolutional neural networks. He is also one of the main creators of the DjVu image compression technology (together with Léon Bottou and Patrick Haffner). He co-developed the Lush programming language with Léon Bottou.
Yann LeCun is general chair and organizer of the "Learning Workshop" held every year since 1986 in Snowbird, Utah. He is a member of the Science Advisory Board of the Institute for Pure and Applied Mathematics at UCLA, and a scientific adviser to KXEN Inc. and Vidient Systems. (Source: Wikipedia)
Perceptual tasks such as vision and audition require the construction of good features, or good internal representations of the input. Deep learning designates a set of supervised and unsupervised methods that construct feature hierarchies automatically by training systems composed of multiple stages of trainable modules.
The recent history of OCR, speech recognition, and image analysis indicates that deep learning systems yield higher accuracy than systems that rely on hand-crafted features or "shallow" architectures whenever more training data and more computational resources become available. Deep learning systems, particularly convolutional nets, hold the performance record in a wide variety of benchmarks and competitions, including object recognition in images, semantic image labeling (2D and 3D), acoustic modeling for speech recognition, drug design, handwriting recognition, pedestrian detection, and road sign recognition. The most recent speech recognition and image analysis systems deployed by Google, IBM, Microsoft, Baidu, NEC and others all use deep learning, and many use convolutional nets.
While the practical successes of deep learning are numerous, so are the theoretical questions that surround it. What can circuit complexity theory tell us about deep architectures with their multiple sequential steps of computation, compared to, say, kernel machines with simple kernels that have only two steps? What can learning theory tell us about unsupervised feature learning? What can theory tell us about the properties of deep architectures composed of layers that expand the dimension of their input (e.g. sparse coding), followed by layers that reduce it (e.g. pooling)? What can theory tell us about the properties of the non-convex objective functions that arise in deep learning? Why is it that the best-performing deep learning systems happen to be ridiculously over-parameterized with regularization so aggressive that it borders on genocide?
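To make the "expand, then reduce" structure mentioned above concrete, here is a minimal Python sketch of two stacked stages. It is an illustration under stated assumptions, not a system from the talk: the expansion is a random linear map followed by a ReLU (a simple stand-in, not sparse coding), the reduction is max pooling over non-overlapping groups, and the weights are left untrained. The names `make_stage` and `apply_stage` and all dimensions are hypothetical.

```python
import math
import random


def make_stage(in_dim, expand_dim, rng):
    """Return random (untrained) weights for an in_dim -> expand_dim linear map."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(expand_dim)]


def apply_stage(weights, x, pool_size):
    """Expand x with a linear map plus ReLU, then max-pool non-overlapping groups."""
    # Expansion: one ReLU unit per weight row (assumed stand-in for sparse coding).
    expanded = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    # Reduction: keep the largest activation in each group of pool_size units.
    return [max(expanded[i:i + pool_size]) for i in range(0, len(expanded), pool_size)]


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]

# Stage 1: 8 -> 32 (expand) -> 8 (pool by 4); stage 2: 8 -> 16 (expand) -> 4 (pool by 4).
w1 = make_stage(8, 32, rng)
w2 = make_stage(8, 16, rng)
h1 = apply_stage(w1, x, pool_size=4)
h2 = apply_stage(w2, h1, pool_size=4)

print(len(x), len(h1), len(h2))  # dimensions at each level of the hierarchy
```

In a real system each stage's weights would be trained, with or without supervision; the sketch only shows how the representation's dimension grows and shrinks from stage to stage.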