Non-Linear Dictionary Approximation, Greedy Algorithms, and Applications to Neural Networks

Given a dictionary D of functions in a Banach space, we study the problem of how efficiently its convex hull can be approximated by n-term linear combinations of elements of D. In addition, we develop techniques for bounding the metric entropy and n-widths of the convex hull of D. These problems are important in analyzing statistical methods, such as L2-boosting and shallow neural networks, which perform regression using non-linear dictionary expansions. Our results generalize existing methods by taking the smoothness of the dictionary D into account, and in particular give sharp estimates for shallow ReLU^k neural networks. Consequences of these results include: the optimal approximation rates which can be attained for shallow ReLU^k networks, that shallow ReLU^k networks dramatically outperform linear methods of approximation, and that shallow ReLU^k networks outperform all continuous methods of approximation on the convex hull of single neuron functions. Finally, we discuss greedy algorithms for constructing non-linear approximations of target functions in the convex hull of D. We give optimal convergence rates for the orthogonal greedy algorithm when the convex hull of D has small metric entropy. A consequence of this is that the orthogonal greedy algorithm can train asymptotically optimal shallow neural networks.

Short Bio: Jonathan Siegel is currently an assistant research professor at Penn State University working with Prof. Jinchao Xu, and will be joining the mathematics department at Texas A&M University in the Fall of 2022. He attended graduate school at UCLA and received a Ph.D. under the guidance of Prof. Russel Caflisch in 2018. In his dissertation, he studied optimization on manifolds and its applications to electronic structure calculations, for which he won the Pacific Journal of Mathematics Dissertation Prize. Since then, he has been a postdoc at Penn State working on optimization theory and the approximation properties of neural networks.