Details
The majority of continuous optimization methods developed in the last two decades, especially for ML training, assume that approximate first-order information is available to the method in some form. The assumptions on the quality and reliability of this information can vary substantially from method to method and from setting to setting. We will briefly overview some of these assumptions and then focus on a class of stochastic methods that utilize (inexact) function value information to adapt the step size parameter and use it to dictate the accuracy required of the stochastic approximations. The requirements on the stochastic approximations are thus also adaptive and, in principle, can be biased and even inconsistent. The step size parameters in these methods can increase and decrease based on the perceived progress, but, unlike in the deterministic case, they are not bounded away from zero. This creates obstacles in the complexity analysis of such methods. We will show how, by viewing such algorithms as stochastic processes with martingale behavior, we can derive bounds on expected complexity that also hold with high probability. We also show that it is possible to derive a high-probability lower bound on the step size parameters for the methods in this general framework. We will discuss various stochastic settings where the framework readily applies, such as expectation minimization, black-box and simulation optimization, expectation minimization with corrupt samples, etc.
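To make the adaptive mechanism concrete, the sketch below shows a minimal stochastic step-search loop of the kind described above: the step size parameter is increased after a perceived success and decreased otherwise, and the accuracy requested of the function-value estimates is tied to the current step size. This is an illustrative sketch, not the exact method analyzed in the talk; the oracle names grad_est and f_est, the constants theta and gamma, and the particular sufficient-decrease test are all assumptions made for the example.

```python
import numpy as np

def stochastic_step_search(grad_est, f_est, x0, alpha0=1.0, theta=0.1,
                           gamma=2.0, alpha_max=10.0, n_iters=100):
    """Illustrative adaptive step-search loop (a sketch, not the authors' exact method).

    grad_est(x): returns a stochastic gradient estimate at x.
    f_est(x, eps): returns a function-value estimate at x, requested to be
        accurate to within eps (the caller controls the estimation effort).
    The step size alpha grows after a perceived success and shrinks otherwise,
    and the accuracy eps demanded of the function-value estimates is tied to
    alpha, so the requirements on the stochastic approximations adapt as well.
    """
    x, alpha = np.asarray(x0, dtype=float), alpha0
    for _ in range(n_iters):
        g = grad_est(x)                        # stochastic gradient estimate
        x_trial = x - alpha * g                # candidate step
        eps = theta * alpha * np.dot(g, g)     # accuracy tied to the predicted decrease
        f0, f1 = f_est(x, eps), f_est(x_trial, eps)
        if f0 - f1 >= theta * alpha * np.dot(g, g):
            # perceived success: accept the step and increase alpha (capped)
            x, alpha = x_trial, min(gamma * alpha, alpha_max)
        else:
            # perceived failure: reject the step and shrink alpha
            alpha = alpha / gamma
    return x

# Toy usage on a noisy quadratic, with both oracles stochastic (hypothetical setup).
rng = np.random.default_rng(0)
grad_est = lambda x: 2 * x + 0.1 * rng.standard_normal(x.shape)
f_est = lambda x, eps: float(x @ x) + eps * rng.uniform(-1, 1)
x_final = stochastic_step_search(grad_est, f_est, x0=np.ones(5))
```

Because the success test is based on noisy function-value estimates, alpha can be driven arbitrarily small along unlucky sample paths; this is precisely the behavior whose expected and high-probability consequences the martingale-based analysis addresses.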