Details
We present data-driven algorithms to tune and deploy control policies that require solving convex optimization problems in real time on embedded devices. Such optimization-based policies are widely used to control fast dynamical systems such as robotic platforms and self-driving vehicles.

In the first part of the talk, we introduce a machine learning approach to accelerate the OSQP solver. OSQP is a popular open-source quadratic programming solver based on the alternating direction method of multipliers (ADMM). It is numerically robust and can be made division-free, which makes it well suited to embedded applications. However, as a first-order method, OSQP can converge slowly to high-accuracy solutions when the problem data is badly conditioned. To overcome this limitation, we use reinforcement learning to train a new constraint-wise ADMM step-size update rule, reducing the average runtime by 30% on several optimization benchmarks.

In the second part of the talk, we introduce a learning architecture to tune control policies by varying the parameters of the underlying optimization problem. Our method relies on new techniques to differentiate through convex optimization problems. With this approach, we can automatically design control policies from closed-loop performance specifications using simulation data.
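As a rough illustration of why the ADMM step-size matters, the sketch below sweeps OSQP's scalar step-size rho on the small benchmark QP from the OSQP documentation, with the solver's built-in rho adaptation disabled. The learned constraint-wise update rule described in the talk is not part of stock OSQP; this only shows how sensitive the iteration count is to the step-size that such a rule would tune.

```python
import numpy as np
from scipy import sparse
import osqp

# Small QP from the OSQP documentation:
#   minimize 0.5 x'Px + q'x   subject to  l <= Ax <= u
P = sparse.csc_matrix([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = sparse.csc_matrix([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
l = np.array([1.0, 0.0, 0.0])
u = np.array([1.0, 0.7, 0.7])

# Sweep the ADMM step-size rho with OSQP's built-in adaptation turned off,
# to expose how strongly the iteration count depends on this parameter.
for rho in [0.01, 0.1, 1.0, 10.0]:
    prob = osqp.OSQP()
    prob.setup(P, q, A, l, u, rho=rho, adaptive_rho=False,
               eps_abs=1e-6, eps_rel=1e-6, verbose=False)
    res = prob.solve()
    print(f"rho={rho:<5} status={res.info.status:<8} iters={res.info.iter}")
```

OSQP's own `adaptive_rho` heuristic targets the same sensitivity; the approach in the talk replaces a hand-crafted heuristic with a learned, per-constraint step-size rule.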
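The talk does not name a particular implementation of differentiable convex optimization; one publicly available library that realizes the idea is cvxpylayers, which embeds a CVXPY problem as a differentiable PyTorch layer. The minimal sketch below treats the solution of a small parameterized QP as a control action and backpropagates a loss through the solver to the QP's penalty weights. The problem, the parameter names (`u_ref`, `w_sqrt`), and the loss are hypothetical stand-ins for a closed-loop performance specification.

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

# Toy parameterized QP standing in for a one-step control policy:
# the action u minimizes a tracking cost plus a tunable quadratic penalty.
n = 3
u = cp.Variable(n)
u_ref = cp.Parameter(n)   # reference action (e.g., computed from the state)
w_sqrt = cp.Parameter(n)  # tunable penalty weights: the policy parameters
objective = cp.Minimize(cp.sum_squares(u - u_ref)
                        + cp.sum_squares(cp.multiply(w_sqrt, u)))
problem = cp.Problem(objective, [cp.norm(u, "inf") <= 1.0])
assert problem.is_dpp()  # required for differentiating through the problem

layer = CvxpyLayer(problem, parameters=[u_ref, w_sqrt], variables=[u])

# Differentiate a closed-loop-style loss with respect to the weights.
w = torch.ones(n, requires_grad=True)
u_target = torch.tensor([0.2, -0.4, 0.1])       # hypothetical desired action
u_star, = layer(torch.tensor([1.0, -1.0, 0.5]), w)
loss = torch.sum((u_star - u_target) ** 2)
loss.backward()
print(w.grad)  # gradient of the loss w.r.t. the QP's penalty weights
```

In a full pipeline along the lines sketched in the talk, the loss would be accumulated over simulated closed-loop rollouts and the weights updated with any stochastic gradient method.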