Ryan Boustany will defend his thesis on Monday 31 March at 2:30 PM (Auditorium 6, TSE building)
Title: On deep network training: complexity, robustness of nonsmooth backpropagation, and inertial algorithms
Supervisor: Professor Jérôme BOLTE
To attend the defense, please contact the secretariat of the TSE Doctoral School.
The members of the jury are:
- Pierre ABLIN – Apple, formerly CNRS – Examiner
- Samir ADLY – XLIM-DMI, University of Limoges – Referee
- Jérôme BOLTE – University of Toulouse 1 Capitole – Supervisor
- Peter OCHS – Saarland University – Referee
- Edouard PAUWELS – Toulouse School of Economics – Co-supervisor
- Audrey REPETTI – Heriot-Watt University – Referee
Abstract:
Learning with neural networks relies on the combined use of first-order nonconvex optimization techniques, subsampling approximations, and algorithmic differentiation, that is, the automated numerical application of differential calculus. These methods are fundamental to modern computing libraries such as TensorFlow, PyTorch, and JAX. However, these libraries apply algorithmic differentiation well beyond its primary scope of elementary differentiable operations: models routinely involve nondifferentiable activation functions such as ReLU, or generalized derivatives of more complex objects (for example, solutions of inner optimization subproblems). Consequently, understanding the behavior of algorithmic differentiation and its impact on learning has emerged as a key issue in the machine learning community.
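To make the issue concrete, here is a minimal PyTorch sketch (an illustration only, not an excerpt from the thesis): two mathematically identical formulas for ReLU can be assigned different derivative values at the kink x = 0 by algorithmic differentiation; the values indicated in the comments reflect common library conventions and may vary across versions.

import torch

# Illustration: ReLU(x) = max(x, 0) = (x + |x|) / 2, yet autodiff may
# return different values for the "derivative" at the kink x = 0.
x = torch.tensor(0.0, requires_grad=True)
torch.relu(x).backward()
print(x.grad)   # conventionally 0.0: relu's backward masks with x > 0

y = torch.tensor(0.0, requires_grad=True)
((y + y.abs()) / 2).backward()
print(y.grad)   # here 0.5, since the derivative of |x| at 0 is taken as sign(0) = 0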
To address this, a new notion of nonsmooth differentiation, called conservative gradients, has been developed to model nonsmooth algorithmic differentiation in modern learning contexts. This notion also makes it possible to formulate learning guarantees and to analyze the stability of algorithms for deep neural networks as they are implemented in practice.
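As a concrete and standard example, recalled here for illustration rather than quoted from the thesis, the Clarke subdifferential of ReLU is a conservative gradient: it is set-valued at the kink and satisfies a chain rule along absolutely continuous curves,

% standard example, stated as background (assumption, not taken from the thesis)
\[
  D_{\mathrm{ReLU}}(x) =
  \begin{cases}
    \{0\} & \text{if } x < 0,\\
    [0,1] & \text{if } x = 0,\\
    \{1\} & \text{if } x > 0,
  \end{cases}
  \qquad
  \frac{\mathrm{d}}{\mathrm{d}t}\,\mathrm{ReLU}(x(t)) = v\,\dot{x}(t)
  \quad \text{for every } v \in D_{\mathrm{ReLU}}(x(t)), \ \text{for a.e. } t.
\]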
In this context, we propose two extensions of conservative calculus with a wide range of applications in machine learning. The first result provides a simple model for estimating the computational cost of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The second result focuses on the reliability of automatic differentiation for nonsmooth neural networks operating with floating-point numbers. Finally, we design a new optimization algorithm that exploits second-order information while using only noisy first-order nonsmooth nonconvex automatic differentiation. Starting from a dynamical system (an ordinary differential equation), we build INNAprop, derived from a combination of INNA and RMSprop.
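For context, in prior work on INNA the underlying dynamical system is an inertial ODE with Hessian-driven damping of the form below (recalled here as background; the precise system used to derive INNAprop may differ),

\[
  \ddot{\theta}(t) + \alpha\,\dot{\theta}(t)
  + \beta\,\nabla^{2} J(\theta(t))\,\dot{\theta}(t)
  + \nabla J(\theta(t)) = 0,
  \qquad \alpha \ge 0,\ \beta > 0,
\]

where J denotes the training loss. An equivalent first-order reformulation avoids computing the Hessian explicitly, which is how discretizations such as INNA access second-order information from first-order (sub)gradients; this is consistent with the combination of INNA and RMSprop-style adaptive gradient scaling mentioned above.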