Ryan Boustany will publicly defend his PhD thesis in mathematics on Monday, March 31, 2025 at 2:30 pm (Auditorium 6, TSE building)
Thesis title: On deep network training: complexity, robustness of nonsmooth backpropagation, and inertial algorithms
Thesis supervisor: Professor Jérôme BOLTE
To attend the defense, please contact the TSE doctoral school
Jury members:
- Pierre ABLIN – Apple, formerly CNRS – Examiner
- Samir ADLY – XLIM-DMI, University of Limoges – Referee
- Jérôme BOLTE – University of Toulouse 1 Capitole – Supervisor
- Peter OCHS – Saarland University – Referee
- Edouard PAUWELS – Toulouse School of Economics – Co-supervisor
- Audrey REPETTI – Heriot-Watt University – Referee
Abstract:
Learning based on neural networks relies on the combined use of first-order nonconvex optimization techniques, subsampling approximations, and algorithmic differentiation, i.e., the automated numerical application of differential calculus. These methods are fundamental to modern computing libraries such as TensorFlow, PyTorch, and JAX. However, these libraries apply algorithmic differentiation well beyond its primary scope of elementary differentiable operations: models often incorporate non-differentiable activation functions such as ReLU, or generalized derivatives of complex objects (e.g., solutions of inner optimization problems). Consequently, understanding the behavior of algorithmic differentiation and its impact on learning has emerged as a key issue in the machine learning community.
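To make the issue concrete, here is a minimal sketch (assuming PyTorch; the exact conventions at the kink vary across libraries) in which three algebraically identical expressions of ReLU yield three different "derivatives" at x = 0:

    import torch

    def f1(x):
        return torch.relu(x)             # the library sets relu'(0) = 0 by convention
    def f2(x):
        return x - torch.relu(-x)        # algebraically equal to relu(x)
    def f3(x):
        return 0.5 * (x + torch.abs(x))  # also equal to relu(x)

    x = torch.tensor(0.0, requires_grad=True)
    for f in (f1, f2, f3):
        x.grad = None                    # reset any accumulated gradient
        f(x).backward()
        print(f.__name__, x.grad.item()) # 0.0, 1.0 and 0.5 with PyTorch's conventions

The output of nonsmooth backpropagation thus depends on how the program is written, not only on the function it computes, which is one reason a dedicated nonsmooth calculus is needed.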
To address this, a new notion of nonsmooth differentiation, called conservative gradients, has been developed to model nonsmooth algorithmic differentiation in modern learning contexts. This notion also makes it possible to formulate learning guarantees and to establish the stability of algorithms for deep neural networks as they are implemented in practice.
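For reference, the definition assumed here is a sketch following the conservative-gradient literature (details and generality may differ in the thesis): a set-valued map D from R^p to subsets of R^p, with closed graph and nonempty, locally bounded values, is a conservative gradient for a locally Lipschitz function f if, for every absolutely continuous curve \gamma : [0,1] \to R^p,

    \frac{d}{dt} f(\gamma(t)) = \langle v, \dot{\gamma}(t) \rangle \quad \text{for all } v \in D(\gamma(t)) \text{ and almost every } t \in [0,1].

Gradients produced by nonsmooth backpropagation can be modeled as selections of such a map, which is what connects the calculus to networks as they are practically implemented.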
In this context, we propose two extensions of the conservative calculus, with a wide range of applications in machine learning. The first result provides a simple model for estimating the computational cost of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The second result addresses the reliability of automatic differentiation for nonsmooth neural networks operating with floating-point numbers. Finally, we focus on building a new optimization algorithm that exploits second-order information while using only noisy, first-order, nonsmooth, nonconvex automatic differentiation. Starting from a dynamical system (an ordinary differential equation), we derive INNAprop, a combination of INNA and RMSprop.
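For orientation, the second-order dynamics with Hessian-driven damping classically associated with INNA (assumed here as the starting ODE; the exact system and its discretization are those developed in the thesis) reads, for damping parameters \alpha, \beta > 0,

    \ddot{\theta}(t) + \alpha\,\dot{\theta}(t) + \beta\,\nabla^2 f(\theta(t))\,\dot{\theta}(t) + \nabla f(\theta(t)) = 0.

A suitable first-order reformulation removes the explicit Hessian, so the system can be discretized using only (noisy, nonsmooth) gradient evaluations; coupling such a discretization with RMSprop-style adaptive step sizes gives the INNA/RMSprop combination behind INNAprop.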