SATW: Universal Aspects in the Landscape and Dynamics of Deep Neural Networks

Lay summary

In brief

Deep neural networks (DNNs) are computational models that make it
possible to solve complex problems, such as recognizing images.
They have been enormously successful in recent years thanks to
their remarkable predictive power, which has enabled applications
that were unthinkable only a few years ago.

However, the mechanisms that make them so effective are still not
understood. As a consequence, it is never clear a priori whether
a specific problem can be solved with these models and, even when
it can, it is not clear how the DNN should be built and trained
to obtain optimal results. Today, practitioners rely on empirical
trial-and-error methods, which make the development of DNNs a
true art.

Subject and objective

Our main aim is to understand fundamental aspects of the dynamics
of DNNs by comparing them with models known from the statistical
physics of disordered systems (p-spin models), which show strong
similarities with DNNs.

Systematic comparisons between the training dynamics of DNNs and
the out-of-equilibrium dynamics of p-spin models will make it
possible to understand when and why the two systems behave in a
qualitatively different (or identical) way. Particular emphasis
will be placed on the presence of phase transitions and on the
limits that the amount of noise in the dynamics imposes on
learning.

Socio-scientific context

The study will make it possible to understand similarities
between apparently very different systems, and to determine how
far a universality that unites the basic features of physical
and machine learning models is preserved.

In practice, knowing the operating limits of DNNs will make it
possible to develop new algorithms and methods for training
models in a more systematic and efficient way.

Abstract

Deep neural networks (DNNs) have shown impressive empirical performance, but they remain a black-box function modeling data. This is often a significant barrier for practitioners, whose choices most often rely on trial and error, and it raises many fundamental questions for theorists regarding why and in which circumstances we can expect these models to perform well.

The learning process of a DNN consists of two major ingredients: a loss function that needs to be minimized, and an optimization algorithm used to find an optimum in a landscape constituted by the values that the loss assumes for each configuration of the model parameters. Understanding the loss landscape and how the dynamics unfolds on it is therefore a fundamental matter which would have a significant impact in machine learning.

Here, we build upon some existing connections between machine learning and statistical physics to unravel the interplay between landscape and dynamics in a series of different contexts: (a) off-equilibrium, (b) equilibrium and steady state, (c) criticality, i.e., emergent collective behavior arising from the competition between energy and entropy. From a practitioner's point of view, these three aspects will provide valuable knowledge on (a) the learning process, (b) the preconditioning of models, and (c) hyperparameter bounds for learning. The approach we propose uses methods and ideas from the statistical mechanics of disordered systems, and will provide a new bridge between machine learning and physics.

We will perform an extensive analysis of the roles of dynamics and loss landscape. More concretely, in each of the contexts described above, we will compare the behavior of Langevin versus Stochastic Gradient Descent (SGD) dynamics, and the landscape of DNNs with that of a paradigmatic spin glass model.

We expect the following broad outcomes:
• A systematic comparison between Langevin and SGD dynamics.
• A systematic comparison between DNN models and some related complex systems.
• Understanding how the noise in the dynamics affects the effective landscape that is visited, and how this can give rise to emergent collective behaviors of the parameters.
• Using the current understanding of SGD to avoid Hessian calculations in second-order algorithms for the optimization of complex potential energy functionals.
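
As an illustration of the two kinds of dynamics compared in the abstract, the following minimal sketch runs discretized Langevin dynamics and minibatch SGD on the same loss. It is not the project's method: the linear least-squares loss is only a stand-in for a DNN loss, and the function names (`langevin`, `sgd`, `run_demo`), the step size, the temperature and the batch size are all assumptions chosen for the example. The point it shows is where the noise comes from: Langevin adds isotropic Gaussian noise to the full gradient, while SGD noise arises from subsampling the data.

```python
import numpy as np


def full_loss(w, X, y):
    """Half mean squared error of a linear model (stand-in for a DNN loss)."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)


def grad(w, X, y):
    """Gradient of the half mean squared error over the rows of X."""
    return X.T @ (X @ w - y) / len(y)


def langevin(w0, X, y, lr, temperature, steps, rng):
    """Discretized Langevin dynamics: full gradient plus Gaussian noise."""
    w = w0.copy()
    for _ in range(steps):
        noise = rng.standard_normal(w.shape)
        w = w - lr * grad(w, X, y) + np.sqrt(2.0 * lr * temperature) * noise
    return w


def sgd(w0, X, y, lr, batch_size, steps, rng):
    """Minibatch SGD: the only noise source is the random choice of batch."""
    w = w0.copy()
    n = len(y)
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        w = w - lr * grad(w, X[idx], y[idx])
    return w


def run_demo(seed=0):
    """Return (initial loss, loss after Langevin, loss after SGD)."""
    rng = np.random.default_rng(seed)
    n, d = 200, 10  # hypothetical toy problem size
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.1 * rng.standard_normal(n)
    w0 = np.zeros(d)
    w_l = langevin(w0, X, y, lr=0.05, temperature=1e-3, steps=500, rng=rng)
    w_s = sgd(w0, X, y, lr=0.05, batch_size=20, steps=500, rng=rng)
    return full_loss(w0, X, y), full_loss(w_l, X, y), full_loss(w_s, X, y)


if __name__ == "__main__":
    l0, l_langevin, l_sgd = run_demo()
    print(f"initial {l0:.4f}  Langevin {l_langevin:.4f}  SGD {l_sgd:.4f}")
```

In this toy setting both dynamics lower the loss from its initial value. Varying `temperature` for Langevin, or `lr` and `batch_size` for SGD, changes how strongly the noise perturbs the trajectory, which is the kind of dependence the project sets out to study systematically.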

Last updated: 10.06.2022

SNSF
Project funding (Div. I-III)
Original data source 196902

Information Technology
Mathematics, Natural- and Engineering Sciences; Engineering Sciences

1 People

Prof. Thomas Hofmann
