Quynh Nguyen



Provable guarantees for training neural networks

  • On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths
    Quynh Nguyen. ICML 2021

    This article provides a short proof of the global convergence of GD for training deep ReLU networks. For arbitrary labels, a linear, quadratic, or cubic width is shown to suffice, depending on the initialization scheme; a schematic form of the guarantee is sketched after this list.

  • Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology
    Quynh Nguyen and Marco Mondelli. ICML 2020

    Earlier works showed that gradient descent (GD) can find a global solution when all the hidden layers are polynomially wide. However, this condition forces the network to operate in a kernel regime. This article shows that global convergence can be proved for deep pyramidal nets -- a much more empirically relevant architecture in which only the first hidden layer needs to be wide and the remaining layers have constant, non-increasing widths. For such pyramidal networks of constant widths, GD provably moves the feature representations by at least Ω(1), so training goes beyond the NTK/lazy regime, where this change is typically o(1).

  • Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods
    Antoine Gautier, Quynh Nguyen and Matthias Hein. NIPS 2016

    We cast the optimization problem of a polynomial network as a nonlinear eigenvalue problem. We study the uniqueness and global optimality of the solution, and propose a generalized power method to solve it; a generic power-iteration sketch is given after this list.
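
    As a rough illustration of the guarantee in the first entry above (schematic form only; the precise constants, step-size conditions and width requirements are those of the paper, not the ones written here), the result concerns plain gradient descent on the squared loss:

      % Schematic form only; constants, scaling and width requirements are those
      % of the paper, not the ones written here.
      \[
          \theta_{t+1} = \theta_t - \eta\,\nabla L(\theta_t),
          \qquad
          L(\theta) = \tfrac{1}{2}\,\|f(\theta) - y\|_2^2,
      \]
      \[
          L(\theta_t) \;\le\; \bigl(1 - \eta\,\lambda_{\min}\bigr)^{t}\,L(\theta_0),
      \]
      % where \lambda_{\min} is a lower bound on the smallest eigenvalue of the
      % relevant Gram/NTK-type matrix; the linear, quadratic or cubic width
      % requirements control this quantity under the different initializations.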
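
    The generalized power method in the last entry is, at a high level, a fixed-point iteration for a nonlinear eigenvalue problem F(x) = λx. The sketch below is a generic nonlinear power iteration with a placeholder map F and a toy usage; it is not the specific map or normalization derived in the paper.

      # Minimal sketch of a generic nonlinear power iteration for problems of the
      # form F(x) = lambda * x.  The map F and the toy usage are placeholders,
      # NOT the specific fixed-point map constructed in the paper.
      import numpy as np

      def power_method(F, x0, tol=1e-10, max_iter=1000):
          """Iterate x <- F(x) / ||F(x)|| until the iterate stops moving."""
          x = x0 / np.linalg.norm(x0)
          for _ in range(max_iter):
              fx = F(x)
              x_new = fx / np.linalg.norm(fx)
              if np.linalg.norm(x_new - x) < tol:
                  return x_new
              x = x_new
          return x

      # Toy usage: for a linear, entrywise-positive map this is the classical
      # power method and converges to the dominant (Perron) eigenvector.
      A = np.array([[2.0, 1.0], [1.0, 3.0]])
      x_star = power_method(lambda x: A @ x, np.ones(2))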

Loss surface, optimization landscape, sublevel sets

  • A Note on Connectivity of Sublevel Sets in Deep Learning
    Quynh Nguyen. Technical note, 2021

    For shallow networks, it is shown that having N+1 hidden neurons is both necessary and sufficient for the training loss to have connected sublevel sets (the notion is recalled in the sketch after this list). For deeper architectures, this condition is shown to be sufficient; whether it is also necessary for multilayer networks remains an open problem.

  • When Are Solutions Connected in Deep Networks?
    Quynh Nguyen, Pierre Brechet and Marco Mondelli. NeurIPS 2021

    This article gives a condition under which certain points in parameter space can be connected by a continuous path along which the loss has no barriers or jumps. This is reminiscent of results on connected sublevel sets, but weaker in the sense that connectivity is only shown for a subset of solutions; in exchange, the over-parameterization requirement is also weaker. Empirically, the condition is found to capture well the mode connectivity phenomenon observed for SGD solutions in deep learning (a generic barrier check is sketched after this list).

  • On Connected Sublevel Sets in Deep Learning
    Quynh Nguyen. ICML 2019

    This article proves the connectivity of sublevel sets of the loss function of deep pyramidal networks.

  • On the Loss Landscape of a Class of Deep Neural Networks with No Bad Local Valleys
    Quynh Nguyen, Mahesh Chandra Mukkamala and Matthias Hein. ICLR 2019

    Viewing neural networks as directed acyclic graphs, this article shows that the loss function has no spurious valleys as long as there are enough skip-connections from the hidden layers to the output layer. Empirically, adding random skip-connections from lower layers to the output removes not only spurious valleys but also vanishing-gradient issues, which makes the training of very deep networks much more stable and efficient (a schematic of this architecture is sketched after this list).

  • Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions
    Quynh Nguyen, Mahesh Mukkamala and Matthias Hein. ICML 2018

    In order for neural networks to learn disconnected decision regions in the input space, at least one of the hidden layers must have more neurons than the input dimension.

  • Optimization Landscape and Expressivity of Deep CNNs
    Quynh Nguyen and Matthias Hein. ICML 2018

    This article shows that a standard convolutional layer suffices to memorize any N samples as long as the number of parameters exceeds N. It also provides a condition for global optimality of critical points in deep CNNs.

  • The Loss Surface of Deep and Wide Neural Networks
    Quynh Nguyen and Matthias Hein. ICML 2017

    This article studies the global optimality of local minima for deep nonlinear networks. The proof exploits the Implicit Function Theorem to characterize the optimality of local minima in terms of non-degeneracy conditions.
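
    For reference, the sublevel-set notion used in the connectivity results above is the standard one:

      % Sublevel set of the training loss L at level c:
      \[
          \Omega_c \;=\; \{\,\theta \in \mathbb{R}^d : L(\theta) \le c\,\}.
      \]
      % "Connected sublevel sets" means every \Omega_c is a connected set, so any
      % two parameter vectors with loss at most c can be joined by a continuous
      % path along which the loss never exceeds c.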
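
    The mode-connectivity experiments mentioned in the NeurIPS 2021 entry typically amount to measuring the loss barrier along a path between two trained solutions. Below is a minimal generic sketch of such a check; loss_fn, theta_a and theta_b are placeholders, the path is taken linear for simplicity, and none of this is the exact protocol of the paper.

      # Generic loss-barrier check along the linear path between two solutions.
      # In practice theta_a / theta_b would be flattened parameter vectors of two
      # independently trained networks and loss_fn the training loss; here they
      # are placeholders.
      import numpy as np

      def loss_barrier(loss_fn, theta_a, theta_b, num_points=50):
          """Maximum excess loss along the linear interpolation of two solutions."""
          ts = np.linspace(0.0, 1.0, num_points)
          path_losses = [loss_fn((1.0 - t) * theta_a + t * theta_b) for t in ts]
          return max(path_losses) - max(loss_fn(theta_a), loss_fn(theta_b))

      # Toy usage with a convex quadratic loss, which has no barrier (prints 0.0):
      quadratic = lambda theta: float(np.sum(theta ** 2))
      print(loss_barrier(quadratic, np.array([1.0, 0.0]), np.array([0.0, 1.0])))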
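
    The skip-connection construction in the ICLR 2019 entry can be pictured as every hidden layer feeding directly into the output on top of the usual feed-forward path. The sketch below is a schematic forward pass only, with arbitrary placeholder widths and random weights rather than the paper's construction.

      # Schematic forward pass of a deep ReLU net in which every hidden layer has
      # a direct (skip) connection to the scalar output.  Widths and random
      # weights are arbitrary placeholders.
      import numpy as np

      rng = np.random.default_rng(0)
      widths = [10, 32, 32, 32]            # input dimension followed by hidden widths
      W = [rng.standard_normal((m, n)) for n, m in zip(widths[:-1], widths[1:])]
      V = [rng.standard_normal((1, m)) for m in widths[1:]]   # skips into the output

      def forward(x):
          out = np.zeros(1)
          h = x
          for W_l, V_l in zip(W, V):
              h = np.maximum(W_l @ h, 0.0)   # standard feed-forward ReLU layer
              out = out + V_l @ h            # skip connection from this layer to output
          return out

      y = forward(rng.standard_normal(widths[0]))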

Neural tangent kernel

  • Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks
    Quynh Nguyen, Marco Mondelli and Guido Montufar. ICML 2021

    The spectrum of the NTK has found applications in proving memorization capacity, convergence of GD and generalization bounds in certain regimes. This article provides tight lower bounds on the smallest eigenvalue of the NTK matrix for Gaussian weights, both in the limit of infinitely wide networks and for finite-width networks.
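
    For reference, the empirical NTK Gram matrix whose smallest eigenvalue is bounded in this work is, up to the paper's scaling conventions, the standard object

      % Empirical NTK Gram matrix on training points x_1, ..., x_N:
      \[
          K_{ij} \;=\; \bigl\langle \nabla_\theta f(x_i;\theta),\,
                                    \nabla_\theta f(x_j;\theta) \bigr\rangle,
          \qquad i, j = 1, \dots, N,
      \]
      % and the quantity of interest is \lambda_{\min}(K): lower bounds on it feed
      % into memorization, GD-convergence and generalization arguments.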

Initialization of deep networks

  • A Fully Rigorous Proof of the Derivation of Xavier and He's Initialization for Deep ReLU Networks
    Quynh Nguyen. Technical note, 2021

    Xavier's and He's initializations are very popular methods for initializing neural network weights in deep learning. However, the formulas in the original papers were derived under the assumption that the hidden neurons are roughly independent -- a condition known to hold only for infinitely wide networks. This article provides a rigorous derivation for networks of finite, albeit large, width.
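
    For reference, the variance choices in question, as given in the original Glorot-Bengio and He et al. papers, are

      % Per-entry weight variance in layer l, with fan-in n_{l-1} and fan-out n_l:
      \[
          \operatorname{Var}\bigl(W^{(l)}_{ij}\bigr) = \frac{2}{n_{l-1} + n_l}
          \ \ \text{(Xavier)},
          \qquad
          \operatorname{Var}\bigl(W^{(l)}_{ij}\bigr) = \frac{2}{n_{l-1}}
          \ \ \text{(He, for ReLU)}.
      \]
      % The usual derivations treat the hidden units as independent, which holds
      % exactly only in the infinite-width limit; the note above makes the
      % finite-width case rigorous.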

Teaching

Convex Optimization

Advanced Topics in Machine Learning (Seminar)

Talks

Loss surface of deep and wide neural networks
Math Machine Learning seminar at MPI-MIS and UCLA (virtual), 2020

Optimization landscape of deep neural networks
Simons Institute for the Theory of Computing, Berkeley, California, 2019

Optimization landscape of deep CNNs
Microsoft Research Redmond (MSR), 2018