Community Jameel

Adaptive optimisation methods, which perform local optimisation with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We show that for simple overparameterized problems, adaptive methods often find drastically different solutions than gradient descent (GD) or stochastic gradient descent (SGD). We construct an illustrative binary classification problem where the data is linearly separable, GD and SGD achieve zero test error, and AdaGrad, Adam, and RMSProp attain test errors arbitrarily close to half. We additionally study the empirical generalisation capability of adaptive methods on several state-of-the-art deep learning models. We observe that the solutions found by adaptive methods generalise worse (often significantly worse) than SGD, even when these solutions have better training performance. These results suggest that practitioners should reconsider the use of adaptive methods to train neural networks.

The marginal value of adaptive gradient methods in machine learning

Details

author(s)

publication date

source

related programme

Link to publication

Generative AI in the era of 'alternative facts'

External data and AI are making each other more valuable

Removing biases from molecular representations via information maximisation

Effective human-AI teams via learned natural language rules and onboarding

A deep dive into single-cell RNA sequencing foundation models

Antibiotic identified by AI

LLM-grounded video diffusion models

Successful Development of a Natural Language Processing Algorithm for Pancreatic Neoplasms and Associated Histologic Features

Leveraging artificial intelligence in the fight against infectious diseases

BioAutoMATED: An end-to-end automated machine learning tool for explanation and design of biological sequences

Conformal language modeling

Comparison of mammography AI algorithms with a clinical risk model for 5-year breast cancer risk prediction: An observational study

Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii

Algorithmic pluralism: A structural approach towards equal opportunity

Artificial intelligence and machine learning in lung cancer screening

Wide and deep neural networks achieve consistency for classification

Autocatalytic base editing for RNA-responsive translational control

DiffDock: Diffusion steps, twists and turns for molecular docking

Sybil: A Validated Deep Learning Model to Predict Future Lung Cancer Risk From a Single Low-Dose Chest Computed Tomography

Sequential multi-dimensional self-supervised learning for clinical time series

Queueing theory: Classical and modern methods

Toward robust mammography-based models for breast cancer risk

The age of AI: And our human future

Uniform priors for data-efficient transfer

Machine learning under a modern optimisation lens

The marginal value of adaptive gradient methods in machine learning

Efficient graph-based image segmentation