Kaifeng Lyu

07/27/2023

The Marginal Value of Momentum for Small Learning Rate SGD

Momentum is known to accelerate the convergence of gradient descent in s...

Runzhe Wang, et al.

03/02/2023

Why (and When) does Local SGD Generalize Better than SGD?

Local SGD is a communication-efficient variant of SGD for large-scale tr...

Xinran Gu, et al.

01/27/2023

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing

It is believed that Gradient Descent (GD) induces an implicit bias towar...

Jikai Jin, et al.

11/05/2022

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound

Saliency methods compute heat maps that highlight portions of an input t...

Arushi Gupta, et al.

06/14/2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction

Normalization layers (e.g., Batch Normalization, Layer Normalization) we...

Kaifeng Lyu, et al.

05/20/2022

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differen...

Sadhika Malladi, et al.

10/26/2021

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

The generalization mystery of overparametrized deep nets has motivated e...

Kaifeng Lyu, et al.

12/17/2020

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning

Matrix factorization is a simple and natural test-bed to investigate the...

Zhiyuan Li, et al.

10/06/2020

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Recent works (e.g., (Li and Arora, 2020)) suggest that the use of popula...

Zhiyuan Li, et al.

06/13/2019

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Recent works on implicit regularization have shown that gradient descent...

Kaifeng Lyu, et al.

12/10/2018

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization

Batch Normalization (BN) has become a cornerstone of deep learning acros...

Sanjeev Arora, et al.

08/31/2018

Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs

In a directed graph G=(V,E) with a capacity on every edge, a bottleneck ...

Ran Duan, et al.

05/07/2018

Fine-grained Complexity Meets IP = PSPACE

In this paper we study the fine-grained complexity of finding exact and ...

Lijie Chen, et al.
