Contrastive learning has gained significant attention as a method for se...
Large language models like GPT-4 exhibit emergent capabilities across ge...
Chain-of-thought (CoT) is a method that enables language models to handl...
In this paper, we propose MPC (Modular Prompted Chatbot), a new approach...
Recent research has shown that training low-rank neural networks can eff...
Feature normalization transforms such as Batch and Layer-Normalization h...
We present a framework for using transformer networks as universal compu...
In-context learning (ICL) is a type of prompting where a transformer mod...
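A minimal sketch of what such a prompt can look like, assuming the standard few-shot formulation of ICL; the task and examples below are illustrative and not drawn from this abstract:

    # Hypothetical few-shot prompt: the labeled demonstrations define the task
    # (sentiment classification) entirely in-context; no weights are updated.
    prompt = (
        "Review: The food was wonderful. Sentiment: positive\n"
        "Review: Service was painfully slow. Sentiment: negative\n"
        "Review: Great value for the price. Sentiment:"
    )
    # A pretrained language model conditioned on this prompt is expected to
    # complete the final line with "positive".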
Weight decay is one of the most widely used forms of regularization in d...
Fine-tuning pretrained language models (LMs) without making any architec...
Word translation without parallel corpora has become feasible, rivaling ...
It has been widely observed that large neural networks can be pruned to ...
Mixup is a data augmentation method that generates new data points by mi...
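As a rough illustration of that mixing step, here is a minimal sketch of the commonly used formulation; the Beta(alpha, alpha) mixing coefficient and the pairing by batch permutation are assumptions rather than details taken from this abstract:

    import numpy as np

    def mixup_batch(x, y, alpha=0.2, rng=np.random.default_rng(0)):
        """Return convex combinations of randomly paired inputs and (one-hot) labels."""
        lam = rng.beta(alpha, alpha)             # mixing coefficient
        perm = rng.permutation(len(x))           # random partner for each example
        x_mixed = lam * x + (1 - lam) * x[perm]  # mix the inputs
        y_mixed = lam * y + (1 - lam) * y[perm]  # mix the labels identically
        return x_mixed, y_mixed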
A recent work by Ramanujan et al. (2020) provides significant empirical ...
It is well known that modern deep neural networks are powerful enough to...
To mitigate communication overheads in distributed model training, sever...
Rapid growth in data sets and the scale of neural network architectures ...
A recent line of ground-breaking results for permutation-based SGD has c...
Distributed model training suffers from communication bottlenecks due to...
Due to its decentralized nature, Federated Learning (FL) lends itself to...
The strong lottery ticket hypothesis (LTH) postulates that one can appro...
Stochastic gradient descent without replacement sampling is widely used ...
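For concreteness, a minimal sketch of one epoch of without-replacement (random-reshuffling) SGD; grad_fn is a hypothetical per-example gradient callback, not something named in this abstract:

    import numpy as np

    def shuffled_sgd_epoch(w, xs, ys, grad_fn, lr=0.1, rng=np.random.default_rng(0)):
        # Visit every example exactly once, in a fresh random order each epoch.
        for i in rng.permutation(len(xs)):
            w = w - lr * grad_fn(w, xs[i], ys[i])
        return w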
Federated learning allows edge devices to collaboratively learn a shared...
To improve the resilience of distributed training to worst-case, or Byza...
Several recent works have aimed to explain why severely overparameterize...
Adversarial training is a technique for training robust machine learning...
Data augmentation (DA) is commonly used during model training, as it sig...
Machine learning (ML) techniques are enjoying rapidly increasing adoptio...
We present ErasureHead, a new approach for distributed gradient descent ...
State-of-the-art machine learning models frequently misclassify inputs t...
Distributed model training suffers from communication overheads due to f...
Distributed implementations of mini-batch stochastic gradient descent (S...
Gradient descent and its many variants, including mini-batch stochastic ...
Distributed model training is vulnerable to worst-case system failures a...
Distributed algorithms are often beset by the straggler effect, where th...
We establish novel generalization bounds for learning algorithms that co...
We present CYCLADES, a general framework for parallelizing stochastic op...
In Bipartite Correlation Clustering (BCC) we are given a complete bipart...
We consider the following multi-component sparse PCA problem: given a se...
We explain theoretically a curious empirical phenomenon: "Approximating ...