Aditya Krishna Menon

research

∙ 07/19/2023

The importance of feature preprocessing for differentially private linear optimization

Training machine learning models with differential privacy (DP) has rece...

0 Ziteng Sun, et al. ∙

research

∙ 07/06/2023

When Does Confidence-Based Cascade Deferral Suffice?

Cascades are a classical strategy to enable inference cost to vary adapt...

0 Wittawat Jitkrittum, et al. ∙

research

∙ 02/03/2023

ResMem: Learn what you can and memorize the rest

The impressive generalization performance of modern neural networks is a...

2 Zitong Yang, et al. ∙

research

∙ 01/30/2023

On student-teacher deviations in distillation: does it pay to disobey?

Knowledge distillation has been widely-used to improve the performance o...

10 Vaishnavh Nagarajan, et al. ∙

research

∙ 01/29/2023

Learning to reject meets OOD detection: Are all abstentions created equal?

Learning to reject (L2R) and out-of-distribution (OOD) detection are two...

3 Harikrishna Narasimhan, et al. ∙

research

∙ 01/28/2023

Supervision Complexity and its Role in Knowledge Distillation

Despite the popularity and efficacy of knowledge distillation, there is ...

8 Hrayr Harutyunyan, et al. ∙

research

∙ 01/27/2023

EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

Large neural models (such as Transformers) achieve state-of-the-art perf...

12 Seungyeon Kim, et al. ∙

research

∙ 10/28/2022

When does mixup promote local linearity in learned representations?

Mixup is a regularization technique that artificially produces new sampl...

0 Arslan Chaudhry, et al. ∙

research

∙ 06/13/2022

Robust Distillation for Worst-class Performance

Knowledge distillation has proven to be an effective technique in improv...

12 Serena Wang, et al. ∙

research

∙ 04/27/2022

ELM: Embedding and Logit Margins for Long-Tail Learning

Long-tail learning is the problem of learning under skewed label distrib...

9 Wittawat Jitkrittum, et al. ∙

research

∙ 10/19/2021

When in Doubt, Summon the Titans: Efficient Inference with Large Models

Scaling neural networks to "large" sizes, with billions of parameters, h...

5 Ankit Singh Rawat, et al. ∙

research

∙ 07/09/2021

Training Over-parameterized Models with Non-decomposable Objectives

Many modern machine learning applications come with complex and nuanced ...

0 Harikrishna Narasimhan, et al. ∙

research

∙ 06/19/2021

Teacher's pet: understanding and mitigating biases in distillation

Knowledge distillation is widely used as a means of improving the perfor...

5 Michal Lukasik, et al. ∙

research

∙ 05/12/2021

Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

Negative sampling schemes enable efficient training given a large number...

2 Ankit Singh Rawat, et al. ∙

research

∙ 04/16/2021

Interval-censored Hawkes processes

This work builds a novel point process and tools to use the Hawkes proce...

10 Marian-Andrei Rizoiu, et al. ∙

research

∙ 02/13/2021

Distilling Double Descent

Distillation is the technique of training a "student" model based on exa...

0 Andrew Cotter, et al. ∙

research

∙ 10/06/2020

SupMMD: A Sentence Importance Model for Extractive Summarization using Maximum Mean Discrepancy

Most work on multi-document summarization has focused on generic summari...

0 Umanga Bista, et al. ∙

research

∙ 05/21/2020

Why distillation helps: a statistical perspective

Knowledge distillation is a technique for improving the performance of a...

41 Aditya Krishna Menon, et al. ∙

research

∙ 04/23/2020

Doubly-stochastic mining for heterogeneous retrieval

Modern retrieval problems are characterised by training sets with potent...

6 Ankit Singh Rawat, et al. ∙

research

∙ 04/21/2020

Federated Learning with Only Positive Labels

We consider learning a multi-class classification model in the federated...

9 Felix X. Yu, et al. ∙

research

∙ 03/05/2020

Does label smoothing mitigate label noise?

Label smoothing is commonly used in training deep learning models, where...

11 Michal Lukasik, et al. ∙

research

∙ 02/10/2020

Supervised Learning: No Loss No Cry

Supervised learning requires the specification of a loss function to min...

0 Richard Nock, et al. ∙

research

∙ 09/20/2019

Online Hierarchical Clustering Approximations

Hierarchical clustering is a widely used approach for clustering dataset...

10 Aditya Krishna Menon, et al. ∙

research

∙ 01/30/2019

Noise-tolerant fair classification

Fair machine learning concerns the analysis and design of learning algor...

0 Alexandre Louis Lamy, et al. ∙

research

∙ 01/24/2019

Fairness risk measures

Ensuring that classifiers are non-discriminatory or fair with respect to...

0 Robert C. Williamson, et al. ∙

research

∙ 01/18/2019

Cold-start Playlist Recommendation with Multitask Learning

Playlist recommendation involves producing a set of songs that a user mi...

0 Dawei Chen, et al. ∙

research

∙ 12/06/2018

Comparative Document Summarisation via Classification

This paper considers extractive summarisation in a comparative setting: ...

0 Umanga Bista, et al. ∙

research

∙ 10/10/2018

Complementary-Label Learning for Arbitrary Losses and Models

In contrast to the standard classification paradigm where the true (or p...

0 Takashi Ishida, et al. ∙

research

∙ 06/08/2018

Monge beats Bayes: Hardness Results for Adversarial Training

The last few years have seen extensive empirical study of the robustness...

0 Zac Cranko, et al. ∙

research

∙ 02/18/2018

Anomaly Detection using One-Class Neural Networks

We propose a one-class neural network (OC-NN) model to detect anomalies ...

1 Raghavendra Chalapathy, et al. ∙

research

∙ 07/14/2017

f-GANs in an Information Geometric Nutshell

Nowozin et al showed last year how to extend the GAN principle to all f-...

0 Richard Nock, et al. ∙

research

∙ 04/22/2017

Robust, Deep and Inductive Anomaly Detection

PCA is a classical statistical technique whose simplicity and maturity h...

0 Raghavendra Chalapathy, et al. ∙

research

∙ 07/01/2016

A scaled Bregman theorem with applications

Bregman divergences play a central role in the design and analysis of a ...

0 Richard Nock, et al. ∙

research

∙ 06/04/2015

An Average Classification Algorithm

Many classification algorithms produce a classifier that is a weighted a...

0 Brendan van Rooyen, et al. ∙

research

∙ 09/17/2012

Textual Features for Programming by Example

In Programming by Example, a system attempts to infer a program from inp...

0 Aditya Krishna Menon, et al. ∙
