Training machine learning models with differential privacy (DP) has rece...
Cascades are a classical strategy to enable inference cost to vary adapt...
The impressive generalization performance of modern neural networks is a...
Knowledge distillation has been widely used to improve the performance o...
Learning to reject (L2R) and out-of-distribution (OOD) detection are two...
Despite the popularity and efficacy of knowledge distillation, there is...
Large neural models (such as Transformers) achieve state-of-the-art perf...
Mixup is a regularization technique that artificially produces new sampl...
Knowledge distillation has proven to be an effective technique in improv...
Long-tail learning is the problem of learning under skewed label distrib...
Scaling neural networks to "large" sizes, with billions of parameters, h...
Many modern machine learning applications come with complex and nuanced...
Knowledge distillation is widely used as a means of improving the perfor...
Negative sampling schemes enable efficient training given a large number...
This work builds a novel point process and tools to use the Hawkes proce...
Distillation is the technique of training a "student" model based on exa...
Most work on multi-document summarization has focused on generic summari...
Knowledge distillation is a technique for improving the performance of a...
Modern retrieval problems are characterised by training sets with potent...
We consider learning a multi-class classification model in the federated...
Label smoothing is commonly used in training deep learning models, where...
Supervised learning requires the specification of a loss function to min...
Hierarchical clustering is a widely used approach for clustering dataset...
Fair machine learning concerns the analysis and design of learning algor...
Ensuring that classifiers are non-discriminatory or fair with respect to...
Playlist recommendation involves producing a set of songs that a user mi...
This paper considers extractive summarisation in a comparative setting: ...
In contrast to the standard classification paradigm where the true (or p...
The last few years have seen extensive empirical study of the robustness...
We propose a one-class neural network (OC-NN) model to detect anomalies ...
Nowozin et al. showed last year how to extend the GAN principle to all f-...
PCA is a classical statistical technique whose simplicity and maturity h...
Bregman divergences play a central role in the design and analysis of a ...
Many classification algorithms produce a classifier that is a weighted a...
In Programming by Example, a system attempts to infer a program from inp...