Fine-tuning pretrained self-supervised language models is widely adopted...
We investigate the effects of post-training quantization and quantization...
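As a minimal sketch of what post-training quantization involves, the following maps trained float weights to int8 with a per-tensor scale; the function names and the NumPy setup are illustrative stand-ins, not the paper's implementation.

import numpy as np

def quantize_int8(w):
    """Map float weights to int8 using one symmetric per-tensor scale."""
    scale = np.abs(w).max() / 127.0            # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)   # placeholder weight matrix
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())  # worst-case quantization error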
Meta-embedding (ME) learning is an emerging approach that attempts to le...
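Two common ME baselines, averaging and concatenation, can be sketched as below; the two source matrices are random placeholders standing in for pretrained embeddings such as GloVe and word2vec.

import numpy as np

def average_me(sources):
    """Element-wise mean of equal-dimensional source embeddings."""
    return np.mean(sources, axis=0)

def concat_me(sources):
    """Concatenate source embeddings along the feature axis."""
    return np.concatenate(sources, axis=-1)

glove_like = np.random.randn(1000, 300)   # placeholder source embedding 1
w2v_like = np.random.randn(1000, 300)     # placeholder source embedding 2
print(average_me([glove_like, w2v_like]).shape)  # (1000, 300)
print(concat_me([glove_like, w2v_like]).shape)   # (1000, 600)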
While various avenues of research have been explored for iterative pruni...
Pruning aims to reduce the number of parameters while maintaining perfor...
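A minimal sketch of one magnitude-pruning step, a standard instance of the technique this abstract refers to: zero out the smallest-magnitude fraction of a weight tensor (names and NumPy usage are illustrative).

import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Return w with the `sparsity` fraction of smallest |w| set to zero."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold               # keep only larger weights
    return w * mask

w = np.random.randn(8, 8)
pruned = magnitude_prune(w, sparsity=0.75)
print((pruned == 0).mean())                    # achieved sparsity, ~0.75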
Counterfactual statements describe events that did not or cannot take pl...
Negative sampling is a limiting factor w.r.t. the generalization of metric...
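For context, a minimal sketch of uniform negative sampling as used in metric learning: for each anchor, draw one example with a different class label. The labels here are hypothetical placeholders, and real pipelines use smarter strategies (e.g., hard-negative mining).

import numpy as np

rng = np.random.default_rng(0)

def sample_negatives(labels):
    """For each index i, return an index j with labels[j] != labels[i]."""
    negatives = np.empty(len(labels), dtype=int)
    for i, y in enumerate(labels):
        candidates = np.flatnonzero(labels != y)
        negatives[i] = rng.choice(candidates)
    return negatives

labels = np.array([0, 0, 1, 1, 2, 2])
print(sample_negatives(labels))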
Multi-step ahead prediction in language models is challenging due to the...
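A toy sketch of why this is hard: in multi-step prediction, each output is fed back as input, so errors compound over the horizon. The next_token function below is a hypothetical stand-in for a real language model.

import numpy as np

rng = np.random.default_rng(0)
VOCAB = 10

def next_token(context):
    """Toy stand-in; a real model would condition on `context`."""
    return int(rng.integers(VOCAB))

def predict_ahead(context, steps):
    """Append each prediction so later steps condition on earlier ones."""
    out = list(context)
    for _ in range(steps):
        out.append(next_token(out))
    return out[len(context):]

print(predict_ahead([1, 2, 3], steps=5))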
This paper proposes layer fusion - a model compression technique that di...
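The truncated abstract does not specify the fusion operation, so the following is only a hedged sketch under one simple assumption: adjacent layers with same-shaped weights are fused by element-wise averaging, halving the layer count.

import numpy as np

def fuse_layers(layers):
    """Average adjacent pairs of same-shaped weight matrices (assumed op)."""
    return [(a + b) / 2.0 for a, b in zip(layers[0::2], layers[1::2])]

layers = [np.random.randn(16, 16) for _ in range(4)]
fused = fuse_layers(layers)
print(len(layers), "->", len(fused))  # 4 -> 2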
Overparameterized networks trained to convergence have shown impressive ...
The Conversational Question Answering (CoQA) task involves answering a s...
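A hedged sketch of how conversational QA inputs are often flattened for a model: the passage, the running question/answer history, then the current question. The separator format below is illustrative, not CoQA's official encoding.

def build_input(passage, history, question):
    """Flatten passage + dialogue history + current question into one string."""
    turns = " ".join(f"Q: {q} A: {a}" for q, a in history)
    return f"{passage} {turns} Q: {question} A:"

print(build_input("The cat sat on the mat.",
                  [("Who sat?", "The cat")],
                  "Where did it sit?"))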
Task-specific scores are often used to optimize for and evaluate the performance...
In this paper we propose a novel neural language modelling (NLM) method ...
This paper carries out an empirical analysis of various dropout techniqu...
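As a reference point for the techniques analyzed, a minimal sketch of standard inverted dropout: randomly zero activations at training time and rescale the survivors so the expected activation is unchanged at inference.

import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    """Zero each unit with probability p; identity at inference time."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)       # rescale to preserve expectation

x = np.ones((2, 8))
print(dropout(x, p=0.5))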
The task of multi-step ahead prediction in language models is challengin...
Word embeddings have been shown to benefit from ensembling several word ...
Ensembling word embeddings to improve distributed word representations h...
Capsule Networks have shown encouraging results on de facto benchmark com...
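For readers new to capsules, a minimal sketch of the "squash" non-linearity from Sabour et al. (2017), which rescales a capsule's output vector so its length lies in [0, 1) while preserving its direction.

import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """v = (|s|^2 / (1 + |s|^2)) * s / |s|"""
    norm_sq = np.sum(s * s, axis=axis, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

s = np.random.randn(3, 8)                   # 3 capsules, 8-dim pose vectors
print(np.linalg.norm(squash(s), axis=-1))   # lengths now in [0, 1)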
In natural language understanding, many challenges require learning rela...
A key challenge in the legal domain is the adaptation and representation...