Large-scale pre-trained language models such as GPT-3 have shown remarka...
Non-autoregressive (NAR) models can generate sentences with less computa...
The impressive performance of the Transformer has been attributed to self-attent...
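For reference, the self-attention referred to here is the standard scaled dot-product attention of Vaswani et al.; the formula below is the textbook definition, not anything specific to this abstract:

\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V \]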
From the perspective of layer normalization (LN) position, the architect...
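The two standard LN placements are Post-LN (the original Transformer) and Pre-LN. A minimal PyTorch sketch of the two residual blocks follows; it is illustrative only, and the particular connection studied in this paper is not visible in the excerpt:

```python
# Illustrative Post-LN vs. Pre-LN residual blocks (PyTorch).
# "sublayer" stands for either self-attention or the feed-forward network.
import torch.nn as nn

def post_ln(x, sublayer, norm: nn.LayerNorm):
    # Original Transformer: normalize after adding the residual.
    return norm(x + sublayer(x))

def pre_ln(x, sublayer, norm: nn.LayerNorm):
    # Pre-LN variant: normalize the sublayer input and keep the residual path clean.
    return x + sublayer(norm(x))
```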
Subword regularization uses multiple subword segmentations during traini...
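As a minimal sketch of the general idea (not necessarily the variant proposed in this paper), SentencePiece can sample a different segmentation of the same sentence at each training step; the model file name below is purely illustrative:

```python
# Subword regularization sketch: sample segmentations on the fly during training.
# Assumes a trained SentencePiece model saved as "spm.model" (illustrative path).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="spm.model")

sentence = "subword regularization samples a new segmentation each step"
for step in range(3):
    # enable_sampling draws from the segmentation lattice;
    # alpha controls how peaked the sampling distribution is.
    pieces = sp.encode(sentence, out_type=str,
                       enable_sampling=True, alpha=0.1, nbest_size=-1)
    print(step, pieces)
```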
Grammatical Error Correction (GEC) should not focus only on high accurac...
Neural models trained with a large amount of parallel data have achieved i...
Since traditional tokenizers are isolated from a downstream task and mod...
We propose a parameter sharing method for Transformers (Vaswani et al., ...
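The excerpt does not say which sharing scheme is proposed; as a baseline illustration only, the sketch below shows naive all-layer sharing, where a single encoder layer's parameters are reused at every depth:

```python
# Illustrative cross-layer parameter sharing: one encoder layer reused N times.
# This is the naive "share everything" baseline, not the paper's specific scheme.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.num_layers = num_layers

    def forward(self, x):
        for _ in range(self.num_layers):
            x = self.layer(x)  # the same parameters are applied at every layer
        return x

enc = SharedEncoder()
out = enc(torch.randn(2, 10, 512))  # (batch, sequence, d_model)
```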
We often use perturbations to regularize neural models. For neural encod...
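A common instance of such a perturbation is adding small random noise to the input embeddings during training. The sketch below is a minimal illustration under that assumption; adversarial or other learned perturbations, which the paper may actually use, would require an extra gradient step that is omitted here:

```python
# Perturbation-based regularization sketch: add scaled Gaussian noise to
# word embeddings during training (illustrative; not the paper's exact method).
import torch

def perturb_embeddings(emb: torch.Tensor, epsilon: float = 0.01) -> torch.Tensor:
    noise = torch.randn_like(emb)
    # Scale the noise to a fixed L2 norm per token before adding it.
    noise = epsilon * noise / noise.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    return emb + noise
```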
We present a multi-task learning framework for cross-lingual abstractive...
Most studies on abstractive summarization report ROUGE scores between s...
In neural network-based models for natural language processing (NLP), th...
This paper proposes a novel Recurrent Neural Network (RNN) language mode...
Neural encoder-decoder models have been successful in natural language g...
This paper proposes a state-of-the-art recurrent neural network (RNN) la...
The encoder-decoder model is widely used in natural language generation ...
This paper proposes a reinforcing method that refines the output layers ...
Learning distributed representations for relation instances is a central...