Due to the unbalanced training data distribution, the language ability o...
Relative positional encoding is widely used in vanilla and linear transf...
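The entry is truncated, but since it names relative positional encoding, a minimal sketch may help. One common form in vanilla (softmax) attention adds a learnable bias, indexed by the clipped offset i - j, to the attention logits before the softmax. All names below (attention_with_relative_bias, bias_table, max_dist) are illustrative assumptions, not this paper's construction, and the bias is shared across heads for brevity.

```python
import numpy as np

def attention_with_relative_bias(Q, K, V, bias_table, max_dist=8):
    """Scaled dot-product attention with an additive relative-position bias.

    bias_table: shape (2 * max_dist + 1,), one learnable scalar per clipped
    relative offset i - j (hypothetical illustration, single head).
    """
    n, d = Q.shape
    # Relative offsets i - j, clipped to [-max_dist, max_dist].
    offsets = np.clip(np.arange(n)[:, None] - np.arange(n)[None, :],
                      -max_dist, max_dist)
    logits = Q @ K.T / np.sqrt(d) + bias_table[offsets + max_dist]
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

# Toy usage: 5 tokens of dimension 4, 17 = 2 * 8 + 1 bias entries.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 4)) for _ in range(3))
out = attention_with_relative_bias(Q, K, V, bias_table=rng.standard_normal(17))
```

Note that this additive-bias form does not carry over directly to linear attention, where the logit matrix is never materialized, which is why relative encodings for linear transformers need dedicated treatment.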
Multilingual transfer ability, which reflects how well the models fine-t...
Neural machine translation has achieved promising results on many transl...
Instruction tuning has significantly advanced large language models (LLM...
Large language models (LLMs) present an intriguing avenue of exploration...
Language models (LMs) gradually become general-purpose interfaces in the...
While large language models (LLMs) bring not only performance but also c...
Generative Language Models (GLMs) have demonstrated capabilities to stor...
When communicating with elders with cognitive impairment, cognitive stim...
Sequence modeling has important applications in natural language process...
Data augmentation has been established as an efficacious approach to sup...
Large language models (LLMs) have demonstrated remarkable potential in h...
We explore a new task for audio-visual-language modeling called fine-gra...
With promising yet saturated results in high-resource settings, low-reso...
Protein language models have excelled in a variety of tasks, ranging fro...
This work studies discrete diffusion probabilistic models with applicati...
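The abstract is cut off, but discrete diffusion over token sequences is concrete enough to sketch. Below is the forward (corruption) step of a uniform-transition discrete diffusion process in the multinomial-diffusion/D3PM family: at step t each token survives with probability 1 - beta_t and is otherwise resampled uniformly from the vocabulary. This is a generic sketch of the model family, not necessarily the exact formulation studied here.

```python
import numpy as np

def forward_corrupt(tokens, beta_t, vocab_size, rng):
    """One forward step q(x_t | x_{t-1}) of uniform-transition discrete
    diffusion: keep each token with prob 1 - beta_t, else resample it
    uniformly from the vocabulary."""
    replace = rng.random(tokens.shape) < beta_t
    noise = rng.integers(0, vocab_size, size=tokens.shape)
    return np.where(replace, noise, tokens)

# Toy run: progressively corrupt a 10-token sequence.
rng = np.random.default_rng(0)
x = rng.integers(0, 100, size=10)
for beta_t in (0.1, 0.2, 0.4):
    x = forward_corrupt(x, beta_t, vocab_size=100, rng=rng)
```

A reverse model is then trained to denoise x_t back toward the original x_0, step by step.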
Large pretrained language models (LMs) have shown impressive In-Context ...
Traditional multilingual neural machine translation (MNMT) uses a single...
Despite the surprising few-shot performance of in-context learning (ICL)...
Explaining the black-box predictions of NLP models naturally and accurat...
Though linguistic knowledge emerges during large-scale language model pr...
Recently, dataset-generation-based zero-shot learning has shown promisin...
Linear transformers aim to reduce the quadratic space-time complexity of...
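Since this entry names the mechanism, a minimal sketch of how linear transformers avoid the quadratic cost may help: replace softmax(QK^T)V with a kernel feature map phi so attention factorizes as phi(Q)(phi(K)^T V), which costs O(n d^2) instead of O(n^2 d). The elu(x) + 1 feature map follows Katharopoulos et al. (2020); treat this as an illustrative sketch, not this paper's exact construction.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1: positive features, so the normalizer stays positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Kernelized attention phi(Q) (phi(K)^T V) / (phi(Q) sum_j phi(K_j)),
    linear in sequence length n."""
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)  # (n, d) each
    KV = Kf.T @ V                                    # (d, d_v) summary of K, V
    Z = Kf.sum(axis=0)                               # (d,) normalizer term
    return (Qf @ KV) / (Qf @ Z)[:, None]             # (n, d_v)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
out = linear_attention(Q, K, V)  # same shape as softmax attention output
```

Because phi(K)^T V is a fixed-size summary, causal variants can maintain it as a running state, making autoregressive decoding O(1) per step.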
Recently, diffusion models have emerged as a new paradigm for generative...
The Transformer has achieved remarkable success in language, image, and spee...
Vision transformers have shown great success on numerous computer vision...
Recently, contrastive learning has attracted increasing interest in neural t...
Owing to the superior capacity of large pre-trained langua...
Open-ended text generation tasks, such as dialogue generation and story ...
Recently, random feature attentions (RFAs) have been proposed to approximate t...
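The entry is truncated, but the core trick behind random feature attention is standard: approximate the exponential kernel inside softmax attention with random Fourier features so attention factorizes and becomes linear in sequence length. A rough numpy sketch, assuming l2-normalized queries and keys (then exp(q . k) is a constant multiple of the Gaussian kernel exp(-||q - k||^2 / 2), and the constant cancels in the softmax normalization); function names are illustrative.

```python
import numpy as np

def random_feature_map(x, W):
    """phi(x) = [sin(xW); cos(xW)] / sqrt(D); for Gaussian W,
    E[phi(q) . phi(k)] = exp(-||q - k||^2 / 2)."""
    D = W.shape[1]
    proj = x @ W
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1) / np.sqrt(D)

def rfa_attention(Q, K, V, num_features=64, seed=0):
    d = Q.shape[1]
    Qn = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    Kn = K / np.linalg.norm(K, axis=-1, keepdims=True)
    W = np.random.default_rng(seed).standard_normal((d, num_features))
    Qf, Kf = random_feature_map(Qn, W), random_feature_map(Kn, W)
    # Factorized attention, as in linear transformers. The sin/cos features
    # are not positive, so this toy normalizer can be unstable; real
    # implementations handle normalization more carefully.
    return (Qf @ (Kf.T @ V)) / (Qf @ Kf.sum(axis=0))[:, None]
```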
Transformers have shown great success in natural language processing, co...
Recently, the over-smoothing phenomenon of Transformer-based models has been observ...
Recently, there has been growing interest in dataset generation due to the su...
Text generation is of great importance to many natural language processi...
Unsupervised sentence embedding aims to obtain the most appropriate embe...
We examine the extent to which, in principle, linguistic graph represent...
Transformer architectures have achieved state-of-the-art results on a va...
Transformer architectures are now central to modeling in natural languag...
Transformers have advanced the field of natural language processing (NLP...
Transformers are state-of-the-art models for a variety of sequence model...
We present a language model that combines a large parametric neural netw...
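The sentence is cut off before it says what the non-parametric part is, but the generic semiparametric recipe (as in kNN-LM-style models) is to retrieve nearest neighbours of the current context representation from a datastore of (context vector, next token) pairs and interpolate the distribution they induce with the parametric LM's. A sketch under that assumption; every name here is hypothetical.

```python
import numpy as np

def knn_lm_interpolate(p_lm, hidden, memory_keys, memory_values,
                       vocab_size, k=4, lam=0.25, temperature=1.0):
    """Blend a parametric LM's next-token distribution with a
    nearest-neighbour distribution from an episodic datastore."""
    # Distances from the current context vector to all stored keys.
    dists = np.linalg.norm(memory_keys - hidden, axis=-1)
    nn = np.argsort(dists)[:k]
    # Softmax over negative distances of the k neighbours.
    logits = -dists[nn] / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Scatter neighbour mass onto their stored next tokens.
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, memory_values[nn], probs)
    return lam * p_knn + (1.0 - lam) * p_lm

# Toy usage: 3-token vocabulary, 5 stored (key, next-token) pairs.
rng = np.random.default_rng(0)
p = knn_lm_interpolate(p_lm=np.array([0.5, 0.3, 0.2]),
                       hidden=rng.standard_normal(8),
                       memory_keys=rng.standard_normal((5, 8)),
                       memory_values=rng.integers(0, 3, size=5),
                       vocab_size=3, k=2)
```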
Textual representation learners trained on large amounts of data have ac...
We show that state-of-the-art word representation learning methods maximize a...
We show that Bayes' rule provides a compelling mechanism for controlling...
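The sentence is truncated, but the Bayes'-rule mechanism it gestures at is presumably the standard noisy-channel decomposition: instead of modeling the controlled distribution directly, factor it into an unconditional prior and a likelihood term. As a hedged reconstruction of that generic idea, with y the text being generated and x the conditioning signal:

\[
p(y \mid x) \;=\; \frac{p(x \mid y)\, p(y)}{p(x)} \;\propto\; p(x \mid y)\, p(y),
\qquad
y^{*} \;=\; \arg\max_{y} \big[ \log p(x \mid y) + \log p(y) \big].
\]

Here p(y) is an unconditional language model and p(x | y) a channel model scoring how well a candidate output explains the conditioning input; decoding searches for the y that maximizes the sum of the two log-scores.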
We introduce a lifelong language learning setup where a model needs to l...
We define general linguistic intelligence as the ability to reuse previo...
We present a new theoretical perspective of data noising in recurrent ne...
The meaning of a sentence is a function of the relations that hold betwe...
In this paper, we propose Neural Phrase-to-Phrase Machine Translation (N...