Endowing chatbots with a consistent persona is essential to an engaging ...
Given a prefix (context), open-ended generation aims to decode texts tha...
Large language models are trained in two stages: (1) unsupervised pretra...
It is well established in neuroscience that color vision plays an essent...
It is no secret that deep learning models exhibit undesirable behaviors ...
Fine-tuning over large pretrained language models (PLMs) has established...
The design choices in the Transformer attention mechanism, including wea...
A recent family of techniques, dubbed lightweight fine-tuning methods...
One of the most impressive results of recent NLP history is the ability ...
Despite the high performance achieved by deep neural networks on various...
Fine-tuning large pre-trained language models on downstream tasks has be...
A central goal of machine learning is to learn robust representations th...
The quadratic computational and memory complexities of the Transformer's...
Commonsense reasoning is intuitive for humans but has been a long-term c...
Having engaging and informative conversations with users is the utmost g...
In this paper, we introduce Apollo, a quasi-Newton method for nonconvex ...
Cross-lingual transfer learning has become an important weapon to battle...
Most sequence-to-sequence (seq2seq) models are autoregressive; they gene...
Despite impressive empirical successes of neural machine translation (NM...
Cross-lingual transfer, where a high-resource transfer language is used ...
Recent approaches to cross-lingual word embedding have generally been ba...
Flow-based generative models, conceptually attractive due to tractabilit...
Variational Autoencoder (VAE), a simple and effective deep generative mo...
Cross-lingual transfer is the major means to leverage knowledge from high...
We introduce Texar, an open-source toolkit aiming to support the broad s...
We introduce a novel architecture for dependency parsing: stack-pointer ...
Reward augmented maximum likelihood (RAML), a simple and effective learn...
Knowledge bases are important resources for a variety of natural languag...
In this paper, we propose a probabilistic parsing model, which defines a...
Dropout, a simple and effective way to train deep neural networks, has l...
Combining deep neural networks with structured logic rules is desirable ...
Coreference resolution is one of the first stages in deep language under...
State-of-the-art sequence labeling systems traditionally require large a...
This paper presents generalized probabilistic models for high-order proj...