Large Language Models (LLMs) exhibit strong general capabilities, and a...
Large language models (LLMs) are capable of performing conditional seque...
N-gram matching-based evaluation metrics, such as BLEU and chrF, are wid...
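For a concrete sense of how such metrics are computed, here is a minimal scoring sketch; it assumes the sacrebleu library and toy hypothesis/reference strings, neither of which comes from the snippet above:

    # Minimal sketch of n-gram matching metrics, assuming sacrebleu is installed.
    import sacrebleu

    hypotheses = ["the cat sat on the mat"]          # system outputs, one per line
    references = [["the cat is sitting on the mat"]] # one list per reference stream

    # BLEU: modified n-gram precision combined with a brevity penalty.
    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    # chrF: character n-gram F-score, often more robust for rich morphology.
    chrf = sacrebleu.corpus_chrf(hypotheses, references)

    print(f"BLEU = {bleu.score:.2f}, chrF = {chrf.score:.2f}")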
Recently, DeepNorm has scaled Transformers to extremely deep (i.e., 1000 l...
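For context, the core of DeepNorm is a re-weighted residual connection, x <- LayerNorm(alpha * x + sublayer(x)), where alpha > 1 grows with depth to keep updates bounded in very deep stacks. A minimal PyTorch sketch follows; the encoder-only constant alpha = (2N)^(1/4) is the DeepNet paper's setting, while the module name and interface are our own illustration:

    # Sketch of a DeepNorm residual block (PyTorch assumed).
    import torch.nn as nn

    class DeepNormResidual(nn.Module):
        def __init__(self, sublayer: nn.Module, d_model: int, num_layers: int):
            super().__init__()
            self.sublayer = sublayer
            # Encoder-only setting from the DeepNet paper: alpha = (2N)^(1/4).
            self.alpha = (2 * num_layers) ** 0.25
            self.norm = nn.LayerNorm(d_model)

        def forward(self, x):
            # Post-LN residual with an up-weighted identity branch.
            return self.norm(self.alpha * x + self.sublayer(x))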
Although pre-trained sequence-to-sequence models have achieved great suc...
Token-level adaptive training approaches can alleviate the token imbalan...
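To make the idea concrete, token-level adaptive training re-weights the per-token loss so that rare target tokens contribute more than frequent ones. The sketch below uses a simple inverse-log-frequency weighting, which is an illustrative assumption rather than any specific paper's formula:

    # Illustrative token-level adaptive loss (PyTorch assumed).
    import math
    import torch
    import torch.nn.functional as F

    def adaptive_token_loss(logits, targets, token_counts):
        # logits: (num_tokens, vocab); targets: (num_tokens,);
        # token_counts: (vocab,) corpus frequency of each vocabulary item.
        per_token_nll = F.cross_entropy(logits, targets, reduction="none")
        # Hypothetical weighting: rarer tokens receive larger weights.
        weights = 1.0 / torch.log(token_counts[targets].float() + math.e)
        return (weights * per_token_nll).mean()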
Scheduled sampling is widely used to mitigate the exposure bias problem ...
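As a brief illustration of the mechanism, scheduled sampling feeds the decoder its own previous prediction instead of the gold token with a probability that grows over training. The inverse-sigmoid decay below follows Bengio et al. (2015); the token-selection helper is a hypothetical placeholder:

    # Sketch of scheduled sampling during decoder training.
    import math
    import random

    def teacher_forcing_prob(step: int, k: float = 500.0) -> float:
        # Inverse-sigmoid decay (Bengio et al., 2015): p = k / (k + exp(step / k)).
        return k / (k + math.exp(step / k))

    def next_decoder_input(gold_token, predicted_token, step):
        # With probability p use the reference token (teacher forcing),
        # otherwise feed back the model's own prediction.
        if random.random() < teacher_forcing_prob(step):
            return gold_token
        return predicted_token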
This paper introduces WeChat AI's participation in the WMT 2021 shared news...
Scheduled sampling is an effective method to alleviate the exposure bias...
Recently, token-level adaptive training has achieved promising improveme...
The Neural Machine Translation (NMT) model is essentially a joint langua...
We participate in the WMT 2020 shared news translation task on Chinese t...
The vanilla Transformer performs a fixed number of computations over all...
The Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph...
Spoken Language Understanding (SLU) mainly involves two tasks: intent de...
Current state-of-the-art systems for sequence labeling are typically bas...