We study a streamable attention-based encoder-decoder model in which eit...
Automatic speech recognition (ASR) systems typically use handcrafted fea...
Building competitive hybrid hidden Markov model (HMM) systems for automa...
Document-level context for neural machine translation (NMT) is crucial t...
Compared to sentence-level systems, document-level neural machine transl...
The integration of language models for neural machine translation has be...
Modern public ASR tools usually provide rich support for training variou...
This paper summarizes our contributions to the document-grounded dialog ...
Neural speaker embeddings encode the speaker's speech characteristics th...
Recently, RNN-Transducers have achieved remarkable results on various au...
The pre-training of masked language models (MLMs) consumes massive compu...
In this work, we present a model for document-grounded response generati...
Automatic speech recognition (ASR) has been established as a well-perfor...
We introduce a novel segmental-attention model for automatic speech reco...
Language barriers present a great challenge in our increasingly connecte...
Currently, in speech translation, the straightforward approach - cascadi...
Encoder-decoder architecture is widely adopted for sequence-to-sequence ...
Checkpoint averaging is a simple and effective method to boost the perfo...
In this work, we compare from-scratch sequence-level cross-entropy (full...
Speaker adaptation is important to build robust automatic speech recogni...
As one of the most popular sequence-to-sequence modeling approaches for ...
In this work, we show that a factored hybrid hidden Markov model (FH-HMM...
This paper summarizes our submission to Task 2 of the second track of th...
To mitigate the problem of having to traverse over the full vocabulary i...
The recently proposed conformer architecture has been successfully used ...
To improve the performance of state-of-the-art automatic speech recognit...
Sequence discriminative training is a great tool to improve the performa...
The mismatch between an external language model (LM) and the implicitly ...
Data processing is an important step in various natural language process...
Pivot-based neural machine translation (NMT) is commonly used in low-res...
Complex natural language applications such as speech translation or pivo...
This paper summarizes our entries to both subtasks of the first DialDoc ...
The peaky behavior of CTC models is well known experimentally. However, ...
As the vocabulary size of modern word-based language models becomes ever...
Subword units are commonly used for end-to-end automatic speech recognit...
With the advent of direct models in automatic speech recognition (ASR), ...
Attention-based encoder-decoder (AED) models learn an implicit internal ...
Recent publications on automatic speech recognition (ASR) have a strong ...
Acoustic modeling of raw waveform and learning feature extractors as par...
We present our transducer model on Librispeech. We study variants to inc...
High-performance hybrid automatic speech recognition (ASR) systems are o...
End-to-end models reach state-of-the-art performance for speech recognit...
This paper summarizes our work on the first track of the ninth Dialog Sy...
A cascaded speech translation model relies on discrete and non-different...
Neural translation models have proven to be effective in capturing suffi...
To join the advantages of classical and end-to-end approaches for speech...
Context-aware neural machine translation (NMT) is a promising direction ...
To encourage intra-class compactness and inter-class separability among ...
Sequence-to-sequence models with an implicit alignment mechanism (e.g. a...
Common end-to-end models like CTC or encoder-decoder-attention models us...