What does it take to create the Babel Fish, a tool that can help individ...
Speech-to-speech translation (S2ST) enables spoken communication between...
Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL...
Transducer and Attention based Encoder-Decoder (AED) are two widely used...
It has been known that direct speech-to-speech translation (S2ST) models...
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitat...
We introduce MuAViC, a multilingual audio-visual corpus for robust speec...
The gap between speech and text modalities is a major challenge in speec...
Direct speech-to-speech translation (S2ST), in which all components can ...
We study speech-to-speech translation (S2ST) that translates speech from...
We present SpeechMatrix, a large-scale multilingual corpus of speech-to-...
The amount of labeled data to train models for speech tasks is limited f...
We describe a method to jointly pre-train speech and text in an encoder-...
Direct speech-to-speech translation (S2ST) models suffer from data scarc...
We present a textless speech-to-speech translation (S2ST) system that ca...
This paper presents XLS-R, a large-scale model for cross-lingual speech ...
We present the first direct simultaneous speech-to-speech translation (S...
In a speech-to-speech translation (S2ST) pipeline, the text-to-speech (T...
This paper presents fairseq S^2, a fairseq extension for speech synthesi...
In this paper, we describe our end-to-end multilingual speech translatio...
Pretraining and multitask learning are widely used to improve the speech...
We present a direct speech-to-speech translation (S2ST) model that trans...
Multi-head attention has each of the attention heads collect salient inf...
Adapter modules were recently introduced as an efficient alternative to ...
In this paper, we improve speech translation (ST) through effectively le...
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K...
Simultaneous text translation and end-to-end speech translation have rec...
We introduce dual-decoder Transformer, a new model architecture that joi...
Transformer-based models have achieved state-of-the-art performance on s...
We propose an effective approach to utilize pretrained speech and text m...
Attention-based sequence-to-sequence modeling provides a powerful and el...
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) m...
Simultaneous translation on both text and speech focuses on a real-time ...
Speech translation has recently become an increasingly popular topic of ...
End-to-end speech-to-text translation can provide a simpler and smaller ...
Transfer learning from high-resource languages is known to be an efficie...
One of the main challenges for end-to-end speech translation is data sca...
We propose autoencoding speaker conversion for training data augmentatio...
Spoken language translation has recently witnessed a resurgence in popul...
Simultaneous machine translation models start generating a target sequen...
For automatic speech translation (AST), end-to-end approaches are outper...
We share the findings of the first shared task on improving robustness o...
The vast majority of language pairs in the world are low-resource becaus...