Jiatong Shi

research

∙ 09/18/2023

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

Text language models have shown remarkable zero-shot capability in gener...

0 Chien-yu Huang, et al. ∙

research

∙ 08/05/2023

A Systematic Exploration of Joint-training for Singing Voice Synthesis

There has been a growing interest in using end-to-end acoustic models fo...

0 Yuning Wu, et al. ∙

research

∙ 06/26/2023

The Singing Voice Conversion Challenge 2023

We present the latest iteration of the voice conversion challenge (VCC) ...

0 Wen-Chin Huang, et al. ∙

research

∙ 06/01/2023

Exploration on HuBERT with Multiple Resolutions

Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL...

0 Jiatong Shi, et al. ∙

research

∙ 05/18/2023

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

Speech processing Universal PERformance Benchmark (SUPERB) is a leaderbo...

0 Jiatong Shi, et al. ∙

research

∙ 05/12/2023

Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation

Most of the speech translation models heavily rely on parallel data, whi...

0 Yu-Kuan Fu, et al. ∙

research

∙ 04/25/2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Large language models (LLMs) have exhibited remarkable capabilities acro...

7 Rongjie Huang, et al. ∙

research

∙ 04/10/2023

Enhancing Speech-to-Speech Translation with Multiple TTS Targets

It has been known that direct speech-to-speech translation (S2ST) models...

0 Jiatong Shi, et al. ∙

research

∙ 04/10/2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitat...

0 Brian Yan, et al. ∙

research

∙ 03/15/2023

PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor

Singing voice synthesis (SVS), as a specific task for generating the voc...

0 Yuning Wu, et al. ∙

research

∙ 02/24/2023

Improving Massively Multilingual ASR With Auxiliary CTC Objectives

Multilingual Automatic Speech Recognition (ASR) models have extended the...

0 William Chen, et al. ∙

research

∙ 12/21/2022

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

The network architecture of end-to-end (E2E) automatic speech recognitio...

0 Yui Sudo, et al. ∙

research

∙ 11/30/2022

EURO: ESPnet Unsupervised ASR Open-source Toolkit

This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EU...

0 Dongji Gao, et al. ∙

research

∙ 11/06/2022

Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

Spoken language understanding (SLU) is a task aiming to extract high-lev...

0 Jiatong Shi, et al. ∙

research

∙ 10/16/2022

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

We present the SUPERB challenge at SLT 2022, which aims at learning self...

0 Tzu-hsun Feng, et al. ∙

research

∙ 10/13/2022

On Compressing Sequences for Self-Supervised Speech Models

Compressing self-supervised models has become increasingly necessary, as...

0 Yen Meng, et al. ∙

research

∙ 08/03/2022

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States

Beam search, which is the dominant ASR decoding algorithm for end-to-end...

0 Jiatong Shi, et al. ∙

research

∙ 04/19/2022

Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation

Although Transformers have gained success in several speech processing t...

0 Keqi Deng, et al. ∙

research

∙ 04/05/2022

Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation

Self-Supervised Learning (SSL) models have been successfully applied in ...

0 Dan Berrebbi, et al. ∙

research

∙ 03/31/2022

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy

Deep learning based singing voice synthesis (SVS) systems have been demo...

0 Shuai Guo, et al. ∙

research

∙ 03/14/2022

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

Transfer learning has proven to be crucial in advancing the state of spe...

0 Hsiang-Sheng Tsai, et al. ∙

research

∙ 11/02/2021

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

Speech processing systems currently do not support the vast majority of ...

6 Peter Wu, et al. ∙

research

∙ 10/15/2021

ESPnet2-TTS: Extending the Edge of TTS Research

This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS)...

0 Tomoki Hayashi, et al. ∙

research

∙ 07/01/2021

ESPnet-ST IWSLT 2021 Offline Speech Translation System

This paper describes the ESPnet-ST group's IWSLT 2021 submission in the ...

0 Hirofumi Inaguma, et al. ∙

research

∙ 05/03/2021

SUPERB: Speech processing Universal PERformance Benchmark

Self-supervised learning (SSL) has proven vital for advancing research i...

0 Shu-wen Yang, et al. ∙

research

∙ 01/26/2021

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec

"Transcription bottlenecks", created by a shortage of effective human tr...

0 Jiatong Shi, et al. ∙

research

∙ 11/26/2020

Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation

Target-speaker speech recognition aims to recognize target-speaker speec...

0 Jiatong Shi, et al. ∙

research

∙ 10/26/2020

Recent Developments on ESPnet Toolkit Boosted by Conformer

In this study, we present recent developments on ESPnet: End-to-End Spee...

0 Pengcheng Guo, et al. ∙

research

∙ 10/22/2020

Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss

The neural network (NN) based singing voice synthesis (SVS) systems requ...

0 Jiatong Shi, et al. ∙

research

∙ 08/19/2020

Context-aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training

Mispronunciation detection is an essential component of the Computer-Ass...

0 Jiatong Shi, et al. ∙
