Sheng Zhao

research ∙ 08/09/2023

VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer

Current talking face generation methods mainly focus on speech-lip synch...

Liyang Chen, et al.

research ∙ 07/27/2023

The detection and rectification for identity-switch based on unfalsified control

The purpose of multi-object tracking (MOT) is to continuously track and ...

Junchao Huang, et al.

research ∙ 07/03/2023

An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023

The task of synthetic speech generation is to generate language content ...

Sheng Zhao, et al.

research ∙ 03/06/2023

FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model

Neural text-to-speech (TTS) generally consists of cascaded architecture ...

Ruiqing Xue, et al.

research ∙ 02/22/2023

Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation

We previously proposed contextual spelling correction (CSC) to correct t...

Xiaoqiang Wang, et al.

research ∙ 01/05/2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

We introduce a language modeling approach for text to speech synthesis (...

Chengyi Wang, et al.

research ∙ 11/22/2022

PromptTTS: Controllable Text-to-Speech with Text Descriptions

Using a text description as prompt to guide the generation of text or im...

Zhifang Guo, et al.

research ∙ 08/30/2022

MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks

Humans usually compose music by organizing elements according to the mus...

Peiling Lu, et al.

research ∙ 08/29/2022

StableFace: Analyzing and Improving Motion Stability for Talking Face Generation

While previous speech-driven talking face generation methods have made s...

Jun Ling, et al.

research ∙ 07/11/2022

DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders

Current text to speech (TTS) systems usually leverage a cascaded acousti...

Yanqing Liu, et al.

research ∙ 06/28/2022

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

This paper proposes a new "decompose-and-edit" paradigm for the text-bas...

Dacheng Yin, et al.

research ∙ 05/30/2022

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

Binaural audio plays a significant role in constructing immersive augmen...

Yichong Leng, et al.

research ∙ 05/09/2022

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

Text to speech (TTS) has made rapid progress in both academia and indust...

Xu Tan, et al.

research ∙ 04/01/2022

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

Adaptive text to speech (TTS) can synthesize new voices in zero-shot sce...

Yihan Wu, et al.

research ∙ 03/02/2022

Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems

Contextual biasing is an important and challenging task for end-to-end a...

Xiaoqiang Wang, et al.

research ∙ 02/08/2022

InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training

Denoising diffusion probabilistic models (diffusion models for short) re...

Zehua Chen, et al.

research ∙ 11/18/2021

Transformer-S2A: Robust and Efficient Speech-to-Animation

We propose a novel robust and efficient Speech-to-Animation (S2A) approa...

Liyang Chen, et al.

research ∙ 10/08/2021

A study on the efficacy of model pre-training in developing neural text-to-speech system

In the development of neural text-to-speech systems, model pre-training ...

Guangyan Zhang, et al.

research ∙ 08/17/2021

A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems

It's challenging to customize transducer-based automatic speech recognit...

Xiaoqiang Wang, et al.

research ∙ 07/06/2021

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

While recent text to speech (TTS) models perform very well in synthesizi...

Yuzi Yan, et al.

research ∙ 04/20/2021

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

Text to speech (TTS) is widely used to synthesize personal voice for a t...

Yuzi Yan, et al.

research ∙ 03/01/2021

AdaSpeech: Adaptive Text to Speech for Custom Voice

Custom voice, a specific text to speech (TTS) service in commercial spee...

Mingjian Chen, et al.

research ∙ 02/27/2021

MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network

Mean opinion score (MOS) is a popular subjective metric to assess the qu...

Yichong Leng, et al.

research ∙ 02/08/2021

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

Text to speech (TTS) has been broadly used to synthesize natural and int...

Renqian Luo, et al.

research ∙ 12/17/2020

DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling

While neural-based text to speech (TTS) models can synthesize natural an...

Chen Zhang, et al.

research ∙ 08/09/2020

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition

Speech synthesis (text to speech, TTS) and recognition (automatic speech...

Jin Xu, et al.

research ∙ 07/30/2020

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability

Because of its streaming nature, recurrent neural network transducer (RN...

Jinyu Li, et al.

research ∙ 06/08/2020

MultiSpeech: Multi-Speaker Text to Speech with Transformer

Transformer-based text to speech (TTS) model (e.g., Transformer TTS <cit...

Mingjian Chen, et al.

research ∙ 06/08/2020

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Advanced text to speech (TTS) models such as FastSpeech can synthesize s...

Yi Ren, et al.

research ∙ 05/18/2020

MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search

To speed up the inference of neural speech synthesis, non-autoregressive...

Naihan Li, et al.

research ∙ 04/22/2020

A Study of Non-autoregressive Model for Sequence Generation

Non-autoregressive (NAR) models generate all the tokens of a sequence in...

Yi Ren, et al.

research ∙ 12/06/2019

Semantic Mask for Transformer based End-to-End Speech Recognition

The attention-based encoder-decoder model has achieved impressive results fo...

Chengyi Wang, et al.

research ∙ 05/22/2019

FastSpeech: Fast, Robust and Controllable Text to Speech

Neural network based end-to-end text to speech (TTS) has significantly i...

Yi Ren, et al.

research ∙ 05/13/2019

Almost Unsupervised Text to Speech and Automatic Speech Recognition

Text to speech (TTS) and automatic speech recognition (ASR) are two dual...

Yi Ren, et al.

research ∙ 04/06/2019

Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion

Grapheme-to-phoneme (G2P) conversion is an important task in automatic s...

Hao Sun, et al.

research ∙ 09/19/2018

Close to Human Quality TTS with Transformer

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotro...

Naihan Li, et al.
