Shiyin Kang

research

∙ 09/21/2023

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speake...

Shun Lei, et al.

research

∙ 08/31/2023

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

This paper presents an end-to-end high-quality singing voice synthesis (...

Shaohuan Zhou, et al.

research

∙ 08/31/2023

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis

The spontaneous behavior that often occurs in conversations makes speech...

Weiqin Li, et al.

research

∙ 08/31/2023

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) ...

Jie Chen, et al.

research

∙ 07/29/2023

MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis

Expressive speech synthesis is crucial for many human-computer interacti...

Shun Lei, et al.

research

∙ 04/25/2023

GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network

Music-driven 3D dance generation has become an intensive research topic ...

Haolin Zhuang, et al.

research

∙ 04/19/2023

CB-Conformer: Contextual biasing Conformer for biased word recognition

Due to the mismatch between the source and target domains, how to better...

Yaoxun Xu, et al.

research

∙ 04/13/2023

Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis

Recent advances in text-to-speech have significantly improved the expres...

Shun Lei, et al.

research

∙ 04/06/2022

Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

Previous works on expressive speech synthesis focus on modelling the mon...

Shun Lei, et al.

research

∙ 03/24/2022

Disentangling Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion

Non-parallel data voice conversion (VC) has achieved considerable break...

Xintao Zhao, et al.

research

∙ 03/23/2022

Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

Previous works on expressive speech synthesis mainly focus on current se...

Shun Lei, et al.

research

∙ 03/23/2022

FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement

Previously proposed FullSubNet has achieved outstanding performance in D...

Jun Chen, et al.

research

∙ 07/07/2021

VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis

This paper describes a variational auto-encoder based non-autoregressive...

Hui Lu, et al.

research

∙ 06/20/2020

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

Generating 3D speech-driven talking head has received more and more atte...

Huirong Huang, et al.

research

∙ 01/06/2020

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

Automatic recognition of overlapped speech remains a highly challenging ...

Jianwei Yu, et al.

research

∙ 09/04/2019

DurIAN: Duration Informed Attention Network For Multimodal Synthesis

In this paper, we present a generic and robust multimodal synthesis syst...

Chengzhu Yu, et al.

research

∙ 08/30/2019

Maximizing Mutual Information for Tacotron

End-to-end speech synthesis methods such as Tacotron, Tacotron2 and Trans...

Peng Liu, et al.
