Zhengqi Wen

research

∙ 05/23/2023

ADD 2023: the Second Audio Deepfake Detection Challenge

Audio deepfake detection is an emerging topic in the artificial intellig...

0 Jiangyan Yi, et al. ∙

research

∙ 03/02/2023

Learning From Yourself: A Self-Distillation Method for Fake Speech Detection

In this paper, we propose a novel self-distillation method for fake spee...

0 Jun Xue, et al. ∙

research

∙ 01/10/2023

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

Text-to-speech (TTS) and voice conversion (VC) are two different tasks b...

0 Haogeng Liu, et al. ∙

research

∙ 12/20/2022

Emotion Selectable End-to-End Text-based Speech Editing

Text-based speech editing allows users to edit speech by intuitively cut...

0 PetsTime, et al. ∙

research

∙ 10/20/2022

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS

Current end-to-end code-switching Text-to-Speech (TTS) can already gener...

0 Chunyu Qiang, et al. ∙

research

∙ 08/02/2022

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

Recently, pioneer research works have proposed a large number of acousti...

0 Jun Xue, et al. ∙

research

∙ 03/05/2022

NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation

The traditional vocoders have the advantages of high synthesis efficienc...

8 PetsTime, et al. ∙

research

∙ 02/21/2022

CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing

The text-based speech editor allows the editing of speech through intuit...

0 PetsTime, et al. ∙

research

∙ 02/17/2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge

Audio deepfake detection is an emerging topic, which was included in the...

0 Jiangyan Yi, et al. ∙

research

∙ 02/16/2022

Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis

End-to-end singing voice synthesis (SVS) is attractive due to the avoida...

0 PetsTime, et al. ∙

research

∙ 04/07/2021

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization

Transducer-based models, such as RNN-Transducer and transformer-transduc...

0 Zhengkun Tian, et al. ∙

research

∙ 04/04/2021

TSNAT: Two-Step Non-Autoregressive Transformer Models for Speech Recognition

The autoregressive (AR) models, such as attention-based encoder-decoder ...

0 Zhengkun Tian, et al. ∙

research

∙ 02/15/2021

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT

Attention-based encoder-decoder (AED) models have achieved promising per...

11 Ye Bai, et al. ∙

research

∙ 11/11/2020

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

Recurrent neural networks (RNNs) have shown significant improvements in ...

0 Cunhang Fan, et al. ∙

research

∙ 11/09/2020

Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition

The joint training framework for speech enhancement and recognition meth...

0 Cunhang Fan, et al. ∙

research

∙ 10/28/2020

Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition

Despite the recent significant advances witnessed in end-to-end (E2E) AS...

0 Shuai Zhang, et al. ∙

research

∙ 05/16/2020

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition

Non-autoregressive transformer models have achieved extremely fast infer...

0 Zhengkun Tian, et al. ∙

research

∙ 05/11/2020

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition

Although attention based end-to-end models have achieved promising perfo...

0 Ye Bai, et al. ∙

research

∙ 04/06/2020

Simultaneous Denoising and Dereverberation Using Deep Embedding Features

Monaural speech dereverberation is a very challenging task because no sp...

0 Cunhang Fan, et al. ∙

research

∙ 03/17/2020

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method

In this paper, we propose an end-to-end post-filter method with deep att...

0 Cunhang Fan, et al. ∙

research

∙ 02/05/2020

Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features

Multi-channel deep clustering (MDC) has acquired a good performance for ...

0 Cunhang Fan, et al. ∙

research

∙ 12/06/2019

Synchronous Transformers for End-to-End Speech Recognition

For most of the attention-based sequence-to-sequence models, the decoder...

0 Zhengkun Tian, et al. ∙

research

∙ 12/04/2019

Integrating Whole Context to Sequence-to-sequence Speech Recognition

Because an attention based sequence-to-sequence speech (Seq2Seq) recogni...

0 Ye Bai, et al. ∙

research

∙ 10/24/2019

Towards Fine-Grained Prosody Control for Voice Conversion

In a typical voice conversion system, prior works utilize various acoust...

0 Zheng Lian, et al. ∙

research

∙ 09/28/2019

Self-Attention Transducers for End-to-End Speech Recognition

Recurrent neural network transducers (RNN-T) have been successfully appl...

0 Zhengkun Tian, et al. ∙

research

∙ 07/23/2019

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

Deep clustering (DC) and utterance-level permutation invariant training ...

0 Cunhang Fan, et al. ∙

research

∙ 07/18/2019

Forward-Backward Decoding for Regularizing End-to-End TTS

Neural end-to-end TTS can generate very high-quality synthesized speech,...

0 Yibin Zheng, et al. ∙

research

∙ 07/13/2019

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

Integrating an external language model into a sequence-to-sequence speec...

0 Ye Bai, et al. ∙

research

∙ 02/20/2018

Distilling Knowledge Using Parallel Data for Far-field Speech Recognition

In order to improve the performance for far-field speech recognition, th...

0 Jiangyan Yi, et al. ∙

research

∙ 03/28/2016

Audio Visual Emotion Recognition with Temporal Alignment and Perception Attention

This paper focuses on two key problems for audio-visual emotion recognit...

0 Linlin Chao, et al. ∙
