Jianhua Tao

research

∙ 09/15/2023

Controllable Residual Speaker Representation for Voice Conversion

Recently, there have been significant advancements in voice conversion, ...

0 Le Xu, et al. ∙

research

∙ 09/13/2023

Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms

Recent strides in neural speech synthesis technologies, while enjoying w...

0 Chu Yuan Zhang, et al. ∙

research

∙ 09/07/2023

DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection

Auditory Attention Detection (AAD) aims to detect target speaker from br...

0 Cunhang Fan, et al. ∙

research

∙ 08/29/2023

Audio Deepfake Detection: A Survey

Audio deepfake detection is an emerging active topic. A growing number o...

0 Jiangyan Yi, et al. ∙

research

∙ 08/19/2023

Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection

The rhythm of synthetic speech is usually too smooth, which causes that ...

0 Cunhang Fan, et al. ∙

research

∙ 08/07/2023

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

Current fake audio detection algorithms have achieved promising performa...

0 Xiaohui Zhang, et al. ∙

research

∙ 07/17/2023

TST: Time-Sparse Transducer for Automatic Speech Recognition

End-to-end model, especially Recurrent Neural Network Transducer (RNN-T)...

0 Xiaohui Zhang, et al. ∙

research

∙ 07/05/2023

MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition

Dynamic facial expression recognition (DFER) is essential to the develop...

0 Licai Sun, et al. ∙

research

∙ 06/27/2023

Explainable Multimodal Emotion Reasoning

Multimodal emotion recognition is an active research topic in artificial...

0 Zheng Lian, et al. ∙

research

∙ 06/09/2023

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion

Denoising Diffusion Probabilistic Models have shown extraordinary abilit...

0 Haogeng Liu, et al. ∙

research

∙ 06/09/2023

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection

Self-supervised speech models are a rapidly developing research topic in...

0 Chenglong Wang, et al. ∙

research

∙ 06/08/2023

Adaptive Fake Audio Detection with Low-Rank Model Squeezing

The rapid advancement of spoofing algorithms necessitates the developmen...

0 Xiaohui Zhang, et al. ∙

research

∙ 05/23/2023

ADD 2023: the Second Audio Deepfake Detection Challenge

Audio deepfake detection is an emerging topic in the artificial intellig...

0 Jiangyan Yi, et al. ∙

research

∙ 05/23/2023

TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection

Current fake audio detection relies on hand-crafted features, which lose...

0 Chenglong Wang, et al. ∙

research

∙ 05/23/2023

Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features

Existing fake audio detection systems perform well in in-domain testing,...

0 Chenglong Wang, et al. ∙

research

∙ 05/03/2023

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis

Conversational text-to-speech (TTS) aims to synthesize speech with prope...

0 Jinlong Xue, et al. ∙

research

∙ 04/18/2023

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

Over the past few decades, multimodal emotion recognition has made remar...

0 Zheng Lian, et al. ∙

research

∙ 01/28/2023

DALI: Dynamically Adjusted Label Importance for Noisy Partial Label Learning

Noisy partial label learning (noisy PLL) is an important branch of weakl...

0 Mingyu Xu, et al. ∙

research

∙ 01/10/2023

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

Text-to-speech (TTS) and voice conversion (VC) are two different tasks b...

0 Haogeng Liu, et al. ∙

research

∙ 12/20/2022

Emotion Selectable End-to-End Text-based Speech Editing

Text-based speech editing allows users to edit speech by intuitively cut...

0 PetsTime, et al. ∙

research

∙ 11/11/2022

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

Previous databases have been designed to further the development of fake...

0 Jiangyan Yi, et al. ∙

research

∙ 11/10/2022

EmoFake: An Initial Dataset for Emotion Fake Audio Detection

There are already some datasets used for fake audio detection, such as t...

0 Yan Zhao, et al. ∙

research

∙ 11/09/2022

ARNet: Automatic Refinement Network for Noisy Partial Label Learning

Partial label learning (PLL) is a typical weakly supervised learning, wh...

0 Zheng Lian, et al. ∙

research

∙ 10/20/2022

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS

Current end-to-end code-switching Text-to-Speech (TTS) can already gener...

0 Chunyu Qiang, et al. ∙

research

∙ 10/06/2022

An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era

Speech is the fundamental mode of human communication, and its synthesis...

0 Andreas Triantafyllopoulos, et al. ∙

research

∙ 08/21/2022

System Fingerprints Detection for DeepFake Audio: An Initial Dataset and Investigation

Many effective attempts have been made for deepfake audio detection. How...

0 Xinrui Yan, et al. ∙

research

∙ 08/20/2022

An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio

Many effective attempts have been made for fake audio detection. However...

0 Xinrui Yan, et al. ∙

research

∙ 08/20/2022

Fully Automated End-to-End Fake Audio Detection

The existing fake audio detection systems often rely on expert experienc...

0 Chenglong Wang, et al. ∙

research

∙ 08/16/2022

Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis

With the proliferation of user-generated online videos, Multimodal Senti...

0 Licai Sun, et al. ∙

research

∙ 08/02/2022

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

Recently, pioneer research works have proposed a large number of acousti...

0 Jun Xue, et al. ∙

research

∙ 07/23/2022

Two-Aspect Information Fusion Model For ABAW4 Multi-task Challenge

In this paper, we propose the solution to the Multi-Task Learning (MTL) ...

0 Haiyang Sun, et al. ∙

research

∙ 07/12/2022

FAD: A Chinese Dataset for Fake Audio Detection

Fake audio detection is a growing concern and some relevant datasets hav...

0 Haoxin Ma, et al. ∙

research

∙ 04/26/2022

Adaptive Pseudo-Siamese Policy Network for Temporal Knowledge Prediction

Temporal knowledge prediction is a crucial task for the event early warn...

0 Pengpeng Shao, et al. ∙

research

∙ 03/25/2022

EmotionNAS: Two-stream Architecture Search for Speech Emotion Recognition

Speech emotion recognition (SER) is a crucial research topic in human-co...

3 Haiyang Sun, et al. ∙

research

∙ 03/05/2022

NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation

The traditional vocoders have the advantages of high synthesis efficienc...

8 PetsTime, et al. ∙

research

∙ 03/04/2022

GCNet: Graph Completion Network for Incomplete Multimodal Learning in Conversation

Conversations have become a critical data format on social media platfor...

0 Zheng Lian, et al. ∙

research

∙ 02/21/2022

CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing

The text-based speech editor allows the editing of speech through intuit...

0 PetsTime, et al. ∙

research

∙ 02/19/2022

MixKG: Mixing for harder negative samples in knowledge graph

Knowledge graph embedding (KGE) aims to represent entities and relations...

0 Feihu Che, et al. ∙

research

∙ 02/17/2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge

Audio deepfake detection is an emerging topic, which was included in the...

0 Jiangyan Yi, et al. ∙

research

∙ 02/16/2022

Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis

End-to-end singing voice synthesis (SVS) is attractive due to the avoida...

0 PetsTime, et al. ∙

research

∙ 01/28/2022

Reducing language context confusion for end-to-end code-switching automatic speech recognition

Code-switching is about dealing with alternative languages in the commun...

0 Shuai Zhang, et al. ∙

research

∙ 12/17/2021

Knowledge graph enhanced recommender system

Knowledge Graphs (KGs) have shown great success in recommendation. This ...

0 Zepeng Huai, et al. ∙

research

∙ 07/06/2021

Multi-Level Graph Contrastive Learning

Graph representation learning has attracted a surge of interest recently...

0 Pengpeng Shao, et al. ∙

research

∙ 04/15/2021

Continual Learning for Fake Audio Detection

Fake audio attack becomes a major threat to the speaker verification sys...

0 Haoxin Ma, et al. ∙

research

∙ 04/08/2021

Half-Truth: A Partially Fake Audio Detection Dataset

Diverse promising datasets have been designed to hold back the developme...

0 Jiangyan Yi, et al. ∙

research

∙ 04/07/2021

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization

Transducer-based models, such as RNN-Transducer and transformer-transduc...

0 Zhengkun Tian, et al. ∙

research

∙ 04/04/2021

TSNAT: Two-Step Non-Autoregressive Transformer Models for Speech Recognition

The autoregressive (AR) models, such as attention-based encoder-decoder ...

0 Zhengkun Tian, et al. ∙

research

∙ 02/15/2021

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT

Attention-based encoder-decoder (AED) models have achieved promising per...

11 Ye Bai, et al. ∙

research

∙ 11/16/2020

Tucker decomposition-based Temporal Knowledge Graph Completion

Knowledge graphs have been demonstrated to be an effective tool for nume...

0 Pengpeng Shao, et al. ∙

research

∙ 11/11/2020

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

Recurrent neural networks (RNNs) have shown significant improvements in ...

0 Cunhang Fan, et al. ∙
