Pengyuan Zhang

research

∙ 09/21/2023

The Impact of Silence on Speech Anti-Spoofing

The current speech anti-spoofing countermeasures (CMs) show excellent pe...

Yuxiang Zhang, et al.

research

∙ 09/15/2023

One-Class Knowledge Distillation for Spoofing Speech Detection

The detection of spoofing speech generated by unseen algorithms remains ...

Jingze Lu, et al.

research

∙ 09/15/2023

Improving Short Utterance Anti-Spoofing with AASIST2

The wav2vec 2.0 and integrated spectro-temporal graph attention network ...

Yuxiang Zhang, et al.

research

∙ 08/25/2023

Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder

Neural networks have been able to generate high-quality single-sentence ...

Xuyuan Li, et al.

research

∙ 08/12/2023

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

When labeled data is insufficient, semi-supervised learning with the pse...

Han Zhu, et al.

research

∙ 05/22/2023

Progressive Sub-Graph Clustering Algorithm for Semi-Supervised Domain Adaptation Speaker Verification

Utilizing the large-scale unlabeled data from the target domain via pseu...

Zhuo Li, et al.

research

∙ 05/22/2023

The HCCL system for VoxCeleb Speaker Recognition Challenge 2022

This report describes our submission to track1 and track3 for VoxCeleb S...

Zhenduo Zhao, et al.

research

∙ 05/15/2023

ForkNet: Simultaneous Time and Time-Frequency Domain Modeling for Speech Enhancement

Previous research in speech enhancement has mostly focused on modeling t...

Feng Dang, et al.

research

∙ 03/01/2023

PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification

ECAPA-TDNN is currently the most popular TDNN-series model for speaker v...

Zhenduo Zhao, et al.

research

∙ 02/26/2023

Speech Corpora Divergence Based Unsupervised Data Selection for ASR

Selecting application scenarios matching data is important for the autom...

Changfeng Gao, et al.

research

∙ 02/18/2023

Multi-dimensional frequency dynamic convolution with confident mean teacher for sound event detection

Recently, convolutional neural networks (CNNs) have been widely used in ...

Shengchang Xiao, et al.

research

∙ 01/19/2023

THLNet: two-stage heterogeneous lightweight network for monaural speech enhancement

In this paper, we propose a two-stage heterogeneous lightweight network ...

Feng Dang, et al.

research

∙ 10/13/2022

Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion

This paper describes the deepfake audio detection system submitted to th...

Yuxiang Zhang, et al.

research

∙ 10/12/2022

Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

Code-switching automatic speech recognition becomes one of the most chal...

Shuhao Deng, et al.

research

∙ 06/20/2022

Boosting Cross-Domain Speech Recognition with Self-Supervision

The cross-domain performance of automatic speech recognition (ASR) could...

Han Zhu, et al.

research

∙ 06/18/2022

Decoupled Federated Learning for ASR with Non-IID Data

Automatic speech recognition (ASR) with federated learning (FL) makes it...

Han Zhu, et al.

research

∙ 06/15/2022

Streaming non-autoregressive model for any-to-many voice conversion

Voice conversion models have developed for decades, and current mainstre...

Ziyi Chen, et al.

research

∙ 04/25/2022

Audio-Visual Scene Classification Using A Transfer Learning Based Joint Optimization Strategy

Recently, audio-visual scene classification (AVSC) has attracted increas...

Chengxin Chen, et al.

research

∙ 04/25/2022

Back-ends Selection for Deep Speaker Embeddings

Probabilistic Linear Discriminant Analysis (PLDA) was the dominant and n...

Zhuo Li, et al.

research

∙ 03/31/2022

CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition

Previous research has looked into ways to improve speech emotion recogni...

Chengxin Chen, et al.

research

∙ 02/22/2022

Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

Recently, end-to-end automatic speech recognition models based on connec...

Keqi Deng, et al.

research

∙ 01/29/2022

The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge

The voice conversion task is to modify the speaker identity of continuou...

Ziyi Chen, et al.

research

∙ 01/25/2022

Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models

While Transformers have achieved promising results in end-to-end (E2E) a...

Keqi Deng, et al.

research

∙ 12/23/2021

Data Augmentation based Consistency Contrastive Pre-training for Automatic Speech Recognition

Self-supervised acoustic pre-training has achieved amazing results on th...

Changfeng Gao, et al.

research

∙ 10/09/2021

Wav2vec-S: Semi-Supervised Pre-Training for Speech Recognition

Self-supervised pre-training has dramatically improved the performance o...

Han Zhu, et al.

research

∙ 04/27/2021

DPT-FSNet: Dual-path Transformer Based Full-band and Sub-band Fusion Network for Speech Enhancement

Recently, dual-path networks have achieved promising performance due to ...

Feng Dang, et al.

research

∙ 04/12/2021

Improved Conformer-based End-to-End Speech Recognition Using Neural Architecture Search

Recently, neural architecture search (NAS) has been successfully used in i...

Yukun Liu, et al.

research

∙ 11/05/2020

Domain Adaptation Using Class Similarity for Robust Speech Recognition

When only limited target domain data is available, domain adaptation cou...

Han Zhu, et al.

research

∙ 11/05/2020

Multi-Accent Adaptation based on Gate Mechanism

When only a limited amount of accented speech data is available, to prom...

Han Zhu, et al.

research

∙ 10/20/2020

Power pooling: An adaptive pooling function for weakly labelled sound event detection

Access to large corpora with strongly labelled sound events is expensive...

Yuzhuo Liu, et al.

research

∙ 07/01/2020

Exploring the time-domain deep attractor network with two-stream architectures in a reverberant environment

With the success of deep learning in speech signal processing, speaker-i...

Hangting Chen, et al.

research

∙ 05/27/2020

ACGAN-based Data Augmentation Integrated with Long-term Scalogram for Acoustic Scene Classification

In acoustic scene classification (ASC), acoustic features play a crucial...

Hangting Chen, et al.

research

∙ 05/23/2020

Power Pooling Operators and Confidence Learning for Semi-Supervised Sound Event Detection

In recent years, the involvement of synthetic strongly labeled data, weak...

Yuzhuo Liu, et al.

research

∙ 01/15/2020

Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture

Recently, Transformer has gained success in automatic speech recognition...

Haoran Miao, et al.

research

∙ 12/25/2019

Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation

Utterance-level permutation invariant training (uPIT) has achieved promi...

Lu Huang, et al.

research

∙ 10/31/2019

CN-CELEB: a challenging Chinese speaker recognition dataset

Recently, researchers set an ambitious goal of conducting speaker recogn...

Yue Fan, et al.

research

∙ 07/15/2019

Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling

This technical report describes the IOA team's submission for TASK1A of ...

Hangting Chen, et al.

research

∙ 09/21/2015

Noise Robust IOA/CAS Speech Separation and Recognition System For The Third 'CHIME' Challenge

This paper presents the contribution to the third 'CHiME' speech separat...

Xiaofei Wang, et al.
