Shi-Xiong Zhang

09/14/2023

M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec

We introduce M3-AUDIODEC, an innovative neural spatial audio codec desig...

Anton Ratnarajah, et al.

03/09/2023

MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning

Audio-visual learning helps to comprehensively understand the world by f...

Ruize Xu, et al.

02/27/2023

3D Neural Beamforming for Multi-channel Speech Separation Against Location Uncertainty

Multi-channel speech separation using speaker's directional information ...

Rongzhi Gu, et al.

12/16/2022

Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation

Recently, frequency domain all-neural beamforming methods have achieved ...

Rongzhi Gu, et al.

11/22/2022

Deep Neural Mel-Subband Beamformer for In-car Speech Separation

While current deep learning (DL)-based beamforming techniques have been ...

Vinay Kothapally, et al.

05/20/2022

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

Acoustic echo cancellation (AEC) plays an important role in the full-dup...

Meng Yu, et al.

03/31/2022

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

In this paper, we present a novel framework that jointly performs speake...

Yushi Ueda, et al.

12/05/2021

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

Recently, End-to-End (E2E) frameworks have achieved remarkable results o...

Jinchuan Tian, et al.

11/29/2021

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Conversational bilingual speech encompasses three types of utterances: t...

Brian Yan, et al.

11/22/2021

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature

Automatic speech recognition (ASR) of multi-channel multi-speaker overla...

Yiwen Shao, et al.

11/09/2021

Joint AEC AND Beamforming with Double-Talk Detection using RNN-Transformer

Acoustic echo cancellation (AEC) is a technique used in full-duplex comm...

Vinay Kothapally, et al.

10/07/2021

FAST-RIR: Fast neural diffuse room impulse response generator

We present a neural-network-based fast diffuse room impulse response gen...

Anton Ratnarajah, et al.

04/26/2021

Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation in Complex Domain

To date, mainstream target speech separation (TSS) approaches are formul...

Rongzhi Gu, et al.

04/17/2021

MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation

Recently, our proposed recurrent neural network (RNN) based all deep lea...

Xiyun Li, et al.

01/04/2021

Generalized RNN beamformer for target speech separation

Recently we proposed an all-deep-learning minimum variance distortionles...

Yong Xu, et al.

12/24/2020

Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

Many purely neural network based speech separation approaches have been ...

Zhuohuang Zhang, et al.

10/30/2020

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization

This paper proposes a new paradigm for handling far-field multi-speaker ...

Aswin Shanmugam Subramanian, et al.

08/21/2020

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

Speech enhancement and speech separation are two related tasks, whose pu...

Daniel Michelsanti, et al.

05/08/2020

Neural Spatio-Temporal Beamformer for Target Speech Separation

Purely neural network (NN) based speech separation and enhancement metho...

Yong Xu, et al.

03/16/2020

Multi-modal Multi-channel Target Speech Separation

Target speech separation refers to extracting a target speaker's voice f...

Rongzhi Gu, et al.

03/09/2020

Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD...

Rongzhi Gu, et al.

02/13/2020

Self-supervised learning for audio-visual speaker diarization

Speaker diarization, which is to find the speech segments of specific sp...

Yifan Ding, et al.

01/06/2020

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

Automatic recognition of overlapped speech remains a highly challenging ...

Jianwei Yu, et al.

12/17/2019

A Unified Framework for Speech Separation

Speech separation refers to extracting each individual speech source in ...

Fahimeh Bahmaninezhad, et al.

09/16/2019

Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network

Background noise, interfering speech and room reverberation frequently d...

Ke Tan, et al.

05/17/2019

A comprehensive study of speech separation: spectrogram vs waveform separation

Speech separation has been studied widely for single-channel close-talk ...

Fahimeh Bahmaninezhad, et al.

05/15/2019

End-to-End Multi-Channel Speech Separation

The end-to-end approach for single-channel speech separation has been st...

Rongzhi Gu, et al.

05/11/2019

Encrypted Speech Recognition using Deep Polynomial Networks

The cloud-based speech recognition/API provides developers or enterprise...

Shi-Xiong Zhang, et al.

04/08/2019

Improved Speaker-Dependent Separation for CHiME-5 Challenge

This paper summarizes several follow-up contributions for improving our ...

Jian Wu, et al.

04/07/2019

Time Domain Audio Visual Speech Separation

Audio-visual multi-modal modeling has been demonstrated to be effective ...

Jian Wu, et al.

01/03/2017

End-to-End Attention based Text-Dependent Speaker Verification

A new type of End-to-End system for text-dependent speaker verification ...

Shi-Xiong Zhang, et al.