Yuxuan Wang

research

∙ 08/28/2023

InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models

Music editing primarily entails the modification of instrument tracks or...

0 Bing Han, et al. ∙

research

∙ 06/05/2023

Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency

The information retrieval community has made significant progress in imp...

0 Yuxuan Wang, et al. ∙

research

∙ 06/04/2023

MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning

We introduce MoviePuzzle, a novel challenge that targets visual narrativ...

0 Jianghui Wang, et al. ∙

research

∙ 05/30/2023

Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training

We introduce CDBERT, a new learning paradigm that enhances the semantics...

0 Yuxuan Wang, et al. ∙

research

∙ 05/30/2023

VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions

Video-grounded dialogue understanding is a challenging problem that requ...

0 Yuxuan Wang, et al. ∙

research

∙ 05/23/2023

Two Results on Low-Rank Heavy-Tailed Multiresponse Regressions

This paper gives two theoretical results on estimating low-rank paramete...

0 Kangqiang Li, et al. ∙

research

∙ 05/19/2023

Language-universal phonetic encoder for low-resource speech recognition

Multilingual training is effective in improving low-resource ASR, which ...

0 Siyuan Feng, et al. ∙

research

∙ 05/19/2023

Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition

We improve low-resource ASR by integrating the ideas of multilingual tra...

0 Siyuan Feng, et al. ∙

research

∙ 05/18/2023

A Unified Front-End Framework for English Text-to-Speech Synthesis

The front-end is a critical component of English text-to-speech (TTS) sy...

0 Zelin Ying, et al. ∙

research

∙ 05/09/2023

Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing

Automatic dubbing, which generates a corresponding version of the input ...

0 Jingbei Li, et al. ∙

research

∙ 04/21/2023

Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation

Visual-audio navigation (VAN) is attracting more and more attention from...

0 Hongcheng Wang, et al. ∙

research

∙ 12/30/2022

Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

Recent studies have shown that using an external Language Model (LM) ben...

0 Yukun Feng, et al. ∙

research

∙ 11/11/2022

Interactive Context-Aware Network for RGB-T Salient Object Detection

Salient object detection (SOD) focuses on distinguishing the most conspi...

0 Yuxuan Wang, et al. ∙

research

∙ 10/22/2022

Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation

We study video-grounded dialogue generation, where a response is generat...

0 Xueliang Zhao, et al. ∙

research

∙ 10/22/2022

Neural Sound Field Decomposition with Super-resolution of Sound Direction

Sound field decomposition predicts waveforms in arbitrary directions usi...

0 Qiuqiang Kong, et al. ∙

research

∙ 08/27/2022

Network-Level Adversaries in Federated Learning

Federated learning is a popular strategy for training models on distribu...

0 Giorgio Severi, et al. ∙

research

∙ 08/24/2022

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

VQA is an ambitious task aiming to answer any image-related question. Ho...

0 Stan Weixian Lei, et al. ∙

research

∙ 07/13/2022

Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech

Some recent studies have demonstrated the feasibility of single-stage ne...

0 Zhengxi Liu, et al. ∙

research

∙ 06/16/2022

SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation

Adapting to a continuously evolving environment is a safety-critical cha...

32 Tao Sun, et al. ∙

research

∙ 04/12/2022

VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration

Speech restoration aims to remove distortions in speech signals. Prior m...

0 Haohe Liu, et al. ∙

research

∙ 04/01/2022

Generic Event Boundary Captioning: A Benchmark for Status Changes Understanding

Cognitive science has shown that humans perceive videos in terms of even...

0 Yuxuan Wang, et al. ∙

research

∙ 03/31/2022

NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism

Although deep learning and end-to-end models have been widely used and s...

0 Jingbei Li, et al. ∙

research

∙ 02/10/2022

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

We propose two improvements to target-speaker voice activity detection (...

1 Maokui He, et al. ∙

research

∙ 01/25/2022

An Efficient Algorithm for the Partitioning Min-Max Weighted Matching Problem

The Partitioning Min-Max Weighted Matching (PMMWM) problem is an NP-hard...

0 Yuxuan Wang, et al. ∙

research

∙ 11/30/2021

AssistSR: Affordance-centric Question-driven Video Segment Retrieval

It is still a pipe dream that AI assistants on phone and AR glasses can ...

8 Stan Weixian Lei, et al. ∙

research

∙ 10/15/2021

Neural Dubber: Dubbing for Videos According to Scripts

Dubbing is a post-production process of re-recording actors' dialogues, ...

2 Chenxu Hu, et al. ∙

research

∙ 10/13/2021

Deep Superpixel-based Network for Blind Image Quality Assessment

The goal in a blind image quality assessment (BIQA) model is to simulate...

0 Guangyi Yang, et al. ∙

research

∙ 10/07/2021

Cloning one's voice using very limited data in the wild

With the increasing popularity of speech synthesis products, the industr...

0 Dongyang Dai, et al. ∙

research

∙ 09/28/2021

VoiceFixer: Toward General Speech Restoration with Neural Vocoder

Speech restoration aims to remove distortions in speech signals. Prior m...

0 Haohe Liu, et al. ∙

research

∙ 09/12/2021

Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation

Deep neural network based methods have been successfully applied to musi...

0 Qiuqiang Kong, et al. ∙

research

∙ 09/05/2021

The ByteDance Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021

This paper describes the ByteDance speaker diarization system for the fo...

0 Keke Wang, et al. ∙

research

∙ 07/20/2021

Joint Echo Cancellation and Noise Suppression based on Cascaded Magnitude and Complex Mask Estimation

Acoustic echo and background noise can seriously degrade the intelligibi...

0 Xiaofeng Shu, et al. ∙

research

∙ 07/01/2021

Audiovisual Singing Voice Separation

Separating a song into vocal and accompaniment components is an active r...

0 Bochen Li, et al. ∙

research

∙ 03/26/2021

Supervised Chorus Detection for Popular Music Using Convolutional Neural Network and Multi-task Learning

This paper presents a novel supervised approach to detecting the chorus ...

1 Ju-Chiang Wang, et al. ∙

research

∙ 03/26/2021

Modeling the Compatibility of Stem Tracks to Generate Music Mashups

A music mashup combines audio elements from two or more songs to create ...

1 Jiawen Huang, et al. ∙

research

∙ 03/19/2021

USTC-NELSLIP System Description for DIHARD-III Challenge

This system description describes our submission system to the Third DIH...

5 Yuxuan Wang, et al. ∙

research

∙ 03/02/2021

Listen, Read, and Identify: Multimodal Singing Language Identification of Music

We propose a multimodal singing language classification model that uses ...

0 Keunwoo Choi, et al. ∙

research

∙ 02/19/2021

Speech enhancement with weakly labelled data from AudioSet

Speech enhancement is a task to improve the intelligibility and perceptu...

0 Qiuqiang Kong, et al. ∙

research

∙ 02/19/2021

CatNet: music source separation system with mix-audio augmentation

Music source separation (MSS) is the task of separating a music piece in...

1 Xuchen Song, et al. ∙

research

∙ 10/28/2020

Large-Scale MIDI-based Composer Classification

Music classification is a task to classify a music piece into labels suc...

0 Qiuqiang Kong, et al. ∙

research

∙ 10/11/2020

GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music

Symbolic music datasets are important for music information retrieval an...

0 Qiuqiang Kong, et al. ∙

research

∙ 10/05/2020

High-resolution Piano Transcription with Pedals by Regressing Onsets and Offsets Times

Automatic music transcription (AMT) is the task of transcribing audio re...

2 Qiuqiang Kong, et al. ∙

research

∙ 07/12/2020

Xiaomingbot: A Multilingual Robot News Reporter

This paper proposes the building of Xiaomingbot, an intelligent, multili...

0 Runxin Xu, et al. ∙

research

∙ 05/26/2020

Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement

With the popularity of deep neural network, speech synthesis task has ac...

1 Dongyang Dai, et al. ∙

research

∙ 05/19/2020

Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech

Accent conversion (AC) transforms a non-native speaker's accent into a n...

0 Wenjie Li, et al. ∙

research

∙ 05/06/2020

Review of text style transfer based on deep learning

Text style transfer is a hot issue in recent natural language processing...

0 Xiangyang Li, et al. ∙

research

∙ 04/28/2020

Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise

Attention-based sequence-to-sequence (seq2seq) speech synthesis has achi...

0 Shan Yang, et al. ∙

research

∙ 04/23/2020

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

This paper presents ByteSing, a Chinese singing voice synthesis (SVS) sy...

0 Yu Gu, et al. ∙

research

∙ 02/06/2020

Source separation with weakly labelled data: An approach to computational auditory scene analysis

Source separation is the task to separate an audio recording into indivi...

1 Qiuqiang Kong, et al. ∙

research

∙ 01/31/2020

Convolutional Embedding for Edit Distance

Edit-distance-based string similarity search has many applications such ...

0 Xinyan Dai, et al. ∙
