Improving singing voice separation using Deep U-Net and Wave-U-Net with data augmentation

03/04/2019
by Alice Cohen-Hadria, et al.

State-of-the-art singing voice separation relies on deep learning, using CNN structures with skip connections (such as the U-Net, the Wave-U-Net, or MSDENSELSTM). A key to the success of these models is the availability of a large amount of training data. In this study, we address singing voice separation for mono signals and compare the U-Net and the Wave-U-Net, which are structurally similar but operate on different input representations. First, we report a few results on variations of the U-Net model. Second, we discuss the potential of state-of-the-art speech and music transformation algorithms for augmenting existing data sets, and demonstrate that the effect of these augmentations depends on the signal representation used by the model. The results show a considerable improvement due to augmentation for both models. However, pitch transposition is the most effective augmentation strategy for the U-Net model, while transposition, time stretching, and formant shifting have a much more balanced effect on the Wave-U-Net model. Finally, we compare the two models on the same dataset.
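
To make the phrase "different input representations" concrete, the following is a minimal sketch, not the authors' pipeline. It assumes, as in the original formulations of these models, that the U-Net consumes a magnitude spectrogram while the Wave-U-Net consumes the raw waveform. The helper `magnitude_spectrogram`, the sample rate, the frame length, and the hop size are all illustrative choices and are not taken from the paper.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=1024, hop=256):
    """Magnitude STFT of a mono signal, shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper for illustration; frame_len and hop are assumed values.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Keep only the magnitude; the phase is discarded.
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 16000                                   # assumed sample rate in Hz
t = np.arange(sr) / sr                       # one second of audio
mono = 0.5 * np.sin(2 * np.pi * 440.0 * t)   # synthetic 440 Hz tone as a stand-in

waveform_input = mono                            # time-domain input (Wave-U-Net style)
spectrogram_input = magnitude_spectrogram(mono)  # time-frequency input (U-Net style)

print(waveform_input.shape, spectrogram_input.shape)
```

In this representation a pitch transposition moves energy along the frequency axis of the spectrogram, whereas in the waveform it changes the oscillation period of the samples. This is one plausible reason why the same augmentation could affect the two models differently, although the sketch itself does not demonstrate that.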
