SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

04/18/2019
by Daniel S. Park, et al.

We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model. This compares to the previous state-of-the-art hybrid system of 7.5% WER. On Switchboard, we achieve 7.2% WER on the Switchboard portion of the Hub5'00 test set without the use of a language model, and 6.8% WER with shallow fusion, which compares to the previous state-of-the-art hybrid system at 8.3% WER.
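
The two masking steps of the policy described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the `(time, frequency)` array layout, the default widths and mask counts, and the choice of zero as the fill value are all assumptions made for the example, and the warping step is left out.

```python
import numpy as np


def mask_blocks(features, axis, max_width, num_masks, rng, fill_value=0.0):
    """Return a copy of `features` with contiguous blocks along `axis` overwritten.

    Hypothetical helper: each block has a width drawn uniformly from
    [0, max_width] and a start position chosen so the block fits.
    """
    out = features.copy()
    size = out.shape[axis]
    for _ in range(num_masks):
        width = min(int(rng.integers(0, max_width + 1)), size)
        start = int(rng.integers(0, size - width + 1))
        index = [slice(None)] * out.ndim
        index[axis] = slice(start, start + width)
        out[tuple(index)] = fill_value
    return out


def spec_augment_masks(features, max_freq_width=8, max_time_width=20,
                       num_freq_masks=1, num_time_masks=1, seed=None):
    """Apply frequency masking, then time masking, to a (time, frequency) array.

    All default values here are illustrative assumptions, not settings
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Mask blocks of frequency channels (axis 1).
    out = mask_blocks(features, axis=1, max_width=max_freq_width,
                      num_masks=num_freq_masks, rng=rng)
    # Mask blocks of time steps (axis 0).
    out = mask_blocks(out, axis=0, max_width=max_time_width,
                      num_masks=num_time_masks, rng=rng)
    return out


# Usage: 100 frames of 40 filter bank coefficients (dummy data).
features = np.ones((100, 40))
augmented = spec_augment_masks(features, seed=0)
```

Because the masks are drawn independently on every call, the same utterance yields a different augmented input each time it is seen during training; the input array itself is left unmodified.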
