Generalized LSTM-based End-to-End Text-Independent Speaker Verification
The growing amount of available data and increasingly affordable hardware have opened the door to Deep Learning (DL). Owing to its rapid advances and rising popularity, DL has spread to almost every field where machine learning is applicable, displacing traditional state-of-the-art methods. Many researchers in speaker recognition have also begun replacing former state-of-the-art methods with DL techniques. However, some traditional i-vector-based methods remain state-of-the-art for text-independent speaker verification (TI-SV). In this paper, we discuss Google's recent generalized end-to-end (GE2E) DL technique for TI-SV, which is based on Long Short-Term Memory (LSTM) units. We compare different scenarios and aspects, including utterance duration, training time, and accuracy, to show that our method outperforms the traditional methods.