Jagadeesh Balam | DeepAI

Featured Co-authors

Shinji Watanabe
239 publications
Boris Ginsburg
44 publications
He Huang
33 publications
Somshubra Majumdar
15 publications
Oleksii Kuchaiev
12 publications
Vahid Noroozi
11 publications
Vitaly Lavrukhin
10 publications
Tae Jin Park
10 publications
Kunal Dhawan
8 publications
Yuekai Zhang
7 publications
Nithin Rao Koluguri
6 publications

research

∙ 09/19/2023

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

Discrete audio representation, aka audio tokenization, has seen renewed ...

Krishna C. Puvvada, et al.

research

∙ 09/18/2023

Investigating End-to-End ASR Architectures for Long Form Audio Transcription

This paper presents an overview and evaluation of some of the end-to-end...

Nithin Rao Koluguri, et al.

research

∙ 09/11/2023

Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

Large language models (LLMs) have shown great promise for capturing cont...

Tae Jin Park, et al.

research

∙ 07/13/2023

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling

We study speech intent classification and slot filling (SICSF) by propos...

He Huang, et al.

research

∙ 10/27/2022

AmberNet: A Compact End-to-End Model for Spoken Language Identification

We present AmberNet, a compact end-to-end neural network for Spoken Lang...

Fei Jia, et al.

research

∙ 03/30/2022

Multi-scale Speaker Diarization with Dynamic Scale Weighting

Speaker diarization systems are challenged by a trade-off between the te...

Tae Jin Park, et al.

research

∙ 07/22/2021

CarneliNet: Neural Mixture Model for Automatic Speech Recognition

End-to-end automatic speech recognition systems have achieved great accu...

Aleksei Kalinov, et al.

research

∙ 04/05/2021

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

In the English speech-to-text (STT) machine learning task, acoustic mode...

14 Patrick K. O'Neill, et al. ∙
