Since Large Language Models or LLMs have demonstrated high-quality
perfo...
Using a vision-inspired keyword spotting framework, we propose an
archit...
Spotting user-defined/flexible keywords represented in text frequently u...
Using audio and text embeddings jointly for Keyword Spotting (KWS) has s...
DNN pruning is a popular way to reduce the size of a model, improve the
...
Streaming keyword spotting is a widely used solution for activating voic...
This paper explores the possibility of using visual object detection
tec...
Deep Neural Network–Hidden Markov Model (DNN-HMM) based methods have bee...
In this paper, we address the task of determining whether a given uttera...
False triggers in voice assistants are unintended invocations of the
ass...
We present an introspection of an audiovisual speech enhancement model. ...
Emotion plays an essential role in human-to-human communication, enablin...
Automatic speech transcription and speaker recognition are usually treat...
Voice-triggered smart assistants often rely on detection of a trigger-ph...
Millions of people reach out to digital assistants such as Siri every da...