Few-shot keyword spotting (FS-KWS) models usually require large-scale
an...
Streaming automatic speech recognition (ASR) models are restricted from
...
Monocular depth estimation is very challenging because clues to the exac...
Image text retrieval is a task to search for the proper textual descript...
Pre-trained Transformer models such as BERT have shown great success in ...
Generalized zero-shot learning (GZSL) is a technique to train a deep lea...
Transformer-based deep neural networks have achieved great success in va...
Phoneme recognition is a very important part of speech recognition that
...
Transformer-based speech recognition models have achieved great success ...
The performance of automatic speech recognition (ASR) models can be grea...
Generalized zero-shot learning (GZSL) is a technique to train a deep lea...
While Transformer-based models have shown impressive language modeling
p...