Self-supervised visual pretraining has shown significant progress recent...
Audio Visual Scene-aware Dialog (AVSD) is a task to generate responses w...
Computational audio analysis has become a central issue in associated ar...
Building a good speech recognition system usually requires large amounts...
Speech recognition technologies are gaining enormous popularity in vario...
Code-switching speech recognition has attracted an increasing interest r...
End-to-end speech recognition has become increasingly popular in mandar...