With the recent rapid growth of large language models (LLMs), discrete speec...
The advancement of audio-language (AL) multimodal learning tasks has bee...
Speech is the surface form of a finite set of phonetic units, which can ...
This paper introduces GigaST, a large-scale pseudo speech translation (S...