Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments
This paper addresses the problem of online multiple-speaker localization and tracking in reverberant environments. We propose to use the direct-path relative transfer function (DP-RTF), a feature that encodes inter-channel direct-path information and is robust against reverberation, which makes it well suited for reliable localization. A complex Gaussian mixture model (CGMM) is then used, such that each component weight represents the probability that an active speaker is present at the corresponding candidate source direction. Exponentiated gradient descent is used to update these weights online by minimizing a combination of negative log-likelihood and entropy. The entropy term imposes sparsity over the number of audio sources, since in practice only a few speakers are simultaneously active. The outputs of this online localization process are then used as observations within a Bayesian filtering process whose computation is made tractable via an instance of variational expectation-maximization. Birth and sleeping processes are used to account for the intermittent nature of speech. The method is evaluated on several datasets.
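The online weight update described above can be illustrated with a minimal sketch. This is not the authors' implementation: the exact objective, the step size, the entropy weight, the function name eg_update, and the toy per-direction likelihood values are all assumptions made for illustration. The sketch assumes an objective of the form -log(sum_d w_d p_d) + gamma * H(w), where w_d is the weight of candidate direction d, p_d is the likelihood of the current observation under component d, and H(w) = -sum_d w_d log w_d is the entropy.

```python
import math


def eg_update(weights, likelihoods, step=0.1, entropy_weight=0.1, floor=1e-12):
    """One exponentiated-gradient step that keeps the weights on the simplex.

    Assumed objective (illustrative, not taken from the paper):
        -log(sum_d w_d * p_d) + entropy_weight * H(w),
    with H(w) = -sum_d w_d * log(w_d).
    """
    # Mixture likelihood of the current observation, floored to avoid log(0).
    mix = max(sum(w * p for w, p in zip(weights, likelihoods)), floor)

    # Gradient of the assumed objective with respect to each weight:
    #   d/dw_d [-log(mix)]             = -p_d / mix
    #   d/dw_d [entropy_weight * H(w)] = -entropy_weight * (1 + log w_d)
    grads = [
        -p / mix - entropy_weight * (1.0 + math.log(max(w, floor)))
        for w, p in zip(weights, likelihoods)
    ]

    # Multiplicative update followed by renormalization.
    unnormalized = [w * math.exp(-step * g) for w, g in zip(weights, grads)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy example: four candidate directions, with the observation favouring
# direction index 1. The likelihood values are made up for illustration.
weights = [0.25, 0.25, 0.25, 0.25]
likelihoods = [0.05, 0.9, 0.05, 0.05]
for _ in range(50):
    weights = eg_update(weights, likelihoods)
```

Because the update is multiplicative and then renormalized, the weights stay non-negative and sum to one without an explicit projection step. Under the assumed objective, the entropy term pushes already small weights further toward zero, which matches the sparsity argument in the abstract.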