Communication-Cost Aware Microphone Selection For Neural Speech Enhancement with Ad-hoc Microphone Arrays
When performing multi-channel speech enhancement with a wireless acoustic sensor network, streaming information from all sensors can be prohibitive in terms of communication costs. However, not every sensor is needed to achieve good performance, which presents an opportunity to reduce communication costs. We propose a data-driven technique that exploits this opportunity by jointly learning a speech enhancement network and a data-request network. Our model is trained with a trade-off between task performance and communication cost. Within this trade-off, our method can stream from more microphones in lower-SNR scenes and from fewer microphones in higher-SNR scenes. We evaluate the model in a complex echoic acoustic scene with moving sources and show that it matches the performance of a baseline model while streaming less data.
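The trade-off between task performance and communication cost can be illustrated with a generic penalized objective. The sketch below is not taken from the paper: the function name `tradeoff_loss`, the weight `cost_weight`, and the choice of measuring cost as the average fraction of microphones streamed per frame are all assumptions made for illustration.

```python
def tradeoff_loss(enhancement_loss, mics_requested, total_mics, cost_weight):
    """Hypothetical combined objective: task loss plus a weighted streaming cost.

    enhancement_loss: scalar task loss (lower is better), assumed precomputed.
    mics_requested:   number of microphones streamed in each frame.
    total_mics:       number of microphones available in the array.
    cost_weight:      assumed trade-off weight; larger values penalize streaming more.
    """
    if total_mics <= 0:
        raise ValueError("total_mics must be positive")
    if not mics_requested:
        raise ValueError("mics_requested must contain at least one frame")
    # Communication cost: fraction of available microphones streamed, averaged over frames.
    comm_cost = sum(m / total_mics for m in mics_requested) / len(mics_requested)
    return enhancement_loss + cost_weight * comm_cost


# Two hypothetical scenes with the same enhancement loss but different streaming behavior.
noisy_scene = tradeoff_loss(0.5, [4, 4, 3, 4], total_mics=4, cost_weight=0.1)
clean_scene = tradeoff_loss(0.5, [1, 1, 2, 1], total_mics=4, cost_weight=0.1)
```

Under this assumed form, raising `cost_weight` pushes a jointly trained model toward requesting fewer microphones, while lowering it favors enhancement quality.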