Robust Sound Source Localization Using CNN-LSTM and Q-Learning in Dynamic Scenes

Accepted for oral presentation

Paper code: 1096-ISAV2025 (R2)

Authors

¹Student

²Faculty member, Lorestan University

Abstract

This paper proposes a novel hybrid model for localizing moving sound sources with microphone arrays, integrating convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and Q-Learning. The proposed method feeds multidimensional acoustic features, namely Mel-frequency cepstral coefficients (MFCCs) and Mel-spectrograms extracted from time-frequency audio signals, into a Q-Learning framework built on CNN and LSTM architectures. This integration mitigates the challenges posed by noise and reverberation in complex acoustic environments. For performance evaluation, the model uses synthetic data generated by the Room Impulse Response Generator software. The experiments were designed to assess the robustness of the model, with particular emphasis on the impact of incrementally increasing noise levels and reverberation times on localization performance. The results demonstrate that the proposed algorithm significantly outperforms existing baseline methods, achieving a localization accuracy of 96% and a root mean square error (RMSE) of 3.650° in direction-of-arrival (DOA) estimation. These findings underscore the potential of the model to deliver reliable and efficient sound source localization in real-world acoustic scenarios.
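As an illustration of the reported metrics, the following minimal Python sketch shows one common way to compute an RMSE over DOA estimates while handling the wrap-around at 360°. It is not taken from the paper: the function names are hypothetical, and the 5° tolerance used to turn angular errors into an accuracy figure is an assumption made only for this example, since the abstract does not state how accuracy was defined.

```python
import math


def angular_error_deg(estimate, truth):
    """Smallest signed difference between two angles, in degrees, in [-180, 180)."""
    return (estimate - truth + 180.0) % 360.0 - 180.0


def doa_rmse_deg(estimates, truths):
    """Root mean square of the wrapped angular errors, in degrees."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [angular_error_deg(e, t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))


def doa_accuracy(estimates, truths, tolerance_deg=5.0):
    """Fraction of estimates within tolerance_deg of the truth.

    The tolerance is a hypothetical choice for this sketch, not a value
    given in the source.
    """
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        abs(angular_error_deg(e, t)) <= tolerance_deg
        for e, t in zip(estimates, truths)
    )
    return hits / len(estimates)
```

Wrapping the error before squaring matters for sources near 0°: an estimate of 359° against a true direction of 1° counts as a 2° error rather than 358°.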

Keywords

Moving; localization; DOA; reverberant

Topics

Signal Processing in Acoustics

Authors

Elham Yazdankhah, Salman Karimi
