Robust Sound Source Localization Using CNN-LSTM and Q-Learning in Dynamic Scenes

Accepted for oral presentation
Paper code: 1096-ISAV2025 (R2)
Authors
1 Student
2 Faculty member, Lorestan University
Abstract
This paper proposes a novel hybrid model for localizing moving sound sources using microphone arrays, integrating convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and Q-Learning. The proposed method feeds multidimensional acoustic features, namely Mel-frequency cepstral coefficients (MFCCs) and Mel-spectrograms extracted from time-frequency audio signals, into a Q-Learning framework that leverages CNN and LSTM architectures. This integration mitigates the challenges posed by noise and reverberation in complex acoustic environments. The model is evaluated on synthetic data generated with the Room Impulse Response Generator software. The experiments were designed to assess the robustness of the model, with particular emphasis on the impact of incrementally increasing noise levels and reverberation times on localization performance. The results demonstrate that the proposed algorithm significantly outperforms existing baseline methods, achieving a localization accuracy of 96% and a root mean square error (RMSE) of 3.650° in direction-of-arrival (DOA) estimation. These findings underscore the potential of the model to deliver reliable and efficient sound source localization in real-world acoustic scenarios.
Authors
Elham Yazdankhah, Salman Karimi
Keywords
Moving, localization, DOA, reverberant
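
The abstract reports results as an RMSE in degrees and a localization accuracy for DOA estimation. The following is a minimal sketch of how such metrics are commonly computed for azimuth angles, not the authors' evaluation code. It assumes angles are given in degrees and wraps errors so that 359° and 1° differ by 2°. The function names and the 5° accuracy tolerance are hypothetical choices made for illustration only; the abstract does not state how accuracy was defined.

```python
import math


def angular_error_deg(estimate_deg, truth_deg):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    # Shift into [-180, 180) so errors across the 0/360 boundary stay small.
    diff = (estimate_deg - truth_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def doa_rmse_deg(estimates_deg, truths_deg):
    """Root mean square of the wrapped angular errors, in degrees."""
    if len(estimates_deg) != len(truths_deg) or not estimates_deg:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [angular_error_deg(e, t) ** 2
               for e, t in zip(estimates_deg, truths_deg)]
    return math.sqrt(sum(squared) / len(squared))


def doa_accuracy(estimates_deg, truths_deg, tolerance_deg=5.0):
    """Fraction of estimates within tolerance_deg of the truth.

    The 5 degree default is an assumption for this sketch, not a value
    taken from the paper.
    """
    if len(estimates_deg) != len(truths_deg) or not estimates_deg:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for e, t in zip(estimates_deg, truths_deg)
               if angular_error_deg(e, t) <= tolerance_deg)
    return hits / len(estimates_deg)


if __name__ == "__main__":
    estimated = [359.0, 10.0, 90.0]   # made-up example values
    reference = [1.0, 14.0, 100.0]
    print(doa_rmse_deg(estimated, reference))
    print(doa_accuracy(estimated, reference))
```

Wrapping the error before squaring matters for a moving source that crosses the 0°/360° boundary: without it, a 2° miss would be counted as a 358° miss and dominate the RMSE.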