Abstract
Localization refers to tracking objects in a given environment. Sound source localization is a prominent research area aimed at improving machine hearing in human-machine interaction, with numerous applications including smart speakers and robots. A microphone array can record sound, and the resulting audio data can be used to build powerful applications. In this project, we propose deep learning-based sound source localization with a microphone array. Using a speaker and a microphone array, we collect training and testing audio data for different user locations in two home environments. We mark each user location in the 2D plane using x and y coordinates. Test locations are within 1 meter of the corresponding training locations. We extract features from the audio data for each location using the Short-Time Fourier Transform (STFT) and convert the audio into spectrogram images. Then, we train a deep convolutional neural network on the training locations to classify the location of the user audio. We use the same trained model on the test dataset to estimate user locations, and we calculate the distance error between the true and predicted locations of each test sample. Experimental results show that the proposed system classifies user locations with good accuracy and low distance error.
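As a minimal illustration of two steps named above, the following Python sketch computes an STFT magnitude spectrogram and a distance error between a true and a predicted 2D location. It is a sketch under stated assumptions, not the implementation used in this project: the function names, the frame length, the hop size, the Hann window, the 8 kHz sampling rate, and the use of Euclidean distance as the error measure are all illustrative choices that the text above does not specify.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a magnitude spectrogram as a list of frames (time x frequency).

    frame_len and hop are assumed values chosen for illustration only.
    """
    # Hann window (assumed) to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep only the non-negative frequencies
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive O(N^2) DFT of the windowed frame; an FFT would be used in practice.
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    return frames


def distance_error(true_xy, predicted_xy):
    """Euclidean distance (assumed error measure) between two (x, y) points."""
    return math.hypot(true_xy[0] - predicted_xy[0],
                      true_xy[1] - predicted_xy[1])


# Usage: a synthetic 1 kHz tone sampled at an assumed rate of 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = stft_magnitude(tone)

# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # bin index times bin spacing

# Error between a hypothetical test location and a hypothetical prediction.
error_m = distance_error((0.0, 0.0), (3.0, 4.0))
```

Each row of `spec` corresponds to one time frame, so the nested list can be rendered as a spectrogram image before being passed to a convolutional network. With per-sample errors computed by `distance_error`, a mean over the test set would give a single summary figure, although the averaging scheme is likewise an assumption here.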