Abstract
With the rapid development of the Internet of Things (IoT), smart speakers for voice assistance have become increasingly important in smart homes, offering a new type of human-machine interaction interface. Voice localization with microphone arrays can improve a smart speaker's performance and enable many new IoT applications. To address the challenges of complex indoor environments, such as non-line-of-sight (NLOS) and multipath propagation, we propose voice fingerprinting for indoor localization using a single microphone array. The proposed system consists of a ReSpeaker 6-mic circular array kit connected to a Raspberry Pi and a deep learning model, and operates in an offline training stage and an online test stage. In the offline stage, the models are trained with spectrogram images obtained from audio data using the short-time Fourier transform (STFT). Transfer learning is used to speed up the training process. In the online stage, a top-K probabilistic method is used for location estimation. Our experimental results demonstrate that the Inception-ResNet-v2 model achieves satisfactory localization performance with small location errors in two typical home environments.
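The abstract does not spell out the top-K probabilistic method. One common reading in fingerprinting work is a probability-weighted average of the K most likely training locations, and the sketch below illustrates that reading only. The function name `top_k_location`, the default `k=3`, and the 2-D coordinate layout are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def top_k_location(probs, locations, k=3):
    """Hypothetical top-K estimator: weighted mean of the k most likely locations.

    probs     : class probabilities from the classifier, one per training location
    locations : array of shape (num_locations, 2) with the known (x, y) coordinates
    k         : number of candidate locations to combine (assumed value)
    """
    probs = np.asarray(probs, dtype=float)
    locations = np.asarray(locations, dtype=float)
    k = min(k, probs.size)

    # Indices of the k largest probabilities.
    top = np.argsort(probs)[-k:]

    # Renormalise the selected probabilities so the weights sum to 1.
    weights = probs[top] / probs[top].sum()

    # Probability-weighted average of the selected coordinates.
    return weights @ locations[top]
```

With `k=1` this reduces to picking the single most likely training location; larger values of `k` let the estimate fall between training locations.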