Abstract
Voice commands are now widely used as an input method on many devices for tasks such as authentication, speech recognition, and pattern analysis. Improvements in Automatic Speech Recognition (ASR) systems allow machines to perform many tasks that would otherwise require a human listener. These improvements rest on the ongoing development of Deep Neural Networks (DNNs), which have become the main technique used in ASR. However, recent studies show that DNNs are vulnerable to adversarial perturbations. The ability to force a DNN to produce a false transcription of an audio input is an open invitation to attackers. In recent years, adversarial attacks on audio have become some of the most frequent and most challenging attacks. Because adversarial attacks are a major threat to speech recognition, defending against them is an important concern in machine learning and deep learning. In the first part of this project, we built an adversarial attack system that generates adversarial examples from original audio to fool an ASR system. These adversarial examples can attack the DNN of an ASR model in the physical world. As part of the defense strategy, we implemented a Generative Adversarial Network (GAN) and a Convolutional Neural Network (CNN), an unsupervised and a supervised learning method respectively, which detect adversarial audio from the anomalies or perturbations added to the original audio. The implemented GAN achieved better performance on the audio dataset than the other GANs we implemented and a One-Class SVM.