Abstract
Most online services continue to rely on text-based passwords as the primary means of user authentication. As the number of these services grows, and because users have limited creativity and memory for coming up with new memorable passwords, users tend to reuse their passwords across multiple platforms. These factors, combined with the increasing number of leaked passwords, make passwords vulnerable to cross-site guessing attacks. Over the years, several popular methods have been proposed to predict subsequently used passwords, such as dictionary attacks, rule-based approaches, neural networks, and combinations of the above.

In the first part, we exploit the correlation between the similarity and predictability of subsequent passwords in a dataset of 28.8 million users and their 61.5 million passwords. We use a rule-based approach but delegate rule derivation, classification, and prediction to a Recurrent Neural Network (RNN). We limit the number of guessing attempts to ten yet get an astonishingly high prediction accuracy of up to 83% in under five attempts in several categories, which is twice as high as that of any other known model or algorithm. This makes our model an effective solution for real-time password guessing against online services without getting spotted or locked out. To the best of our knowledge, this study is the first attempt of its kind using an RNN.
In the second part, we explore the use of RNN models in passphrase breaking. Passphrases are perceived to be more secure and easier to remember than passwords of the same length. We work with a dataset built from the Corpus of Contemporary American English that contains around 100,000 distinct phrases. We demonstrate that RNN models can predict complete passphrases given the initial word with rates up to 40%, which is twice as good as any other known approach. Additionally, the predictions can be achieved in under 5,000 attempts, which is a 100% improvement compared to any known algorithm. This approach also provides ease of deployment and customization as well as low resource consumption. To the best of our knowledge, this is the first attempt at using an RNN for passphrase prediction.