Abstract
In recent years, many technologies have been developed that make it easier to build botnets, and AI tools are now available that generate posts on Twitter automatically. Although these tools and technologies have many legitimate uses, spammers also use them to spread fake news and election campaign messaging. Bots are also dangerous because they commit click fraud, create a negative environment, and provide misleading information. Bots work in groups to gain attention: they retweet each other to make a topic trend, or post spam tweets with minor modifications so that the tweets appear to be written by humans. The motivation of this project is to use machine learning to detect individual botnet accounts on Twitter that work in groups. In this study, we use a dataset provided by Kaggle, which contains user profile information for 1,321 bots and 1,476 non-bots. Tweets similar to a particular user’s tweets are retrieved using the Twitter REST API. We use two factors to identify bots on Twitter: user profile information and a semantic similarity score. The approach is to train a machine learning model on features such as name, status, description, follower count, listed count, screen name, verification status, and average semantic score. The semantic score is computed using a pre-trained deep learning model provided by Google, which encodes each tweet into a fixed-length 512-dimensional vector. Cosine similarity is then calculated between pairs of vectors; a higher cosine similarity means the two tweets are more similar to each other, and a lower value means they are less similar. Each user’s tweets are compared with each similar tweet, and an average semantic similarity score is calculated. This score, combined with the other user profile features, is used to train the machine learning models. The Neural Network model outperformed a Multinomial Naive Bayes approach, with an accuracy of 67.7% on the testing dataset.
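The average semantic score described above can be illustrated with a minimal sketch. It assumes the tweet embeddings are already available as lists of floats (in the study these would be the 512-dimensional vectors from the pre-trained encoder; the 3-dimensional toy vectors here are purely illustrative). The function names `cosine_similarity` and `average_semantic_score` are hypothetical, and averaging over all pairs of user tweets and similar tweets is an assumption about how the comparison is aggregated, not a confirmed detail of the implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: treat an all-zero embedding as having no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


def average_semantic_score(user_vectors, similar_vectors):
    """Mean cosine similarity over every (user tweet, similar tweet) pair.

    Assumption: all pairs are weighted equally; returns 0.0 if either
    list is empty.
    """
    scores = [
        cosine_similarity(u, s)
        for u in user_vectors
        for s in similar_vectors
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy embeddings standing in for the 512-dimensional tweet vectors.
user_tweets = [[1.0, 0.0, 0.0]]
similar_tweets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
score = average_semantic_score(user_tweets, similar_tweets)
print(score)
```

The resulting score would then be appended to the profile features of each account before training the classifier.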