Botnet account detection on Twitter using deep learning

Bhoomi Shah

Thesis

Open access

Master of Science (MS), California State University, Sacramento

01/22/2020

Handle: https://hdl.handle.net/10211.3/214834

Semantic computing; Computational Linguistics; Data Mining; Machine Learning

Abstract

In recent years, many technologies have been developed that make it easier to build botnets, and AI tools are available that automatically generate posts on Twitter. Although these tools and technologies have many advantages, spammers also use them to spread fake news and election campaign content. Bots are also dangerous because they commit click fraud, create a negative environment, and provide misleading information. Bots work in groups to gain attention: they retweet each other to make a topic trend, or they post spam tweets with minor modifications so that the tweets seem human-written. The motivation of this project is to use machine learning to detect individual botnet accounts on Twitter, that is, bot accounts that work in groups. In this study, we used a dataset provided by Kaggle, which contains user profile information for 1,321 bots and 1,476 non-bots. Tweets similar to a particular user’s tweets are retrieved using the Twitter REST API. We used two factors to identify bots on Twitter: user profile information and a semantic similarity score. The approach is to train a machine learning model with features such as name, status, description, follower count, listed count, screen name, verification status, and average semantic score. The average semantic score is calculated using a pre-trained deep learning model provided by Google. Each tweet is encoded into a fixed-size vector of 512 dimensions, and cosine similarity is then calculated to measure the similarity between two vectors. A higher cosine similarity means the tweets are more similar to each other, and a lower one means they are less similar. Each user’s tweets are compared with each similar tweet, and an average semantic similarity score is calculated. This score, combined with the other user profile features, is used to train machine learning models. The Neural Network model outperformed a Multinomial Naive Bayes approach, achieving an accuracy of 67.7% on a testing dataset.
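
The similarity step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the pre-trained encoder is not reproduced, so short hand-written vectors stand in for the 512-dimensional tweet embeddings, and the function names `cosine_similarity` and `average_semantic_score` are hypothetical. Averaging over every (user tweet, similar tweet) pair is also an assumption about how the comparison is organized.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: an all-zero vector
        # is treated as having no similarity to anything.
        return 0.0
    return dot / (norm_u * norm_v)


def average_semantic_score(
    user_vecs: Sequence[Vector], similar_vecs: Sequence[Vector]
) -> float:
    """Mean cosine similarity over all (user tweet, similar tweet) pairs.

    Assumption: every user tweet is compared with every retrieved tweet.
    Returns 0.0 when either list is empty.
    """
    scores = [
        cosine_similarity(u, s) for u in user_vecs for s in similar_vecs
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy stand-ins for encoded tweets (real embeddings would have 512 entries).
user_tweets = [[1.0, 0.0, 0.0]]
retrieved_tweets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
score = average_semantic_score(user_tweets, retrieved_tweets)
```

In this toy case one retrieved vector is identical to the user's and the other is orthogonal to it, so the average falls halfway between the two extremes. A score like this would then be appended to the profile features before training a classifier.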

Files and links (1)

Final_Bhoomi Shah - Report.508CompliantCopy (pdf, 539.70 kB)

Text; Project; Open Access

Details

Title: Botnet account detection on Twitter using deep learning
Creators: Bhoomi Shah
Contributors: Jun Dai (Committee Member); V. Scott Gordon (Advisor)
Academic Unit: Computer Science Department; Student Research Center
Theses and Dissertations: Master of Science (MS); Computer Science; California State University, Sacramento; 12/05/2019
Publication Details: 01/22/2020
Identifiers: 99257830787401671; https://hdl.handle.net/10211.3/214834
Resource Type: Masters Project
Language: English
Comment: The accessibility of this document has been verified by Sacramento State University Library. For questions, please contact lib-508Accessibility@csus.edu.