Detection of offensive language from social media using BERT and HateBERT

Mansi Soni

Thesis

Open access

California State University, Sacramento

Master of Science (MS), California State University, Sacramento

08/12/2024

Handle: https://hdl.handle.net/20.500.12741/rep:12220

Subjects: Natural language processing (Computer science); Offensive language; Sentiment analysis

Abstract

Social media platforms such as Twitter, Facebook, Instagram, and YouTube play an important role in giving people a place to share their opinions, thoughts, and points of view. On the other side of the coin, these platforms can become stages for cyberbullying and digital harassment, which cause mental distress and have adverse effects on the human mind. McAfee’s Cyberbullying Report 2022 stated that around 28% of children around the globe have faced cyberbullying, with the highest rates occurring in the US and India. Social media harassment centers on hostile and aggressive behavior intended to harm or disturb someone repeatedly through different mediums over the internet. Among the different types of cyberbullying, this project targets offensive messages on social media platforms. We analyzed offensive language data to understand its behavior and, with the help of an offensive language dataset, explored the various types of offenses found on social media. We then implemented and evaluated a recurrent neural network with LSTM and the transformer-based NLP models BERT and HateBERT, and analyzed our findings. Our results showed a substantial improvement in recall, precision, and F1 score, demonstrating the models’ effectiveness in identifying offensive language. The purpose of the project was to find offensive content efficiently using models that perform classification at a higher level and offer insight into the complex task of detecting offensive content. The outcome helped us accurately understand the different kinds of offensive data available on social media platforms. The core vision of the project is to establish a healthier social media communication platform.
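
The abstract reports results in terms of precision, recall, and F1 score. As a reminder of how these metrics relate, the following is a minimal sketch in plain Python, assuming a binary labeling where 1 means offensive and 0 means not offensive. The function name and the toy labels are hypothetical and are not taken from the project or its dataset.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: offensive messages correctly flagged.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: harmless messages wrongly flagged.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: offensive messages that were missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Fall back to 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical toy labels (assumption): 1 = offensive, 0 = not offensive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For offensive language detection, recall reflects how many offensive messages are caught, while precision reflects how many flagged messages are actually offensive; F1 is the harmonic mean of the two.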

Files and links (1)

SoniMansi_Spring2024 (pdf, 649.66 kB); Text; Project; Open Access

Details

Title: Detection of offensive language from social media using BERT and HateBERT
Creators: Mansi Soni
Contributors: Hady Ahmady Phoulady (Advisor); Anna Baynes (Committee Member)
Academic Unit: Computer Science Department
Theses and Dissertations: Master of Science (MS); Computer Science; California State University, Sacramento; 04/30/2024; 2024
Publisher: California State University, Sacramento
Publication Details: 08/12/2024
Identifiers: 99258157063101671; https://hdl.handle.net/20.500.12741/rep:12220
Resource Type: Masters Project
Language: English
Number of pages: 52
Comment: The accessibility of this document has been verified by Sacramento State University Library. For questions, please contact lib-508Accessibility@csus.edu.