Multimodal hateful meme classification using Vision Transformer and BERT

Lakshmi Tejaswi Devarapalli

Thesis

Open access

California State University, Sacramento

Master of Science (MS), California State University, Sacramento

04/30/2025

Handle: https://hdl.handle.net/20.500.12741/rep:13092

Abstract

Hateful content detection

Meme classification

Multimodal meme analysis

Transformer-based models for hateful meme detection

Memes

Internet meme classification

A meme is a socially produced image used to comment on an event, typically built from a template of high-quality online pictures with text added. Memes can spread humor, but they can also hurt particular groups or individuals. Multimodal memes frequently pair abusive images with offensive words, which makes the classification of hateful memes essential. Categorizing abusive memes is challenging because a model must capture the combined context of the image and the text rather than either modality alone. The main goal of this project is to reduce hatefulness on online platforms by detecting hateful memes. This work aims to identify and categorize hateful content shared via memes on social media platforms, and so help create safer online spaces, using the Vision Transformer (ViT) architecture and Bidirectional Encoder Representations from Transformers (BERT). My project extracts the text from memes in image format using Optical Character Recognition (OCR), then builds a model that integrates ViT for image analysis and BERT for text processing. The model is trained on the concatenation of the output layers of ViT and BERT. The "Hateful memes" dataset is sourced from the Hugging Face platform and is specifically designed for hateful meme classification. Each meme in the dataset is labeled as either hateful or not.
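
The fusion step described in the abstract, concatenating the ViT and BERT outputs before classification, can be illustrated with a minimal sketch. This is not the author's implementation. The function names `fuse` and `predict_hateful`, the 768-dimensional feature sizes (the hidden size of base-sized ViT and BERT checkpoints), and the single linear layer with a sigmoid are all assumptions made for illustration, and random vectors stand in for real encoder outputs.

```python
import math
import random

# Assumed feature sizes: 768 is the hidden size of base-sized ViT and BERT.
VIT_DIM = 768
BERT_DIM = 768


def fuse(image_features, text_features):
    """Late fusion by concatenation: the image vector followed by the text vector."""
    return list(image_features) + list(text_features)


def predict_hateful(fused, weights, bias=0.0):
    """Hypothetical classifier head: one linear layer and a sigmoid.

    Returns a value in (0, 1) read as the probability of the 'hateful' label.
    """
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Random stand-ins for encoder outputs; these are not real embeddings.
rng = random.Random(0)
image_features = [rng.uniform(-1.0, 1.0) for _ in range(VIT_DIM)]
text_features = [rng.uniform(-1.0, 1.0) for _ in range(BERT_DIM)]

fused = fuse(image_features, text_features)

# An untrained head with all-zero weights gives a logit of 0 for any input.
weights = [0.0] * len(fused)
prob = predict_hateful(fused, weights)
print(len(fused), prob)
```

In a real pipeline the two input vectors would come from the image encoder and from the text encoder applied to the OCR output, and the head's weights would be learned from the labeled memes. Concatenation keeps both modalities visible to the classifier, so it can weigh image and text evidence jointly.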

Files and links (1)

DevarapalliLakshmiTejaswi_Fall2024 (pdf, 1.46 MB, Open Access)

Details

Title: Multimodal hateful meme classification using Vision Transformer and BERT
Creators: Lakshmi Tejaswi Devarapalli
Contributors: Kin Chung Kwan (Advisor); Bang Tran (Committee Member)
Academic Unit: Computer Science Department
Theses and Dissertations: Master of Science (MS); Computer Science; California State University, Sacramento; 12/06/2024; 2024
Publisher: California State University, Sacramento
Publication Details: 04/30/2025
Identifiers: 99258207117801671; https://hdl.handle.net/20.500.12741/rep:13092
Resource Type: Masters Project
Language: English
Number of pages: 55
Comment: The accessibility of this document has been verified by Sacramento State University Library. For questions, please contact lib-508Accessibility@csus.edu.