Abstract
Detecting spam in online reviews is challenging, especially when labeled data are limited. Semi-supervised learning has become an effective way to address this, since it leverages both labeled and unlabeled data to improve classification accuracy. This study explores the use of semi-supervised learning techniques to detect fake reviews, combining SpaBERT embeddings with traditional BERT embeddings. The BERT embeddings capture the semantic context of each review, while the SpaBERT embeddings, produced by an extension of BERT, capture the spatial context of each geo-entity within the reviews. This research demonstrates that adding this spatial context improves spam detection accuracy. Comparative experiments on a real-world review dataset show that, by combining the semantic and spatial contexts of reviews, our model outperforms the baseline GAN-BERT model in terms of effectiveness metrics, making this approach a promising direction for spam detection where labeled data are hard to obtain.