Online opinion spam detection at business level based on semantic embeddings

Praveen Kumar Reddy Hande

Thesis

Open access

Master of Science (MS), California State University, Sacramento

02/28/2020

Handle: https://hdl.handle.net/10211.3/215215

Abstract

User-generated content

Neural networks (Computer science)

Online opinion spamming has become a widespread threat in this digital era, as most decisions, from purchasing a simple product to consulting a particular doctor, are made based on online user opinions. Taking advantage of this, businesses in various fields are either committing online spamming or being affected by it, for reasons such as market competition and profit gains. Despite the significant research carried out on identifying spam reviews, a large gap remains in detecting spamming activity at the level of the business as a whole (we measure this as the honesty of the business). Identifying a single review as spam or benign cannot by itself justify labeling the business as dishonest or trustworthy. With advances in the camouflage strategies that malicious users (spammers) follow when writing fake reviews, it has become difficult to categorize a review as spam or non-spam. One important such strategy is the singleton review technique, in which reviewers create multiple accounts and write only one review under each account. A large number of such Singleton Reviews (SRs) biases the overall review profile of the business. Recent research reveals that singleton reviews are a significant source of spam reviews and largely affect the ratings of online businesses. For example, about 68% of the Amazon review data are singleton reviews. In this research project, we focus on detecting the businesses that are affected by opinion spamming over time. We take advantage of Yelp review data containing reviews of 5,044 businesses by 260,277 reviewers. We leverage recent deep learning techniques such as transfer learning, semantic embeddings, autoencoding, and LSTMs to classify a business as honest or dishonest based on semantic analysis of its reviews over time. Extensive experiments showed that the proposed models outperformed the baseline models in terms of precision, recall, and F1 score in identifying both honest and dishonest businesses.
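
The singleton review notion described in the abstract can be made concrete with a small sketch. It is not taken from the project report: the function name `singleton_share`, the `(reviewer_id, business_id)` input layout, and the toy data are all assumptions chosen for illustration. The sketch counts, for each business, the fraction of its reviews written by accounts that posted exactly one review in the whole dataset.

```python
from collections import Counter


def singleton_share(reviews):
    """Return {business_id: fraction of its reviews that are singleton reviews}.

    `reviews` is an iterable of (reviewer_id, business_id) pairs. A review is
    treated as a singleton review when its reviewer account has written exactly
    one review across the whole dataset (an assumed, simplified reading of the
    definition in the abstract).
    """
    reviews = list(reviews)
    # Number of reviews written by each reviewer account.
    reviews_per_account = Counter(reviewer for reviewer, _ in reviews)

    total = Counter()       # reviews per business
    singletons = Counter()  # singleton reviews per business
    for reviewer, business in reviews:
        total[business] += 1
        if reviews_per_account[reviewer] == 1:
            singletons[business] += 1

    return {business: singletons[business] / total[business] for business in total}


# Toy data (hypothetical): u1 and u2 each wrote one review, u3 wrote two.
toy_reviews = [("u1", "b1"), ("u2", "b1"), ("u3", "b1"), ("u3", "b2")]
shares = singleton_share(toy_reviews)
```

In this toy input, two of the three reviews of `b1` come from single-review accounts, while the only review of `b2` comes from an account with two reviews. A high share alone would not mark a business as dishonest; it is only one signal of the kind the abstract motivates.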

Files and links (1)

pdf

Praveen_Hande_MS_Project_Fall2019 (2.20 MB)

Masters project report Open Access

Details

Title: Online opinion spam detection at business level based on semantic embeddings
Creators: Praveen Kumar Reddy Hande
Contributors: Anna Baynes (Committee Member)
Haiquan Chen (Advisor)
Academic Unit: Computer Science Department
Theses and Dissertations: Master of Science (MS); Computer Science; California State University, Sacramento; 12/05/2019
Publication Details: 02/28/2020
Identifiers: 99257830925301671; https://hdl.handle.net/10211.3/215215
Resource Type: Masters Project
Language: English