Abstract
This project aims to help people who are unsure which product or service to choose over another. The objective is to collect data from Twitter feeds on trending topics, such as operating systems, new technologies, and competing products, and to categorize the tweets as needed. The project will use Scala and an Apache Spark cluster. The fetched data will be stored in a MySQL database and then classified using MapReduce APIs and data mining algorithms; for sentiment analysis we will use the Naïve Bayes algorithm. Twitter provides a Streaming API for real-time tweet data. Twitter4j, a Java library that can also be called from Scala, will be used to access the Streaming API and download tweets, filtered by a supplied list of keywords. The analysis will employ Apache Spark, a distributed data processing system, running on a master node and several worker nodes. The cluster is scalable and can handle millions of records. To filter this large volume of data we will apply the MapReduce technique on Spark. The input file will contain one JSON object per tweet and will be uploaded to the Spark cluster, where the data is replicated and distributed across multiple nodes. The mapper then takes all files present in the input directory and classifies the tweets according to the configured filter. The filtered tweets are passed through the data mining algorithms, yielding a one-to-one comparison of the topics of interest that can support difficult decisions.
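
The filter, classify, and count steps described above can be illustrated with a minimal sketch in plain Scala. It is not the project's implementation: it uses ordinary Scala collections in place of a Spark cluster, treats each tweet as bare text rather than a full JSON object, and uses a multinomial Naïve Bayes with add-one smoothing as one common variant of the algorithm. All names here (TweetSentimentSketch, Tweet, Model, tokenize, train, classify, sentimentCounts) are hypothetical and chosen only for illustration.

```scala
object TweetSentimentSketch {
  // Hypothetical minimal record; real tweets are JSON objects with many more fields.
  final case class Tweet(text: String)

  // Lower-case the text and split on anything that is not a letter, digit, '#', '@' or apostrophe.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z0-9#@']+").filter(_.nonEmpty).toSeq

  // Statistics needed by a multinomial Naive Bayes classifier.
  final case class Model(
      logPrior: Map[String, Double],             // log P(label)
      wordCounts: Map[String, Map[String, Int]], // label -> word -> count
      totalWords: Map[String, Int],              // label -> number of tokens seen
      vocabSize: Int)                            // distinct words over all labels

  // Train from (label, text) pairs, e.g. ("positive", "love the new update").
  def train(labeled: Seq[(String, String)]): Model = {
    val byLabel = labeled.groupBy(_._1)
    val logPrior = byLabel.map { case (label, docs) =>
      label -> math.log(docs.size.toDouble / labeled.size)
    }
    val wordCounts = byLabel.map { case (label, docs) =>
      val tokens = docs.flatMap { case (_, text) => tokenize(text) }
      label -> tokens.groupBy(identity).map { case (w, ws) => w -> ws.size }
    }
    val totalWords = wordCounts.map { case (label, counts) => label -> counts.values.sum }
    val vocabSize = wordCounts.values.flatMap(_.keys).toSet.size
    Model(logPrior, wordCounts, totalWords, vocabSize)
  }

  // Pick the label with the highest log-probability, using add-one (Laplace) smoothing.
  def classify(model: Model, text: String): String = {
    val tokens = tokenize(text)
    val scores = model.logPrior.toSeq.map { case (label, prior) =>
      val counts = model.wordCounts(label)
      val denom = (model.totalWords(label) + model.vocabSize).toDouble
      val logLikelihood = tokens.map(t => math.log((counts.getOrElse(t, 0) + 1) / denom)).sum
      (label, prior + logLikelihood)
    }
    scores.reduce((a, b) => if (a._2 >= b._2) a else b)._1
  }

  // Map step: emit ((keyword, sentiment), 1) for every keyword a tweet mentions;
  // tweets that mention no keyword emit nothing and are thereby filtered out.
  // Reduce step: sum the ones per key.
  def sentimentCounts(
      tweets: Seq[Tweet],
      keywords: Set[String],
      model: Model): Map[(String, String), Int] =
    tweets
      .flatMap { t =>
        val label = classify(model, t.text)
        tokenize(t.text).distinct.filter(keywords.contains).map(k => ((k, label), 1))
      }
      .groupBy(_._1)
      .map { case (key, ones) => key -> ones.map(_._2).sum }
}
```

Calling sentimentCounts with a keyword set such as Set("android", "ios") gives, for each keyword, the number of tweets classified under each sentiment label, which is the kind of side-by-side comparison the abstract describes. On a Spark cluster the same shape would presumably be expressed over a distributed collection, with the grouping and summing replaced by a keyed reduction; that mapping is an assumption of this sketch, not something stated in the source.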