Abstract
Enterprise data processing is evolving, both in how data is managed and in how practical insights are derived from large volumes of it. Hadoop is an effective technology for reliably turning such big data into actionable results. The Hadoop MapReduce framework processes large datasets in parallel across a large cluster of nodes by breaking each dataset into chunks that are processed independently. Amazon Elastic MapReduce (EMR) simplifies deploying and maintaining the underlying Amazon Elastic Compute Cloud (EC2) instances and reads data directly from cloud storage such as Amazon Simple Storage Service (S3). In this project, I designed and implemented MapReduce jobs with the Hadoop MapReduce framework to analyze the popular Yelp dataset provided by Yelp Inc. The goal is to extract hidden patterns from the Yelp dataset that help entrepreneurs understand business growth and estimate the impact of user ratings over time. From the users' perspective, the jobs also identify each business's peak hours and busiest days, so that users can book appointments early and avoid long waits. The MapReduce jobs run on Amazon EMR clusters, with Amazon S3 used for data storage. The project includes a web interface built as an AngularJS single-page application, and the results of the MapReduce jobs are presented graphically using Chart.js. The application also lets users filter business details, compare growth across business categories, and assess the advantage one location offers a business over other locations.
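To illustrate the map, shuffle, and reduce phases behind a job such as finding a business's peak hours and busiest days, here is a minimal single-process sketch in Python. It is not the project's implementation: the record layout (a business ID paired with a check-in timestamp string), the function names `map_checkin`, `shuffle`, `reduce_counts`, `run_job`, and `busiest_slot`, and the chunking scheme are all hypothetical. A real job would run as a Hadoop mapper and reducer on an EMR cluster rather than in one process.

```python
from collections import defaultdict
from datetime import datetime
from itertools import chain

# Fixed day names so the output does not depend on the system locale.
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

def map_checkin(record):
    """Map phase: emit ((business_id, weekday, hour), 1) for one check-in.

    Assumes a hypothetical record layout: (business_id, "YYYY-MM-DD HH:MM:SS").
    """
    business_id, timestamp = record
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    yield (business_id, DAYS[ts.weekday()], ts.hour), 1

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce phase: total the check-ins for one (business, day, hour) key."""
    return key, sum(values)

def run_job(records, num_chunks=2):
    """Split the input into chunks, map each chunk, then shuffle and reduce.

    In Hadoop each chunk (input split) would be mapped in parallel on a
    different node; here the chunks are simply processed one after another.
    """
    chunks = [records[i::num_chunks] for i in range(num_chunks)]
    mapped = chain.from_iterable(map_checkin(r) for chunk in chunks for r in chunk)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())

def busiest_slot(counts, business_id):
    """Return the (weekday, hour) with the most check-ins for one business."""
    slots = {(day, hour): n for (b, day, hour), n in counts.items() if b == business_id}
    return max(slots, key=slots.get)

if __name__ == "__main__":
    sample = [
        ("b1", "2017-06-02 19:15:00"),  # Friday, 19:00 hour
        ("b1", "2017-06-09 19:40:00"),  # Friday, 19:00 hour
        ("b1", "2017-06-03 12:05:00"),  # Saturday, 12:00 hour
        ("b2", "2017-06-03 12:30:00"),  # Saturday, 12:00 hour
    ]
    counts = run_job(sample)
    print(busiest_slot(counts, "b1"))
```

Keying on (business, weekday, hour) lets a single pass over the data answer both questions: summing a business's counts across hours gives its busiest days, and the maximum slot gives its peak time.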