Abstract
Machine learning (ML) and artificial intelligence (AI) are two rapidly growing fields that have attracted considerable attention. Natural language processing (NLP) is a subfield of machine learning concerned with the ability of computers to interpret and manipulate human language. Many NLP tasks break human language down into units that computers can process; part-of-speech (POS) tagging, named entity recognition (NER), tokenization, lemmatization, and stemming are a few of these tasks. NLP has a variety of real-world applications, such as speech recognition, question answering, text auto-correction, text prediction, and chatbots. NLP has also been widely used to uncover valuable information from social media. Businesses can use information extracted from social media posts and reviews to increase customer satisfaction, optimize costs, and improve business processes, so automating this extraction in real time to gain customer insight can be beneficial. This automation is challenging for the following reasons. First, finding reliable sources and accessing them is not a simple task. Second, engineering a technology stack that supports real-time monitoring is difficult. Third, social media text can be informal and ungrammatical, so training an ML model for such a system requires a large amount of training data. Fourth, ML model performance can decay as the data evolve.

In this project, a new pipeline is introduced that extracts product names using an existing NER tool, to help businesses monitor the most frequently mentioned products on Twitter. Twitter is used as the streaming source, and the stream is filtered for results that match the user's search term. The continuous training concept is implemented so that the model is retrained automatically when users add a new training set. Challenges that the informal nature of microposts such as tweets poses for NER performance are addressed by a data preprocessing step that corrects ill-formed text. A graphical report of all entities found is displayed using D3.js and jQuery DataTables so that users can keep track of mentions of top products. Finally, Naïve Bayes and Perceptron classifiers are evaluated and compared.