Abstract
This paper presents an analytical approach for identifying and exploring similarities among topics in microblogs (such as Twitter) across the spatial and temporal domains, together with a multi-view, D3-based tool for visualizing the results of the analysis. Topic extraction is difficult for short texts such as tweets, also known as microblogs: because each tweet is so short, topic modeling applied to individual tweets produces poor results, so the text must first be aggregated before topic modeling techniques are applied. Aggregating and visualizing these microblogs helps uncover abstract topics or inherent meanings that are otherwise difficult to find. Interactive visualizations further support easy identification of burst events occurring at different times and places. The tool applies a textual analysis approach to tweets collected from the Twitter accounts of celebrities established in their respective domains, such as industry, politics, acting, and space organizations. The project is an easy-to-maintain single-page web application built on a Node.js web server and the D3 JavaScript library. Its core functionality is divided into four modules: Tweet Collection, Data Preprocessing and Cleansing, Latent Dirichlet Allocation (LDA) Analysis for Topic Generation, and Visualization of Results. Tweet collection gathers tweets from live tweet streaming APIs. The collected tweets are then forwarded to the next stage for data preprocessing and cleansing. Next, the tweets are aggregated based on spatial and temporal aspects, and the aggregated text is passed to the LDA stage, which generates the topics. Finally, the topics are visualized interactively and in depth in the D3 and Node.js based web application.
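As an illustration of the spatial and temporal aggregation step described above, the following minimal sketch groups tweets into one pseudo-document per region and time bucket before topic modeling. It is not the tool's actual implementation: the function name `aggregateTweets`, the tweet fields (`text`, `region`, `createdAt`), and the one-day bucket size are all assumptions made for this example.

```javascript
// Hypothetical tweet shape (assumed, not taken from the paper):
//   { text: string, region: string, createdAt: ISO-8601 string }
const DAY_MS = 24 * 60 * 60 * 1000;

// Group tweets into one pseudo-document per (region, time bucket).
// bucketMs is an assumed parameter; buckets are aligned to the Unix epoch (UTC).
function aggregateTweets(tweets, bucketMs = DAY_MS) {
  const groups = new Map();
  for (const tweet of tweets) {
    const time = new Date(tweet.createdAt).getTime();
    // Temporal key: start of the bucket that contains this tweet.
    const bucketStart = new Date(Math.floor(time / bucketMs) * bucketMs).toISOString();
    // Combined spatial + temporal key.
    const key = `${tweet.region}|${bucketStart}`;
    if (!groups.has(key)) {
      groups.set(key, { region: tweet.region, bucketStart, texts: [] });
    }
    groups.get(key).texts.push(tweet.text);
  }
  // Concatenate the short texts so each group becomes one longer document.
  return [...groups.values()].map((g) => ({
    region: g.region,
    bucketStart: g.bucketStart,
    text: g.texts.join(' '),
    tweetCount: g.texts.length,
  }));
}

// Made-up sample data for illustration only.
const sample = [
  { text: 'launch window confirmed', region: 'US', createdAt: '2020-01-01T10:00:00Z' },
  { text: 'rocket on the pad', region: 'US', createdAt: '2020-01-01T18:30:00Z' },
  { text: 'budget vote today', region: 'UK', createdAt: '2020-01-01T09:00:00Z' },
  { text: 'launch scrubbed', region: 'US', createdAt: '2020-01-02T01:00:00Z' },
];

const docs = aggregateTweets(sample);
console.log(docs);
```

Each resulting document is longer than any single tweet, which is the property the aggregation step relies on; the documents could then be tokenized and passed to whichever LDA implementation is used.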