Abstract
With the rapid growth of industry and the corporate sector, the amount of data being generated is increasing quickly, and data collection now reaches almost every technology we use. Businesses use this data to make better decisions that improve profits and customer satisfaction, and governments use it to improve public services. Scientists rely on data to conduct research and make new discoveries, and individuals use it to personalize their experiences. Data has become a primary source of progress and innovation in today’s world. It comes in many forms: text, numbers, images, video, audio, and more. Not all data can be understood just by looking at it, so it is often visualized as charts that reveal its underlying insights. Such visualizations reduce the effort users need to understand the data and its trends and patterns. However, people with full or partial visual impairment can find these visualizations challenging to understand.
LINECAP, a novel figure captioning dataset used in this project, is a collection of 3528 line chart images. Each line chart is paired with a human-generated summary (caption) and a number indicating how many lines the chart contains. In this project, the dataset is used to develop machine learning models that predict the line count and generate a summary for a given line chart. Instead of having to look at and interpret a chart, users can rely on these models to summarize it and report the number of lines it contains.
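As an illustration of the record structure described above, the following minimal Python sketch pairs one chart image with its caption and line count. The class name, field names, file paths, and sample captions are hypothetical and do not reflect the actual LINECAP file layout.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LineChartSample:
    """One dataset record; class and field names are hypothetical."""
    image_path: str   # path to the line chart image
    caption: str      # human-generated summary of the chart
    line_count: int   # number of lines drawn in the chart


def line_count_distribution(samples):
    """Count how many samples exist for each line count."""
    return Counter(sample.line_count for sample in samples)


# Made-up records for illustration only; not taken from LINECAP.
samples = [
    LineChartSample("charts/0001.png", "Both lines rise steadily.", 2),
    LineChartSample("charts/0002.png", "The line peaks in the middle.", 1),
    LineChartSample("charts/0003.png", "One line falls while the other stays flat.", 2),
]

distribution = line_count_distribution(samples)
```

A summary such as `distribution` could be used to check how balanced the line count labels are before training a line count prediction model.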
This project focuses on building machine learning models that process line chart images and produce results that help people with visual impairment comprehend them. The first model, the line count prediction model, uses the DenseNet architecture for image feature extraction. The second model, the line caption generation model, uses the transformer architecture for language modeling. The project also covers a web application that works alongside these models and uses them to generate summaries for line charts.