Abstract
This project proposes a new approach to digital healthcare by improving the interpretation of medical record databases through advances in Multi-Label Text Classification (MLTC). MLTC assigns multiple labels to a given text, a challenging but essential task in healthcare because of the complexity of clinical information. The heterogeneous nature of health data requires effective management strategies to improve patient outcomes and healthcare operations. This project investigates state-of-the-art approaches, including classical machine learning, deep learning, and natural language processing (NLP) models, to improve the accuracy of medical document classification. Central to the project is an MLTC framework that not only categorizes medical texts accurately but also extracts valuable insights, transforming latent data into actionable knowledge. The project uses two distinct datasets: the Toxic Comment Classification Challenge dataset (cjadams et al., 2017), which is widely used for text classification tasks, and MIMIC-IV (Johnson et al., 2020), an extensive, freely available database of de-identified health data. Using these datasets, we focus on developing a model capable of handling overlapping and diverse medical labels, thereby supporting better information retrieval, decision support, and patient care.
This project builds upon the baseline work presented in 'Multi-Label Text Classification using Attention-based Graph Neural Network' by Ankit Pal, Muru Selvakumar, and Malaikannan Sankarasubbu (Pal et al., 2020). Their study introduced a graph attention network-based model that captures the attentive dependency structure among labels, using a feature matrix and a correlation matrix to model label dependencies and generate classifiers for the task. In contrast, this project develops a novel model that combines graph neural networks with transformer-based architectures, specifically BERT, to achieve enhanced classification performance. A user interface hosted on Hugging Face Spaces lets stakeholders explore the model interactively and evaluate its capabilities.