A vision transformer-driven method for generating medical reports based on x-ray radiology

Jyothirmai Kottu

Thesis

Open access

California State University, Sacramento

Master of Science (MS), California State University, Sacramento

10/07/2024

Handle: https://hdl.handle.net/20.500.12741/rep:12538

Keywords: Healthcare; Radiology reports; Vision transformers; Machine Learning

Abstract

Medical reports based on radiology and pathology images are critical to accurate diagnosis and treatment planning. However, writing these reports manually is time-consuming and prone to error. To address this, we explore transformer-based architectures for the automatic generation of medical reports. Our study investigates how using Vision Transformer (ViT) and Contrastive Language-Image Pre-training (CLIP) models (specifically the ViT-B/14 variant) as image encoders, together with a Generative Pre-trained Transformer (GPT-2) text decoder, improves the modeling of the relationship between medical images and their reports. In this thesis, we compare three architectures: ViT-CoAttention-LSTM, ViT-GPT2, and CLIP-GPT2. Experiments on a public dataset [1] show that our best model, CLIP-GPT2, outperforms existing baseline models. Finally, we integrate these models into a web application deployed on Hugging Face so that they are easy to use and broadly accessible.
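To make the image-encoder side more concrete, the snippet below sketches only the first step of a ViT-style encoder: cutting an image into non-overlapping square patches and flattening each one into a vector. In CLIP model names such as ViT-B/14, the number after the slash is the patch size. This is a minimal, dependency-free illustration under stated assumptions, not the thesis implementation. The 224 × 224 RGB input resolution and the `patchify` helper are assumptions chosen for illustration. A real encoder would then project each vector with a learned linear layer and add position embeddings before the transformer layers.

```python
from typing import List, Sequence

Pixel = Sequence[float]            # one pixel: C channel values
Image = Sequence[Sequence[Pixel]]  # H rows of W pixels


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Hypothetical helper: split an H x W x C image into non-overlapping
    patch_size x patch_size patches, flatten each to a vector of length
    patch_size * patch_size * C, and return them in row-major order
    (left to right, then top to bottom), as a ViT tokenizes its input."""
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    patches: List[List[float]] = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat: List[float] = []
            # Walk the pixels of this patch row by row and append their channels.
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])
            patches.append(flat)
    return patches


# Usage (assumed 224 x 224 RGB input, patch size 14 as in a "/14" model):
image = [[(float(r), float(c), 0.0) for c in range(224)] for r in range(224)]
tokens = patchify(image, 14)
print(len(tokens), len(tokens[0]))  # 256 588
```

With a patch size of 14, a 224 × 224 image yields a 16 × 16 grid, or 256 patch tokens of 14 × 14 × 3 = 588 values each. This sequence of tokens is what the transformer encoder attends over before its output is passed to the text decoder.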

Details

Title: A vision transformer-driven method for generating medical reports based on x-ray radiology
Creators: Jyothirmai Kottu
Contributors: Haiquan Chen (Advisor); Ying Jin (Committee Member); Anna Baynes (Committee Member)
Academic Unit: Computer Science Department
Theses and Dissertations: Master of Science (MS); Computer Science; California State University, Sacramento; 04/24/2024; 2024
Publisher: California State University, Sacramento
Publication Details: 10/07/2024
Identifiers: 99258164162601671; https://hdl.handle.net/20.500.12741/rep:12538
Resource Type: Masters Thesis
Language: English
Number of pages: 51