Enhanced FPGA based deep learning accelerator architecture for optimized performance in real time AI applications

Manoj Kamal Karala; Sai Vasanth Bijibilla

Thesis

Open access

California State University, Sacramento

Master of Science (MS), California State University, Sacramento

03/16/2026

Handle: https://hdl.handle.net/20.500.12741/rep:13962

Abstract

This project presents the design and FPGA implementation of a Hybrid Deep Learning Accelerator Unit (Hybrid DLAU) optimized for high performance and energy efficiency in real-time AI applications. The proposed architecture addresses the limitations of conventional accelerators by reducing computation delay, power consumption, and hardware utilization through a hybrid arithmetic design. The Hybrid DLAU integrates Carry-Save Adders (CSA) and a Wallace-tree reduction network to minimize carry-propagation delay and increase throughput. It comprises three pipelined modules: the Tiled Matrix Multiplication Unit (TMMU), the Partial Sum Accumulation Unit (PSAU), and the Activation Function Acceleration Unit (AFAU), which supports multiple nonlinear activation functions (ReLU, Linear, Hard-Sigmoid, and Hard-Tanh) using fixed-point approximations. FPGAs were chosen over ASICs, CPUs, and GPUs for their balance of flexibility, parallelism, and low power. They also enable rapid prototyping and reconfiguration for evolving neural network models without the high cost and inflexibility of ASIC fabrication. The architecture was modeled in Verilog HDL and synthesized with Xilinx Vivado 2018 for a Zynq-7000 FPGA. Experimental results show a 26.8% reduction in data-path delay, 49.9% lower power consumption, and over 60% fewer logic registers than the baseline DLAU, while maintaining identical DSP usage. These results demonstrate that the proposed Hybrid DLAU provides a scalable, reconfigurable, and energy-efficient hardware platform for real-time deep-learning inference on FPGA systems.
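
The abstract's central arithmetic idea, carry-save addition followed by a tree of reductions, can be illustrated with a small software model. The sketch below is not taken from the project: it is a minimal Python model, assuming non-negative integer operands, and the function names (`carry_save_add`, `wallace_reduce`, `multi_operand_sum`) are hypothetical. It shows why only the final addition needs a carry-propagating adder.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: turn three non-negative integers into a (sum, carry) pair.

    Each bit position is handled independently, so no carry ripples
    across the word; a + b + c == partial_sum + carry always holds.
    """
    partial_sum = a ^ b ^ c                      # per-bit sum, carries ignored
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, moved to its weight
    return partial_sum, carry


def wallace_reduce(operands):
    """Reduce many operands to at most two using layers of 3:2 compressors."""
    values = list(operands)
    while len(values) > 2:
        next_layer = []
        full = len(values) - len(values) % 3     # operands that form complete triples
        for i in range(0, full, 3):
            next_layer.extend(carry_save_add(values[i], values[i + 1], values[i + 2]))
        next_layer.extend(values[full:])         # leftovers pass to the next layer
        values = next_layer
    return values


def multi_operand_sum(operands):
    """Sum operands with one carry-propagating addition at the very end."""
    return sum(wallace_reduce(operands))


if __name__ == "__main__":
    partial_products = [3, 5, 7, 9, 11]          # illustrative values only
    print(multi_operand_sum(partial_products))   # 35
```

In hardware, each layer of the loop corresponds to one level of compressors operating in parallel, so the depth grows roughly logarithmically with the number of operands instead of linearly as in a chain of ripple-carry adders.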

Files and links (1)

pdf: BijibillaSaiVasanth_KaralaManojKamal_Fall2025 (1.33 MB)

Text; Project; Open Access

Details

Title: Enhanced FPGA based deep learning accelerator architecture for optimized performance in real time AI applications
Creators: Manoj Kamal Karala; Sai Vasanth Bijibilla
Contributors: Neal Frederick Levine (Advisor); Preetham B Kumar (Committee Member) - California State University, Sacramento, Electrical Engineering
Academic Unit: Computer Engineering Program
Theses and Dissertations: Master of Science (MS); Computer Engineering; California State University, Sacramento; 12/04/2025; 2025
Publisher: California State University, Sacramento
Publication Details: 03/16/2026
Identifiers: 99258300370601671; https://hdl.handle.net/20.500.12741/rep:13962
Resource Type: Masters Project
Language: English
Number of pages: 85
Accessibility Statement: The accessibility of this document has been verified by Sacramento State University Library. For questions, please contact lib-accessibility@csus.edu.