Enhanced FPGA based deep learning accelerator architecture for optimized performance in real time AI applications
Thesis   Open access


Manoj Kamal Karala and Sai Vasanth Bijibilla
California State University, Sacramento
Master of Science (MS), California State University, Sacramento
03/16/2026
Handle:
https://hdl.handle.net/20.500.12741/rep:13962

Abstract

This project presents the design and FPGA implementation of a Hybrid Deep Learning Accelerator Unit (Hybrid DLAU) optimized for high performance and energy efficiency in real-time AI applications. The proposed architecture addresses the limitations of conventional accelerators by reducing computation delay, power consumption, and hardware utilization through a hybrid arithmetic design. The Hybrid DLAU integrates Carry-Save Adders (CSA) and a Wallace-tree reduction network to minimize carry-propagation delay and increase throughput. It comprises three pipelined modules: the Tiled Matrix Multiplication Unit (TMMU), the Partial Sum Accumulation Unit (PSAU), and the Activation Function Acceleration Unit (AFAU), which supports multiple activation functions such as ReLU, Linear, Hard-Sigmoid, and Hard-Tanh using fixed-point approximations. FPGAs were chosen over ASICs, CPUs, and GPUs for their balance of flexibility, parallelism, and low power; they also enable rapid prototyping and reconfiguration for evolving neural network models without the high cost and inflexibility of ASIC fabrication. The architecture was modeled in Verilog HDL and synthesized with Xilinx Vivado 2018 for a Zynq-7000 FPGA. Experimental results show a 26.8% reduction in data-path delay, 49.9% lower power consumption, and over 60% fewer logic registers than the baseline DLAU, with identical DSP usage. These results demonstrate that the proposed Hybrid DLAU provides a scalable, reconfigurable, and energy-efficient hardware platform for real-time deep-learning inference on FPGA systems.
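
As a rough illustration of the arithmetic ideas named in the abstract, the following Python sketch models carry-save (3:2) compression feeding a Wallace-style reduction, plus fixed-point versions of the listed activation functions. It is a behavioral model only and is not taken from the project's Verilog source. The function names, the Q8.8 fixed-point format, the 8-bit unsigned operand width, and the hard-sigmoid slope of 0.25 are all assumptions chosen for illustration; the project may use different formats and constants.

```python
# Behavioral sketch only; all names and constants are illustrative assumptions.

FRAC_BITS = 8            # assumed Q8.8 fixed-point format
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0
HALF = ONE >> 1          # fixed-point representation of 0.5


def carry_save_add(a, b, c):
    """3:2 compressor: turn three operands into a (sum, carry) pair.

    No carry ripples across bit positions here; the carries are simply
    shifted left by one and handed to the next stage.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def wallace_reduce(operands):
    """Apply layers of 3:2 compressors until at most two operands remain,
    then perform a single carry-propagate addition."""
    ops = list(operands)
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3   # operands that form complete triples
        next_ops = []
        for i in range(0, full, 3):
            next_ops.extend(carry_save_add(ops[i], ops[i + 1], ops[i + 2]))
        next_ops.extend(ops[full:])      # leftovers pass through to next layer
        ops = next_ops
    return sum(ops)                      # the only carry-propagate step


def multiply(x, y, width=8):
    """Unsigned multiply: generate partial products, then reduce them."""
    partial_products = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    return wallace_reduce(partial_products)


def clamp(value, low, high):
    return max(low, min(high, value))


def relu(x):
    return max(0, x)


def linear(x):
    return x


def hard_sigmoid(x):
    # Assumed form: clip(0.25 * x + 0.5, 0, 1); the 0.25 slope is a right shift.
    return clamp((x >> 2) + HALF, 0, ONE)


def hard_tanh(x):
    # clip(x, -1, 1) in fixed point
    return clamp(x, -ONE, ONE)
```

In this model the only carry-propagating addition is the final `sum(ops)` in `wallace_reduce`, which mirrors the motivation for carry-save reduction stated above. The power-of-two hard-sigmoid slope is picked here because it maps to a shift rather than a multiplier.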
