Institutional Repository
Technical University of Crete

Reconfigurable logic based acceleration of convolutional neural network training

Flengas Georgios

URI: http://purl.tuc.gr/dl/dias/05A0B1B4-517D-4C54-B42C-E1E04D52AB15
Year: 2024
Type of Item: Diploma Work
Bibliographic Citation: Georgios Flengas, "Reconfigurable logic based acceleration of convolutional neural network training", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2024. https://doi.org/10.26233/heallink.tuc.99113

Summary

In the rapidly evolving landscape of artificial intelligence and machine learning, the intricate nature of neural network architectures, combined with exponential data growth, has sharply intensified the computational demands of training. Traditional CPUs and GPUs struggle to meet these demands, prompting exploration of the untapped potential of FPGA-based acceleration. This research introduces an FPGA-tailored hardware architecture for training Convolutional Neural Networks (CNNs), prioritizing accuracy, energy efficiency, and speedup over conventional CPU and GPU systems. Building on prior research, we employ General Matrix Multiply (GEMM) and Image to Column (im2col) implementations, coupled with batch-level parallelism. The workload is carefully balanced between the CPU and the FPGA to ensure efficient collaboration, and multiple operations are combined to streamline computation and reduce complexity. The integration of state-of-the-art machine learning algorithms with advanced FPGA design tools, including Vitis High-Level Synthesis (HLS), yields tailored IP blocks for each stage of the neural network training process. The proposed platform achieves a throughput of 374.32 images per second, surpassing the CPU rate of 258.7 images per second, though trailing the GPU's 1333.3 images per second, while operating at a significantly lower power consumption of 4.16 Watts (0.011 Joules per image). This positions the proposed platform as a strong candidate for energy-efficient neural network training, delivering a 16.55x energy-efficiency gain over the CPU and a 7.75x gain over the GPU.
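
The GEMM-with-im2col formulation mentioned in the summary is the standard lowering of convolution to a single matrix multiplication. The sketch below is an illustrative C++ rendering, not code from the thesis: it assumes stride 1 and no padding, and all names and dimensions are chosen for the example. In a flow like the one described, the GEMM loop nest is the kind of kernel that Vitis HLS would pipeline and unroll into a dedicated IP block.

#include <vector>
#include <cstddef>

// Unfold a C x H x W input so each column holds one K x K x C receptive
// field; convolution then reduces to one GEMM (stride 1, no padding).
std::vector<float> im2col(const std::vector<float>& in,
                          std::size_t C, std::size_t H, std::size_t W,
                          std::size_t K) {
    const std::size_t Ho = H - K + 1, Wo = W - K + 1;
    std::vector<float> col(C * K * K * Ho * Wo);
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t ki = 0; ki < K; ++ki)
            for (std::size_t kj = 0; kj < K; ++kj) {
                const std::size_t row = (c * K + ki) * K + kj;
                for (std::size_t i = 0; i < Ho; ++i)
                    for (std::size_t j = 0; j < Wo; ++j)
                        col[row * Ho * Wo + i * Wo + j] =
                            in[(c * H + i + ki) * W + (j + kj)];
            }
    return col;
}

// Plain GEMM: out (M x N) = weights (M x Kd) * col (Kd x N). This loop
// nest is the portion an HLS tool would turn into a hardware pipeline.
void gemm(const float* a, const float* b, float* out,
          std::size_t M, std::size_t Kd, std::size_t N) {
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < Kd; ++k)
                acc += a[m * Kd + k] * b[k * N + n];
            out[m * N + n] = acc;
        }
}

For F output filters, the weights form an F x (C*K*K) matrix, so gemm(weights, col.data(), out, F, C*K*K, Ho*Wo) produces all F output feature maps at once; batch-level parallelism then amounts to running independent GEMMs per batch element. Note also that the reported 0.011 Joules per image is consistent with the measured figures: 4.16 W divided by 374.32 images per second is approximately 0.0111 J per image.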
