Institutional Repository
Technical University of Crete

Design and implementation of an FPGA-based convolutional neural network accelerator

Pitsis Antonios-Georgios

URI: http://purl.tuc.gr/dl/dias/B72D4A43-A5A0-47DB-8A92-FEE788EF888A
Year: 2018
Type of Item: Diploma Work
Bibliographic Citation: Antonios-Georgios Pitsis, "Design and implementation of an FPGA-based convolutional neural network accelerator", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2018. https://doi.org/10.26233/heallink.tuc.79092

Summary

In recent years, Convolutional Neural Networks (CNNs) have grown rapidly in popularity due to their effectiveness on complex image recognition problems. They are now applied to an ever greater number of problems, ranging from speech recognition to image segmentation and classification. The continually increasing amount of processing required by CNNs creates an opportunity for hardware acceleration. Moreover, CNN workloads have a streaming nature, which is well suited to reconfigurable hardware architectures such as FPGAs. The volume of research on Machine Learning, and especially on CNNs implemented on FPGA platforms, within the last 4 years demonstrates tremendous industrial and academic interest. This study presents a CNN inference accelerator on FPGAs. The network we aim to accelerate was developed by Dr. Tsagatakis in the context of the DEDALE project (Horizon 2020) for an astrophysics application. After a robustness analysis is carried out, the computational workloads and memory accesses are analyzed, along with compression methods and algorithmic optimizations that exploit FPGA parallelism. At the neuron level, optimizations of the convolutional and fully connected layers are explained and compared. At the network level, approximate computing optimizations are examined, subject to the constraint that they do not reduce the accuracy of the network. The platforms used are the ZCU102 and the QFDB (a custom 4-FPGA platform developed at FORTH). The implemented accelerator achieved a 20x latency speedup, a 2.17x throughput speedup and 11.9x better energy efficiency compared with an NVIDIA Quadro K2200 GPU, in the context of the EuroExa project.
