Το work with title Main memory performance for realistic data access in FPGA systems: an experimental study by Argyriou Maria is licensed under Creative Commons Attribution 4.0 International
Bibliographic Citation
Maria Argyriou, "Main memory performance for realistic data access in FPGA systems: an experimental study", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2021
https://doi.org/10.26233/heallink.tuc.88903
Nowadays, computationally demanding applications, such as Convolutional Neural Networks (CNN), are mapped to hardware accelerators like Field Programmable Gate Arrays (FPGAs) due to customizable datapath with designer-tunable parallelism and pipelining. Memory is in many cases the limiting factor of every performance-bound application but its real performance is often overlooked. Most studies and benchmarks on memory subsystems focus on best-case scenarios for memory access times. Μemory access times and throughput are affected by such factors as the memory controller’s performance, buffering, the need (or lack of) a microprocessor for on-FPGA data transfer, and even the capabilities of Computer-Aided Design (CAD) tools and frameworks. This study, prompted by CNN applications on mid-range single- and multi-FPGA systems focuses on the experimental memory evaluation, aiming at providing the designer with realistic figures which can be used towards on-FPGA buffer sizing, computation-to-memory I/O estimation to avoid bottlenecks, and even pipeline strategy. Detailed experimental results have been obtained by memory access patterns which represent realistic scenarios, and these are presented and analyzed in this thesis. One of the conclusions from this work is that when random accesses are required, large numbers of such accesses performed together lead to better results vs. fewer, scattered accesses. The results of this work not only show a significant deviation from ideal transfer rates of the DDR memory channel (which can approach 20 GBytes/sec), but they are also substantially lower than the internal AXI port maximum bandwidth of 4.8 GBytes/sec, the degradation being due to internal to the FPGA data transfers and the DDR controller.