Sofia-Maria Nikolakaki, "Real-time stream data processing with FPGA-based SuperComputer", Master Thesis, School of Electronic and Computer Engineering, Technical University of Crete, Chania, Greece, 2015
https://doi.org/10.26233/heallink.tuc.26973
It is a foregone conclusion that contemporary applications are bounded by massive computational demands. The semiconductor industry has announced that physical constraints prevent the community from surpassing the current upper frequency limit of modern processors, which has led to the emergence of multi-core platforms. This thesis explores the recently emergent paradigm of the Maxeler multi-FPGA platform for dataflow computing to efficiently map computationally intensive algorithms onto modern hardware. We tackle two challenging problems within this framework: the first is classification, focusing on the kernel computation of the broadly used Support Vector Machines (SVM) classifier, and the second is time-series analysis, focusing on the calculation of the Mutual Information (MI) value between two time-series. Prior art on modern hardware has indicated the parallelism opportunities offered by the SVM method, but mainly for low-dimensional datasets, and no work has examined the performance of the algorithm on dataflow processors. Moreover, the problem of MI computation between two time-series on special-purpose platforms has been addressed by the research community only for low-precision arithmetic applications, and again the performance of the method has not been evaluated on the emerging dataflow platforms. This is the first work to extensively study the pros and cons of using the Maxeler platform, by identifying the most essential dataflow elements and describing how they can be efficiently utilized. Thus, it can be employed as an independent point of reference for similar future endeavors. In terms of results, the SVM kernel computation only matched the performance of the reference software for high-dimensional data, but the know-how acquired during this process was leveraged towards the design of the MI FPGA-based architecture, which yielded a 9.4x speedup using two parallel cores and 32-precision arithmetic.