Institutional Repository
Technical University of Crete
EN  |  EL



My Space

Hardware accelerated basic blocks for power-aware intercommunication in HPC and embedded systems

Tampouratzis Nikolaos

Full record

Year 2014
Type of Item Master Thesis
Bibliographic Citation Nikolaos Tampouratzis, "Hardware accelerated basic blocks for power-aware intercommunication in HPC and embedded systems", Master Thesis, Σχολή Ηλεκτρονικών Μηχανικών και Μηχανικών Υπολογιστών, Πολυτεχνείο Κρήτης, Chania, Greece, 2014
Appears in Collections


In the past, a transition to the next fabrication process typically translated to more transistors and frequency and less power. The higher frequencies paired with innovations in computer architecture defined the semiconductor industry and research until the mid-90s. At that point architecture research saturated and industry resided to the technology scaling for performance gains. During the mid-00s frequency scaling saturated as well. Transistor count, the only resource which reliably kept scaling, along with intra-chip parallelism, which could leverage and extend the existing knowledge of old-days supercomputers, emerged as the only solution to keep Moore’s law live. In parallel systems, computing nodes cooperate to solve processing intensive problems. The communication between nodes is achieved through a variety of protocols. Traditionally, research has focused on optimizing these protocols and identifying the most suitable ones per system and application. Recently, an attempt to unify the primitive operations of the proposed intercommunication protocols has been realized through the Portals system. Portals offer a set of low level communication routines which can be composed to model complex protocols. However, Portals modularity comes at a performance cost, as communication protocols have been tuned and many of their timing critical parts have been decoupled from the main execution thread and in many cases accelerated as dedicated hardware. This work targets to close the performance gap between a generic and reusable intercommunication layer, Portals, and the several monolithic but highly tuned protocols. A software driven hardware accelerated system is suggested which resides on execution of actual software to highlight the critical parts of the communication routines. Accelerating the bottlenecks starts by modeling the hardware in untimed virtual prototypes and the software in a range of candidate embedded processors. A novel path from hardware prototypes to actual silicon allows rapid characterization of the accelerator in terms of power, performance and area. The suggested approach triggers a speedup from one order of magnitude in bottleneck components of Portals, while it is up to two orders of magnitude faster in both MPI and GA baseline implementations in a recent embedded processor.

Available Files