Reliable runtime architecture for multiprocessor systems on chip
Diploma Thesis
2014-06-03
2014
en
Undergraduate thesis submitted to the School of Electronic and Computer Engineering (ECE) of the Technical University of Crete in partial fulfillment of the requirements for the Undergraduate Diploma.
http://creativecommons.org/licenses/by/4.0/
Technical University of Crete::School of Electronic and Computer Engineering
Skarlatos_Dimitrios_Dip_2014.pdf
Chania [Greece]
Library of TUC
application/pdf
493.7 kB
free
Mission-critical applications rely on both hardware and software approaches to fault tolerance. With the adoption of multiprocessor systems on chip (MPSoCs), processor fault tolerance through modular redundancy has become a major issue in terms of both cost and performance. In this thesis, we first augment a task-parallel runtime system with support for transparent checkpoints of the task data that may be written during task execution, so that failed tasks can be rerun seamlessly. The system can recover from transient errors during task execution within a single core by rerunning the failed task, and from permanent errors that disable a worker core by redistributing work among the remaining cores. We evaluated our implementation using six benchmarks and found that checkpointing incurs an average performance overhead of 8%, mainly due to the cost of memory copies, and only a negligible space overhead, because checkpoint memory is recycled. Then, to protect the worker runtime system beyond the execution stage, we present ASGUARDIAN, a lightweight hardware mechanism based on a task-oriented model for general programmability. ASGUARDIAN features both store-and-forward and cut-through capabilities to reliably transfer task descriptions and arguments between main memory and the available worker cores. It also isolates the workers from main memory. A hardware prototype has been implemented on a Xilinx ML605 FPGA board using the widely used ARM AMBA protocol. Introducing the ASGUARDIAN reliability features results in an average hardware resource overhead of 8% for a configuration with four MicroBlaze cores. The performance overheads of the store-and-forward and cut-through implementations were 2.3x and 1.2x, respectively, compared with an unprotected shared-memory system. Compared with an unprotected scratchpad-based memory system, the store-and-forward version showed an overhead of 1.7x, while the cut-through version showed an average speedup of 6%.
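The recovery scheme described in the abstract (checkpoint the task data that may be written, rerun a task that fails on a core, and move work to the remaining cores when a core is disabled) can be sketched as a small Python model. This is a hypothetical illustration, not the thesis runtime: the names `Worker`, `Task`, `run_task`, `execute_with_checkpoint` and `MAX_RETRIES`, the simulated faults, and the rule that a core failing `MAX_RETRIES` consecutive runs is treated as permanently failed are all assumptions, since the abstract does not say how errors are detected.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical retry budget: after this many consecutive failed runs on one
# core, the sketch treats the core as permanently failed (an assumption).
MAX_RETRIES = 3


@dataclass
class Worker:
    alive: bool = True
    transient_faults_left: int = 0  # simulated: failing runs before one succeeds


@dataclass
class Task:
    data: list[int] = field(default_factory=list)  # argument the task writes in place


def run_task(worker: Worker, task: Task) -> bool:
    """Double every element in place; return False if a simulated fault hits."""
    for i in range(len(task.data)):
        task.data[i] *= 2
    if worker.transient_faults_left > 0:
        worker.transient_faults_left -= 1
        task.data[0] = -1  # model the corrupted output of a faulty run
        return False
    return True


def execute_with_checkpoint(workers: list[Worker], task: Task) -> int | None:
    """Checkpoint writable task data, run the task, and restore + rerun on failure.

    Returns the index of the worker that completed the task, or None if no
    live worker could complete it.
    """
    checkpoint = list(task.data)  # copy of the data the task may overwrite
    for idx, worker in enumerate(workers):
        if not worker.alive:
            continue  # redistribute: skip cores already marked as failed
        for _ in range(MAX_RETRIES):
            if run_task(worker, task):
                return idx
            task.data[:] = checkpoint  # roll back partial writes, then rerun
        worker.alive = False  # assumption: repeated failure means a permanent fault
    return None


workers = [
    Worker(alive=False),              # core 0: permanently failed
    Worker(transient_faults_left=1),  # core 1: one transient fault, then succeeds
    Worker(),
    Worker(),
]
task = Task(data=list(range(8)))
done_on = execute_with_checkpoint(workers, task)
print(done_on, task.data)  # 1 [0, 2, 4, 6, 8, 10, 12, 14]
```

Restoring the checkpoint before each rerun is what makes reruns safe for tasks that update their arguments in place. Without the restore, the second run on worker 1 would start from the corrupted, already-doubled buffer and end with `[-2, 4, 8, ..., 28]`. In this model the cost of checkpointing is the list copy taken before each task, which is consistent with the abstract attributing most of the overhead to memory copies.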
Skarlatos Dimitrios
Pnevmatikatos Dionysios
Dollas Apostolos
Papaefstathiou Ioannis
Pratikakis Polyvios
Technical University of Crete
Computing, Fault-tolerant
fault tolerant computing
computing fault tolerant
Computer reliability
computers reliability
CLR (Common Language Runtime)
common language runtime computer science
clr common language runtime
SOC design
Systems on chip
systems on a chip
Multicores
Task Based Programming Model
Field programmable logic arrays
FPGAs
field programmable gate arrays