Institutional Repository
Technical University of Crete
Convolutional neural network optimizations using knowledge distillation for applications on hardware accelerators

Vailakis Apostolos-Nikolaos


Year 2022
Type of Item Diploma Work
Bibliographic Citation Apostolos-Nikolaos Vailakis, "Convolutional neural network optimizations using knowledge distillation for applications on hardware accelerators", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2022

Over the last decade, Convolutional Neural Networks (CNNs) have gained popularity amongst the scientific community, due to their versatility and performance in an ever-growing range of applications. Recent advances in computational power have enabled researchers to develop and train CNNs of exponential complexity, capable of solving problems previously considered unattainable. From facial recognition to climate analysis and self-driving cars, CNNs constantly prove their value in the field of Machine Learning. Deploying such models in real-world applications, however, presents a significant challenge. While training complex CNNs requires high-performance computing systems, inference may need to be performed under much tighter computational budgets. This has motivated the scientific community to develop both hardware architectures capable of efficiently executing CNNs and methodologies for compressing networks. Hardware accelerators focused on edge applications opt for lower-precision arithmetic (network quantization), which in turn simplifies the computational engines and greatly reduces the memory footprint of the models. This, however, can result in severe accuracy losses. Recent advances in quantization-aware training techniques promise to mitigate these effects. Centered around DenseNet, a state-of-the-art CNN developed for image classification, this study performs an in-depth analysis of Quantization-Aware Knowledge Distillation (QKD), a promising technique which combines quantization-aware training with knowledge distillation. Additionally, a comparison of inference performance between a CPU, a GPU, and a Xilinx DPU is conducted, the latter of which employs 8-bit integer arithmetic. To achieve this, QKD is integrated into Xilinx's Vitis-AI workflow.
The Xilinx DPU achieves at least a 9x latency speedup and 4x greater power efficiency compared to the other platforms, indicating that effective model compression and quantization, coupled with dedicated hardware architectures, can produce highly capable systems for edge applications.
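The two ingredients QKD combines can be illustrated with a minimal NumPy sketch (not the thesis implementation): symmetric per-tensor int8 "fake" quantization, as used in quantization-aware training, and the standard knowledge-distillation objective that blends a temperature-softened KL term against the teacher with cross-entropy on the hard labels. All function names, the temperature `T`, and the mixing weight `alpha` are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about non-target classes.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fake_quantize_int8(w):
    # Symmetric per-tensor int8 fake quantization: snap values to the
    # int8 grid, then map back to floats so training stays in float
    # arithmetic while the forward pass "sees" quantization error.
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -128, 127)
    return q * scale

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # KD objective: alpha * T^2 * KL(teacher || student) at temperature T,
    # plus (1 - alpha) * cross-entropy against the hard labels.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kd = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean() * T * T
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * kd + (1.0 - alpha) * ce
```

In a QKD-style loop, the student's weights would be passed through `fake_quantize_int8` in the forward pass while `distillation_loss` drives the updates, so the quantized student learns to match the full-precision teacher rather than only the hard labels.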
