Institutional Repository
Technical University of Crete

Distributed monitoring of streaming data over neural training pipelines

Klioumis Georgios

URI: http://purl.tuc.gr/dl/dias/14A047C4-DAEB-4F96-B452-47BA5D21B9AC
Year: 2025
Type of Item: Master Thesis
Bibliographic Citation: Georgios Klioumis, "Distributed monitoring of streaming data over neural training pipelines", Master Thesis, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2025. https://doi.org/10.26233/heallink.tuc.105137
Summary

Our goal is to propose a big-data workflow model by introducing three novel contributions in the fields of distributed data synopses, stream processing, and learning. We begin with the Reversible Random Hyperplane Projection (RRHP) Locality Sensitive Hashing scheme, a novel, lightweight, reversible data synopsis for compressing data streams in resource-constrained environments such as Wireless Sensor Networks (WSNs). RRHP thus offers an efficient way of gathering and transmitting data from the edge. Real-world experiments show that RRHP achieves similar or better performance than other lightweight data-synopsis mechanisms deployable in a WSN setting, and that it can prolong the life of sensors in the field by reducing their energy consumption by up to 10 times.

Next, we present EVENFLOW, a novel toolkit of synchronization protocols for data-parallel training of neural networks under the Parameter Server paradigm that achieves both timely and accurate global model updates in streaming settings. Our experimental evaluation shows that EVENFLOW combines the virtues of the two vanilla protocols (synchronous and asynchronous), offering the rapid training times of the asynchronous protocol with mostly equal or even improved accuracy compared to the synchronous one. EVENFLOW therefore enables us to train, in a distributed manner, on the data arriving from the edge.

Finally, we present the Distribuito SuBiTO framework, a version of the original SuBiTO framework that performs sampling, training, and inference in a distributed manner, all while constantly optimising the neural learning strategy. Distribuito SuBiTO retains the original operation of SuBiTO, which automatically and continuously learns as new data stream in and fine-tunes each part of data processing and learning, adapting these parameters on the fly.
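The abstract does not spell out the RRHP construction itself. As a rough illustration of the family of techniques it builds on, a plain random-hyperplane-projection signature with a crude reconstruction step might look as follows; every name and parameter here is an illustrative assumption, not the actual RRHP scheme from the thesis:

```python
import math
import random

def make_hyperplanes(dim, k, seed=42):
    """Sample k random hyperplane normals with Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

def encode(x, planes):
    """k-bit signature: the sign of x projected onto each hyperplane normal."""
    return [1 if sum(p_j * x_j for p_j, x_j in zip(p, x)) >= 0 else -1
            for p in planes]

def decode(bits, planes):
    """Crude inverse: the signed average of the normals approximates
    the direction of the original vector (magnitude is lost)."""
    dim, k = len(planes[0]), len(planes)
    return [sum(b * p[j] for b, p in zip(bits, planes)) / k for j in range(dim)]

def cosine(u, v):
    """Cosine similarity, to check how well the direction is preserved."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))
```

Transmitting the k sign bits instead of the raw floating-point vector is what makes this style of synopsis attractive on energy-constrained sensors; a genuinely reversible scheme such as RRHP would presumably refine the decoding step well beyond this sketch.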
We couple our contributions with an extensive experimental evaluation, testing every re-engineered part of the Distribuito SuBiTO framework and demonstrating the platform's efficacy in handling large-volume streams in an efficient, real-time manner, all while retaining the original functionality of SuBiTO. Distribuito SuBiTO therefore enables us to perform data-analytics tasks on big data streams arriving from the edge in a highly adaptable, real-time, and distributed manner.
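As background for the synchronous/asynchronous trade-off that EVENFLOW navigates, the generic middle ground is a bounded-staleness (stale-synchronous) parameter server: setting the staleness bound to 0 recovers fully synchronous updates, and an unbounded value recovers fully asynchronous ones. The toy sketch below shows that idea only; it is not EVENFLOW's actual protocol, and all names are hypothetical:

```python
class ParameterServer:
    """Toy stale-synchronous parameter server: a worker may run ahead of
    the slowest worker by at most `staleness` steps before it must wait."""

    def __init__(self, dim, staleness):
        self.weights = [0.0] * dim
        self.staleness = staleness
        self.clock = {}  # worker id -> number of gradient pushes so far

    def can_push(self, worker):
        # Synchronous training corresponds to staleness=0 (lockstep);
        # asynchronous training corresponds to an unbounded staleness.
        slowest = min(self.clock.values()) if self.clock else 0
        return self.clock.get(worker, 0) - slowest <= self.staleness

    def push(self, worker, gradient, lr=0.1):
        """Apply a worker's gradient if it is within the staleness bound."""
        assert self.can_push(worker), "worker must wait for stragglers"
        self.weights = [w - lr * g for w, g in zip(self.weights, gradient)]
        self.clock[worker] = self.clock.get(worker, 0) + 1
```

With a small positive bound, fast workers keep training almost as freely as in the asynchronous protocol, yet no gradient is ever computed on a model that is arbitrarily stale, which is the kind of timeliness/accuracy balance the abstract attributes to EVENFLOW.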
