Institutional Repository
Technical University of Crete

Parallel sketch algorithms with Spark, Storm, Akka and Kafka-Streams

Petheriotis Aggelos

Year: 2021
Type of Item: Diploma Work
Bibliographic Citation: Aggelos Petheriotis, "Parallel sketch algorithms with Spark, Storm, Akka and Kafka-Streams", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2021

Efficient processing of massive, real-time data sets has become increasingly vital over the last few decades, as data volumes grow across a wide variety of applications. Conventional algorithms cannot handle the load and rate of these streams efficiently and cost-effectively. In contrast, summarised data structures with a small memory footprint, also known as synopses, are well suited to such applications. Because each item of an unbounded real-time data stream is observed only once, the frameworks that run the computations must be utilised to the fullest. We evaluate four real-time, distributed, fault-tolerant frameworks: Storm, Spark, Akka and Kafka Streams. Their architectures differ substantially from those of the batch-processing frameworks established in previous years. Furthermore, each framework relies on different design principles and patterns, which results in different characteristics that are analysed in this thesis. We evaluate the CMS, ECMS and AMS algorithms on these four frameworks in a multi-node cluster topology with regard to performance. We measure throughput, i.e. the number of items processed per second, while verifying that the error guarantees are met in each case.
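
To make the idea of a synopsis concrete, the following is a minimal sketch, assuming that CMS above refers to the Count-Min Sketch. It is not taken from the thesis: the class name, the parameters `epsilon` and `delta`, and the use of salted BLAKE2b hashes (a stand-in for the pairwise-independent hash functions the formal guarantees assume) are illustrative choices. It shows two properties that matter for parallel stream processing: memory is fixed regardless of stream length, and sketches built on separate partitions can be merged by cell-wise addition.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counter; estimates never fall below the true count."""

    def __init__(self, epsilon=0.01, delta=0.01):
        # Textbook sizing: width = ceil(e / epsilon), depth = ceil(ln(1 / delta)).
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # number of items seen so far

    def _index(self, item, row):
        # Assumption: one salted BLAKE2b digest per row stands in for a
        # family of independent hash functions.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Each item is touched once: increment one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        self.total += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best estimate.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )

    def merge(self, other):
        # Sketches with identical dimensions and hashes combine cell by cell,
        # which is what lets each worker summarise its own partition.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        self.total += other.total
        return self
```

In a distributed setting, each worker would call `add` on its own partition of the stream, and a downstream step would `merge` the partial sketches before answering `estimate` queries. With the textbook sizing used here, the overestimate is expected to stay within `epsilon * total` with probability at least `1 - delta`, provided the hash functions behave independently.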
