Institutional Repository
Technical University of Crete

Stream data processing in the cloud for real-time anomaly detection

Kostalia Elisavet-Elli

URI: http://purl.tuc.gr/dl/dias/6D9C3DBE-41AE-497A-8995-219FD655A956
Year: 2019
Type of Item: Diploma Work
Bibliographic Citation: Elisavet-Elli Kostalia, "Stream data processing in the cloud for real-time anomaly detection", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2019. https://doi.org/10.26233/heallink.tuc.82295

Summary

In recent years, data stream processing has become extremely popular, driven by the Big Data era and the growing number of microservices and IoT devices in use. As a result, the development, deployment and management of distributed services are more important than ever. Data is stored and analyzed to produce predictive and actionable results using frameworks such as Apache Storm, which supports distributed real-time computation over large volumes of data arriving at high rates from various sources. Non-functional requirements often demand a highly available, high-throughput, fault-tolerant and massively scalable solution. In this context, Apache Kafka is used as a publish-subscribe messaging system that serves as a broker between various data sources. In our implementation, all of the above services reach their highest utility when deployed in containers; this way, the implementation takes advantage of the virtualization features of cloud computing. We ran several experiments based on a simulated (but realistic) use case scenario for detecting unauthorized system accesses (anomalies) in real time. The project runs a distributed cluster of Apache Storm worker nodes (implemented as separate containers) that process incoming data, and uses decision tree classifiers to detect anomalies based on a given dataset. The experimental results demonstrate that the Amanda application responds to its increasing resource demands, leading to significantly faster response times as more distributed workers are deployed, compared to a non-distributed implementation where all service requests are handled by the maximum statically pre-allocated resources.
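
The detection step described above can be illustrated with a minimal, self-contained sketch. It is not the thesis implementation: the summary does not specify the features, the dataset or the code, so everything here is an assumption made for illustration. A `queue.Queue` stands in for a Kafka topic, the `process_stream` function stands in for a Storm worker, the two features (failed logins, hour of day) and the toy training rows are hypothetical, and the small Gini-based tree is a generic textbook decision tree written in pure Python.

```python
from collections import Counter
from queue import Queue


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Build a tree as nested (feature, threshold, left, right) tuples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk the tree until a leaf (a plain string label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def process_stream(tree, topic):
    """Drain the queue (stand-in for a topic) and collect ids flagged as anomalies."""
    alerts = []
    while not topic.empty():
        event = topic.get()
        if predict(tree, event["features"]) == "anomaly":
            alerts.append(event["id"])
    return alerts


# Hypothetical training data: (failed_logins, hour_of_day) per access attempt.
TRAIN_X = [(0, 10), (1, 14), (0, 9), (2, 16), (8, 3), (12, 2), (9, 23), (7, 4)]
TRAIN_Y = ["normal"] * 4 + ["anomaly"] * 4
TREE = build_tree(TRAIN_X, TRAIN_Y)

# Hypothetical incoming events, as a producer might publish them.
topic = Queue()
for event_id, features in [("a", (0, 11)), ("b", (10, 3)), ("c", (1, 15))]:
    topic.put({"id": event_id, "features": features})

print(process_stream(TREE, topic))
```

In a distributed deployment of the kind the summary describes, several such workers would consume from the broker in parallel, each applying the same trained classifier to its share of the events.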
