Vasileios Vittis, "Online Ensemble Classification Algorithms of Big Data Streams at Apache Flink", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2021
https://doi.org/10.26233/heallink.tuc.90722
The growing need to make high-precision, real-time decisions from dynamic data calls for modern systems capable of coping with diverse problems. The demands generated by the 4 Vs (volume, variety, velocity, and veracity) make classical systems inefficient, leaving room for systems that process each data item only once, without storing it. Ensemble systems consist of individual subsystems with different characteristics, each of which participates in a voting process that produces the final decision. Here these subsystems are implemented with the state-of-the-art decision tree algorithm for data streams, the Hoeffding Tree, chosen for its simple construction and the few assumptions it makes. It is important that such models take advantage of the available distributed environments in order to speed up the learning process effectively. In this dissertation, we create a distributed ensemble learning system for binary classification, built from Hoeffding Trees that together form a Random Forest. After observing the response time and space requirements of this system, we implemented techniques that specifically target these problems. The experimental results support the proposed methodology when compared with corresponding techniques in the literature.
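As a rough illustration of the two ideas the abstract names, the Python sketch below shows the standard Hoeffding bound that a Hoeffding Tree uses to decide when enough stream examples have been seen to split a node, and a plain majority vote over the binary predictions of the ensemble members. It is not taken from the thesis: the function names, the tie-breaking rule, and the parameter values are assumptions made for illustration, and the actual system runs on Apache Flink rather than in plain Python.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with the given range lies within this epsilon of the mean
    observed over n independent examples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split a leaf once the gap between the two best split candidates
    exceeds the Hoeffding bound (hypothetical helper for illustration)."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def majority_vote(predictions):
    """Combine binary (0/1) predictions from the ensemble members.
    Ties go to class 0; this tie-breaking rule is an assumption."""
    ones = sum(predictions)
    return 1 if 2 * ones > len(predictions) else 0


# Illustrative values only (not results from the thesis).
epsilon = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
decision = majority_vote([1, 1, 0])
```

Because the bound shrinks as n grows, a leaf that cannot yet separate its two best split candidates simply waits for more examples, which is what lets each tree learn from a single pass over the stream.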