Institutional Repository
Technical University of Crete




A caching platform for large scale data-intensive distributed applications

Kafritsas Nikolaos


Extent: 1 MB
Title: A caching platform for large scale data-intensive distributed applications
Creator: Kafritsas Nikolaos (Καφριτσας Νικολαος)
Contributor [Thesis Supervisor]: Garofalakis Minos (Γαροφαλακης Μινως)
Contributor [Committee Member]: Deligiannakis Antonios (Δεληγιαννακης Αντωνιος)
Contributor [Committee Member]: Samoladas Vasilis (Σαμολαδας Βασιλης)
Publisher: Technical University of Crete (Πολυτεχνείο Κρήτης)
Academic Unit: Technical University of Crete::School of Electrical and Computer Engineering
Content Summary: In the last decade, data processing systems have started using main memory as much as possible to speed up computations and boost performance. In this direction, stream processing systems, which must meet rigorous demands and achieve sub-second latency along with high throughput, have seen many breakthroughs. These advancements were made possible by the availability of large amounts of DRAM at a plummeting cost and by the rapid evolution of in-memory databases. However, in the Big Data era, keeping such huge amounts of data entirely in memory is impossible. On the other hand, falling back to disk-based databases is prohibitively expensive because of disk latencies. The ideal scenario would combine the access speed of memory with the large capacity and low price of disk, which hinges on the ability to use both main memory and disk effectively. Consequently, a solution that combines the benefits of both worlds is highly desirable.

This diploma thesis tackles this problem by proposing an alternative architecture: hot data are stored in memory, while cold data are moved to disk in a transactionally safe manner as the database grows. Because data initially reside in memory, this architecture reverses the traditional storage hierarchy of disk-based systems: the disk serves as extended storage for evicted (cold) elements rather than as the primary host for all the data.

Based on this architecture, a multi-layered platform is presented that is highly scalable and can operate in a distributed manner. The memory layer acts as a cache with configurable capacity and provides several eviction policies, the most important being a variation of the traditional LFU (Least Frequently Used) policy. In particular, data regarded as cold can return to memory if they become hot again, which happens when the data distribution shifts during online processing. Thanks to this feature and the sub-second latency it achieves, the platform also performs efficiently in a streaming environment and can serve as a stateful memory component in a real-time architecture. The disk layer is flexible and elastic: users can plug in the database of their choice as disk-based storage for cold data. Finally, the platform was tested in different scenarios under heavy load, and the benchmarks showed that it performs very well, achieving throughput on the order of thousands of elements per second.
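To illustrate the hot/cold idea in the summary, here is a minimal two-tier cache sketch. It is not the thesis's implementation: the exact LFU variation is not described here, so this sketch assumes plain access counting plus a promotion rule in which a cold item moves back into memory only when its count exceeds that of the least frequently used hot item. The class names `TwoTierLFUCache` and `InMemoryColdStore` are hypothetical, a dict stands in for the user-chosen disk database, and transactional safety and distribution are not modeled.

```python
from typing import Any, Dict, Hashable


class InMemoryColdStore:
    """Hypothetical stand-in for the pluggable disk layer.

    A real deployment would wrap a disk-based database of the user's choice;
    a dict keeps this sketch self-contained.
    """

    def __init__(self) -> None:
        self._data: Dict[Hashable, Any] = {}

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value

    def get(self, key: Hashable) -> Any:
        return self._data[key]

    def delete(self, key: Hashable) -> None:
        self._data.pop(key, None)

    def __contains__(self, key: Hashable) -> bool:
        return key in self._data


class TwoTierLFUCache:
    """Memory tier with LFU eviction; evicted (cold) items go to cold_store."""

    def __init__(self, capacity: int, cold_store: InMemoryColdStore) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.hot: Dict[Hashable, Any] = {}
        # Access counts are kept for cold keys too (assumption), so a key that
        # keeps being read while on disk can earn its way back into memory.
        self.freq: Dict[Hashable, int] = {}
        self.cold = cold_store

    def _evict_lfu(self) -> None:
        # Move the least frequently used hot item to the cold store.
        victim = min(self.hot, key=self.freq.__getitem__)
        self.cold.put(victim, self.hot.pop(victim))

    def put(self, key: Hashable, value: Any) -> None:
        # Writes always land in memory; any stale cold copy is dropped.
        if key not in self.hot:
            self.cold.delete(key)
            if len(self.hot) >= self.capacity:
                self._evict_lfu()
        self.hot[key] = value
        self.freq[key] = self.freq.get(key, 0) + 1

    def get(self, key: Hashable) -> Any:
        if key in self.hot:
            self.freq[key] += 1
            return self.hot[key]
        if key not in self.cold:
            return None
        value = self.cold.get(key)
        self.freq[key] = self.freq.get(key, 0) + 1
        # Promote only if memory has room or the item is now used more often
        # than the coldest hot item, i.e. it has "become hot again".
        coldest_hot = min((self.freq[k] for k in self.hot), default=0)
        if len(self.hot) < self.capacity or self.freq[key] > coldest_hot:
            self.cold.delete(key)
            if len(self.hot) >= self.capacity:
                self._evict_lfu()
            self.hot[key] = value
        return value


if __name__ == "__main__":
    cache = TwoTierLFUCache(capacity=2, cold_store=InMemoryColdStore())
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")            # "a" is now used more often than "b"
    cache.put("c", 3)         # memory is full: "b" (LFU) moves to the cold store
    print(sorted(cache.hot))  # ['a', 'c']
    print(cache.get("b"))     # 2; "b" is promoted and "c" moves to disk
    print(sorted(cache.hot))  # ['a', 'b']
```

Keeping counts for cold keys is what lets a shifting data distribution pull items back into memory; a production policy would likely also decay or cap these counts so that old popularity does not dominate, which this sketch omits.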
Type of Item: Diploma Work
Date of Item: 2019-09-10
Date of Publication: 2019
Subject: Big data
Subject: Real time processing
Subject: Streaming system
Bibliographic Citation: Nikolaos Kafritsas, "A caching platform for large scale data-intensive distributed applications", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2019
