Alexandros Nikitopoulos, "Algorithms for massive data problems: streaming sketching sampling", Master Thesis, School of Production Engineering and Management, Technical University of Crete, Hellenic Army Academy, Chania, Greece, 2021
https://doi.org/10.26233/heallink.tuc.89192
In recent years, advances in hardware technology have facilitated continuous data collection. Simple everyday activities, such as using a credit card, making a telephone call or browsing the web, lead to automated data storage. Similarly, advances in information technology have led to the creation of large data streams in IP networks. In many cases, these large volumes of data can be mined for interesting information in a wide variety of applications. The huge volume of the underlying data poses a number of computational challenges, as well as challenges in mining the data itself. Massive data sampling deals with massive data problems where the input data (a graph, a matrix or some other object) is too large to be stored in random access memory. One model for such problems is the streaming model, where each data item is seen only once. In the streaming model, the natural technique for dealing with massive data is sampling. The present work studies a number of algorithms from the literature for managing massive data in streams, covering sampling, mining, classification and processing, and focusing on streaming sketching.
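To make the idea of sampling in the streaming model concrete, the following is a minimal sketch of reservoir sampling, a standard textbook technique for keeping a uniform random sample of fixed size from a stream that is read only once. It is an illustration under stated assumptions, not necessarily one of the algorithms studied in the thesis; the function name `reservoir_sample` and its parameters `stream`, `k` and `rng` are hypothetical names chosen for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from a one-pass stream.

    Illustrative sketch only: the name and signature are assumptions made
    for this example. Memory use is O(k) regardless of the stream length.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number i+1 replaces a random slot with probability k/(i+1).
            j = rng.randint(0, i)  # both endpoints are inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # The stream is consumed through an iterator, so each item is seen once.
    sample = reservoir_sample(iter(range(1_000_000)), k=5)
    print(sample)
```

Each item is inspected exactly once and then discarded unless it enters the reservoir, which matches the constraint that the input is too large to hold in random access memory.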