Georgios Skoulas, "Collection and analysis of datasets for AI-Driven networking algorithms", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2025
https://doi.org/10.26233/heallink.tuc.102387
The integration of artificial intelligence (AI) into networking systems has ushered in a new era of efficiency, scalability, and intelligence in managing modern communication infrastructures. However, the effectiveness of AI-driven networking algorithms is intrinsically tied to the quality and relevance of the datasets used to develop and evaluate them. This thesis focuses on the collection, preprocessing, and analysis of datasets from diverse domains, including cloud computing, 4G networks, and online platforms such as YouTube, to support the design of advanced AI-driven algorithms. We present a comprehensive study of four key datasets: Helios and Philly, which represent GPU-based cloud computing workloads; a 4G LTE dataset capturing cellular network performance under varying mobility conditions; and a YouTube dataset of user engagement metrics. Each dataset is carefully preprocessed and analyzed to address challenges such as non-stationarity, heavy-tailed distributions, and missing data. Time-series analysis techniques, including the rolling mean, the autocorrelation function (ACF), and the Augmented Dickey-Fuller (ADF) test, are applied to uncover statistical properties and make the data more suitable for predictive modeling. This work also demonstrates practical applications of these datasets by developing predictive models with techniques such as ARIMA and neural networks. The models are evaluated on their ability to forecast key performance metrics, optimize resource allocation, and improve the reliability of networking systems. Insights from the analysis also inform strategies for improving system performance and for developing error-resilient scheduling policies in GPU clusters and cellular networks. The findings of this thesis underscore the critical role of robust datasets in advancing AI-driven networking algorithms.
By addressing the challenges of data collection and preprocessing, and by demonstrating the datasets' potential in real-world scenarios, this work helps lay the foundation for future innovations in intelligent networking systems.
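The abstract names the rolling mean and the autocorrelation function (ACF) as diagnostics for non-stationarity. The following is a minimal, hypothetical Python sketch of both, using only the standard library. It is not the thesis code, and the synthetic white-noise and random-walk series are illustrative assumptions rather than any of the four datasets. For a non-stationary series such as a random walk, the rolling mean drifts and the ACF decays slowly. For white noise, the rolling mean stays near a constant and the ACF drops close to zero after lag 0. The ADF test itself is usually taken from a statistics library (for example, `adfuller` in statsmodels) rather than written by hand.

```python
import random


def rolling_mean(series, window):
    """Trailing rolling mean over a fixed window.

    Returns len(series) - window + 1 values; positions without a full
    window are omitted rather than padded.
    """
    if window <= 0 or window > len(series):
        raise ValueError("window must be in 1..len(series)")
    running = sum(series[:window])
    out = [running / window]
    for i in range(window, len(series)):
        # Slide the window: add the newest value, drop the oldest one.
        running += series[i] - series[i - window]
        out.append(running / window)
    return out


def acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag.

    r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2, with m the sample mean.
    """
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant series has undefined autocorrelation")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical synthetic data: stationary white noise, and its cumulative
    # sum, a random walk, which is non-stationary.
    noise = [random.gauss(0.0, 1.0) for _ in range(500)]
    walk = []
    level = 0.0
    for e in noise:
        level += e
        walk.append(level)

    for name, s in [("noise", noise), ("walk", walk)]:
        rm = rolling_mean(s, 50)
        r = acf(s, 10)
        print(f"{name}: rolling-mean spread={max(rm) - min(rm):.2f}, "
              f"ACF lag1={r[1]:.2f}, lag10={r[10]:.2f}")
```

Running the script prints, for each synthetic series, the spread of its rolling mean and its lag-1 and lag-10 autocorrelations. The random walk should show a much larger spread and a lag-1 ACF close to 1, which is the pattern that would prompt differencing before fitting a model such as ARIMA.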