The work titled "Sharing aggregate computation for distributed queries" by Huebsch Ryan, Garofalakis Minos, Hellerstein, Joseph, 1952-, Stoica, Ion, 1965- is licensed under Creative Commons Attribution 4.0 International
Bibliographic Citation
R. Huebsch, M. Garofalakis, J. M. Hellerstein and I. Stoica, "Sharing aggregate computation for distributed queries", in ACM SIGMOD International Conference on Management of Data, 2007. doi: 10.1145/1247480.1247535
https://doi.org/10.1145/1247480.1247535
An emerging challenge in modern distributed querying is to efficiently process multiple continuous aggregation queries simultaneously. Processing each query independently may be infeasible, so multi-query optimizations are critical for sharing work across queries. The challenge is to identify overlapping computations that may not be obvious in the queries themselves. In this paper, we reveal new opportunities for sharing work in the context of distributed aggregation queries that vary in their selection predicates. We identify settings in which a large set of q such queries can be answered by executing k ≪ q different queries. The k queries are revealed by analyzing a boolean matrix capturing the connection between data and the queries that they satisfy, in a manner akin to familiar techniques like Gaussian elimination. Indeed, we identify a class of linear aggregate functions (including SUM, COUNT and AVERAGE), and show that the sharing potential for such queries can be optimally recovered using standard matrix decompositions from computational linear algebra. For some other typical aggregation functions (including MIN and MAX) we find that optimal sharing maps to the NP-hard set basis problem. However, for those scenarios, we present a family of heuristic algorithms and demonstrate that they perform well for moderate-sized matrices. We also present a dynamic distributed system architecture to exploit sharing opportunities, and experimentally evaluate the benefits of our techniques via a novel, flexible random workload generator we develop for this setting.
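The matrix idea in the abstract can be illustrated with a small sketch for SUM. This is not the paper's algorithm or system. It is a minimal example under stated assumptions: the data is taken to be already grouped into disjoint "fragments" (a term used here for illustration), each fragment satisfying the same set of queries, and the boolean matrix `a` has one row per query and one column per fragment. The helper names `rank_factorize` and `matvec`, and all values, are hypothetical. A rank factorization `a = c · b` obtained by Gaussian elimination then shows that q = 5 queries can be answered from k = 3 shared aggregates.

```python
from fractions import Fraction


def rank_factorize(a):
    """Return (c, b) with a == c @ b, where b has rank(a) rows.

    b holds the nonzero rows of the reduced row echelon form of a,
    and c holds the pivot columns of a (a standard rank factorization).
    """
    rows = [[Fraction(x) for x in row] for row in a]
    n_cols = len(rows[0]) if rows else 0
    pivots = []
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        if r == len(rows):
            break
        # Find a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][col]
        rows[r] = [x / lead for x in rows[r]]
        # Eliminate this column from every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    b = rows[:r]
    c = [[Fraction(row[p]) for p in pivots] for row in a]
    return c, b


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


# Hypothetical workload: 5 queries (rows) over 4 data fragments (columns).
# a[i][j] == 1 means the data in fragment j satisfies query i's predicate.
a = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
fragment_sums = [10, 20, 30, 40]  # assumed per-fragment SUM values

c, b = rank_factorize(a)
k = len(b)                              # number of shared aggregates
shared = matvec(b, fragment_sums)       # the only k values to aggregate
answers = [int(x) for x in matvec(c, shared)]   # recover all q answers
direct = matvec(a, fragment_sums)       # naive: one aggregate per query

print(k, answers)  # 3 [30, 70, 100, 50, 50]
```

Note that the shared rows in `b` may contain negative coefficients. That is acceptable for a linear aggregate such as SUM, but it has no counterpart for MIN or MAX, which is consistent with the abstract's remark that those functions lead to a harder (set basis) problem.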