URI | http://purl.tuc.gr/dl/dias/8167266C-005C-4C48-A9EB-DC0302CB7517 | - |
Identifier | http://www.cs.berkeley.edu/~istoica/papers/2007/ryan-sigmod07.pdf | - |
Identifier | https://doi.org/10.1145/1247480.1247535 | - |
Language | en | - |
Extent | 12 pages | en |
Title | Sharing aggregate computation for distributed queries | en |
Creator | Huebsch Ryan | en |
Creator | Garofalakis Minos | en |
Creator | Γαροφαλακης Μινως | el |
Creator | Hellerstein, Joseph, 1952- | en |
Creator | Stoica, Ion, 1965- | en |
Publisher | Association for Computing Machinery | en |
Content Summary | An emerging challenge in modern distributed querying is to effi-
ciently process multiple continuous aggregation queries simultaneously.
Processing each query independently may be infeasible,
so multi-query optimizations are critical for sharing work across
queries. The challenge is to identify overlapping computations that
may not be obvious in the queries themselves.
In this paper, we reveal new opportunities for sharing work in the
context of distributed aggregation queries that vary in their selection
predicates. We identify settings in which a large set of q such
queries can be answered by executing k ≪ q different queries.
The k queries are revealed by analyzing a boolean matrix capturing
the connection between data and the queries that they satisfy,
in a manner akin to familiar techniques like Gaussian elimination.
Indeed, we identify a class of linear aggregate functions (including
SUM, COUNT and AVERAGE), and show that the sharing potential
for such queries can be optimally recovered using standard matrix
decompositions from computational linear algebra. For some
other typical aggregation functions (including MIN and MAX) we
find that optimal sharing maps to the NP-hard set basis problem.
However, for those scenarios, we present a family of heuristic algorithms
and demonstrate that they perform well for moderate-sized
matrices. We also present a dynamic distributed system architecture
to exploit sharing opportunities, and experimentally evaluate
the benefits of our techniques via a novel, flexible random workload
generator we develop for this setting. | en |
Type of Item | Πλήρης Δημοσίευση σε Συνέδριο | el |
Type of Item | Conference Full Paper | en |
License | http://creativecommons.org/licenses/by/4.0/ | en |
Date of Item | 2015-11-30 | - |
Date of Publication | 2007 | - |
Subject | Algorithms | en |
Subject | Design | en |
Subject | Measurement | en |
Bibliographic Citation | R. Huebsch, M. Garofalakis, J. M. Hellerstein and I. Stoica, "Sharing aggregate computation for distributed queries", in ACM SIGMOD International Conference on Management of Data, 2007. doi: 10.1145/1247480.1247535 | en |