Institutional Repository
Technical University of Crete
EN  |  EL

Search

Browse

My Space

Holistic aggregates in a networked world: distributed tracking of approximate quantiles

Cormode, Graham, 1977-, Muthukrishnan, S, Garofalakis Minos, Rastogi Rajeev

Full record


URI: http://purl.tuc.gr/dl/dias/494394BF-9D81-47F8-AF0F-EAA0B9094476
Year 2005
Type of Item Conference Full Paper
License
Details
Bibliographic Citation G. Cormode, M. Garofalakis, S. Muthukrishnan and R. Rastogi, "Holistic aggregates in a networked world: distributed tracking of approximate quantiles", in ACM SIGMOD International Conference on Management of Data, June 2005, pp. 25-36.
Appears in Collections

Summary

While traditional database systems optimize for performance onone-shot queries, emerging large-scale monitoring applications requirecontinuous tracking of complex aggregates and data-distributionsummaries over collections of physically-distributed streams.Thus, effective solutions have to be simultaneously space efficient(at each remote site), communication efficient (across the underlyingcommunication network), and provide continuous, guaranteedqualityestimates. In this paper, we propose novel algorithmic solutionsfor the problem of continuously tracking complex holistic aggregatesin such a distributed-streams setting — our primary focusis on approximate quantile summaries, but our approach is morebroadly applicable and can handle other holistic-aggregate functions(e.g., “heavy-hitters” queries). We present the first knowndistributed-tracking schemes for maintaining accurate quantile estimateswith provable approximation guarantees, while simultaneouslyoptimizing the storage space at each remote site as well asthe communication cost across the network. In a nutshell, our algorithmsemploy a combination of local tracking at remote sites andsimple prediction models for local site behavior in order to producehighly communication- and space-efficient solutions. We performextensive experiments with real and synthetic data to explore thevarious tradeoffs and understand the role of prediction models inour schemes. The results clearly validate our approach, revealingsignificant savings over naive solutions as well as our analyticalworst-case guarantees.

Services

Statistics