Το work with title Fast approximate wavelet tracking on streams by Cormode, Graham, 1977-, Garofalakis Minos, Sacharidis Dimitris is licensed under Creative Commons Attribution 4.0 International
Bibliographic Citation
G. Cormode, M. Garofalakis and D. Sacharidis, "Fast approximate wavelet tracking on streams", in 10th International Conference on Advances in Database Technology, September 2006.
Recent years have seen growing interest in effective algorithms forsummarizing and querying massive, high-speed data streams. Randomized sketchsynopses provide accurate approximations for general-purpose summaries of thestreaming data distribution (e.g., wavelets). The focus of existing work has typicallybeen on minimizing space requirements of the maintained synopsis — however,to effectively support high-speed data-stream analysis, a crucial practicalrequirement is to also optimize: (1) the update time for incorporating a streamingdata element in the sketch, and (2) the query time for producing an approximatesummary (e.g., the top wavelet coefficients) from the sketch. Such timecosts must be small enough to cope with rapid stream-arrival rates and the realtimequerying requirements of typical streaming applications (e.g., ISP networkmonitoring). With cheap and plentiful memory, space is often only a secondaryconcern after query/update time costs.In this paper, we propose the first fast solution to the problem of tracking waveletrepresentations of one-dimensional and multi-dimensional data streams, based ona novel stream synopsis, the Group-Count Sketch (GCS). By imposing a hierarchicalstructure of groups over the data and applying the GCS, our algorithmscan quickly recover the most important wavelet coefficients with guaranteed accuracy.A tradeoff between query time and update time is established, by varyingthe hierarchical structure of groups, allowing the right balance to be found forspecific data stream. Experimental analysis confirms this tradeoff, and shows thatall our methods significantly outperform previously known methods in terms ofboth update time and query time, while maintaining a high level of accuracy.