Recent work has demonstrated the effectiveness of the wavelet decomposition
in reducing large amounts of data to compact sets of
wavelet coefficients (termed “wavelet synopses”) that can be used
to provide fast and reasonably accurate approximate answers to
queries. A major criticism of such techniques is that unlike, for
example, random sampling, conventional wavelet synopses do not
provide informative error guarantees on the accuracy of individual
approximate answers. In fact, as this paper demonstrates, errors
can vary widely (without bound) and unpredictably, even for identical
queries on nearly identical values in distinct parts of the data.
This lack of error guarantees severely limits the practicality of traditional
wavelets as an approximate query-processing tool, because
users have no idea of the quality of any particular approximate answer.
In this paper, we introduce Probabilistic Wavelet Synopses,
the first wavelet-based data reduction technique with guarantees on
the accuracy of individual approximate answers. Whereas earlier
approaches rely on deterministic thresholding for selecting a set
of “good” wavelet coefficients, our technique is based on a novel,
probabilistic thresholding scheme that assigns each coefficient a
probability of being retained based on its importance to the reconstruction
of individual data values, and then flips coins to select the
synopsis. We show how our scheme avoids the above pitfalls of
deterministic thresholding, providing highly accurate answers for
individual data values in a data vector. We propose several novel
optimization algorithms for tuning our probabilistic thresholding
scheme to minimize desired error metrics. Experiments on
real-world and synthetic data sets evaluate these algorithms and
demonstrate the effectiveness of our probabilistic wavelet synopses
in providing fast, highly accurate answers with error guarantees.
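
To make the coin-flipping selection concrete, the following is a minimal sketch of probabilistic thresholding, under stated assumptions: the retention probabilities y_i (one per coefficient, larger for coefficients more important to reconstruction) are taken as given, and a retained coefficient c_i is stored rescaled to c_i / y_i, a standard device for keeping the reconstructed values unbiased in expectation. The function name probabilistic_threshold and its arguments are illustrative, not taken from the paper.

```python
import random

def probabilistic_threshold(coefficients, retention_probs, seed=None):
    """Illustrative coin-flipping selection: keep coefficient c_i with
    probability y_i, rescaled to c_i / y_i so its expected contribution
    to reconstruction remains c_i. Returns the synopsis as {index: value}."""
    rng = random.Random(seed)
    synopsis = {}
    for i, (c, y) in enumerate(zip(coefficients, retention_probs)):
        y = min(max(y, 0.0), 1.0)       # clamp probabilities to [0, 1]
        if c == 0.0 or y == 0.0:
            continue                    # nothing worth retaining
        if rng.random() < y:
            synopsis[i] = c / y         # rescaling keeps E[stored value] = c_i
    return synopsis

# Example: larger-magnitude coefficients (more important to reconstruction)
# receive larger retention probabilities, so they are almost always kept.
coeffs = [8.0, -3.0, 0.5, 0.1]
probs = [1.0, 0.8, 0.3, 0.1]
print(probabilistic_threshold(coeffs, probs, seed=7))
```

In this sketch the expected number of retained coefficients is simply the sum of the y_i, so tuning the scheme amounts to choosing the retention probabilities within a space budget so as to minimize the desired error metric, which is the optimization problem the summary alludes to.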