Content Summary | Several studies have demonstrated the effectiveness of the wavelet decomposition as a tool for reducing large amounts of data down to compact wavelet synopses that can be used to obtain fast, accurate approximate query answers. Conventional wavelet synopses that greedily minimize the overall root-mean-squared (i.e., L2-norm) error in the data approximation can suffer from important problems, including severe bias and wide variance in the quality of the data reconstruction, and lack of nontrivial guarantees for individual approximate answers. Thus, probabilistic thresholding schemes have been recently proposed as a means of building wavelet synopses that try to probabilistically control maximum approximation-error metrics (e.g., maximum relative error).A key open problem is whether it is possible to design efficient deterministic wavelet-thresholding algorithms for minimizing general, non-L2 error metrics that are relevant to approximate query processing systems, such as maximum relative or maximum absolute error. Obviously, such algorithms can guarantee better maximum-error wavelet synopses and avoid the pitfalls of probabilistic techniques (e.g., “bad” coin-flip sequences) leading to poor solutions; in addition, they can be used to directly optimize the synopsis construction process for other useful error metrics, such as the mean relative error in data-value reconstruction. In this article, we propose novel, computationally efficient schemes for deterministic wavelet thresholding with the objective of optimizing general approximation-error metrics. We first consider the problem of constructing wavelet synopses optimized for maximum error, and introduce an optimal low polynomial-time algorithm for one-dimensional wavelet thresholding---our algorithm is based on a new Dynamic-Programming (DP) formulation, and can be employed to minimize the maximum relative or absolute error in the data reconstruction. Unfortunately, directly extending our one-dimensional DP algorithm to multidimensional wavelets results in a super-exponential increase in time complexity with the data dimensionality. Thus, we also introduce novel, polynomial-time approximation schemes (with tunable approximation guarantees) for deterministic wavelet thresholding in multiple dimensions. We then demonstrate how our optimal and approximate thresholding algorithms for maximum error can be extended to handle a broad, natural class of distributive error metrics, which includes several important error measures, such as mean weighted relative error and weighted Lp-norm error. Experimental results on real-world and synthetic data sets evaluate our novel optimization algorithms, and demonstrate their effectiveness against earlier wavelet-thresholding schemes. | en |