The work titled "SPARTAN: A model-based semantic compression system for massive data tables" by Shivnath Babu, Minos Garofalakis, and Rajeev Rastogi is licensed under a Creative Commons Attribution 4.0 International license.
Bibliographic Citation
S. Babu, M. Garofalakis and R. Rastogi, "SPARTAN: A model-based semantic compression system for massive data tables", in International Conference on Management of Data (SIGMOD 2001), May 2001, pp. 283-294.
Abstract
While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such techniques are clearly motivated by the ever-increasing data collection rates of modern enterprises and the need for effective, guaranteed-quality approximate answers to queries over massive relational data sets. In this paper, we propose SPARTAN, a system that takes advantage of attribute semantics and data-mining models to perform lossy compression of massive data tables. SPARTAN is based on the novel idea of exploiting predictive data correlations and prescribed error tolerances for individual attributes to construct concise and accurate Classification and Regression Tree (CaRT) models for entire columns of a table. More precisely, SPARTAN selects a certain subset of attributes for which no values are explicitly stored in the compressed table; instead, concise CaRTs that predict these values (within the prescribed error bounds) are maintained. To restrict the huge search space and construction cost of possible CaRT predictors, SPARTAN employs sophisticated learning techniques and novel combinatorial optimization algorithms. Our experimentation with several real-life data sets offers convincing evidence of the effectiveness of SPARTAN's model-based approach: SPARTAN is able to consistently yield substantially better compression ratios than existing semantic or syntactic compression tools (e.g., gzip) while utilizing only small data samples for model inference.
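The core idea of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a depth-1 regression "tree" (a single split, the simplest CaRT) to predict one numeric column from another, and stores explicit values only for rows where the prediction misses by more than a per-attribute error tolerance. The function names (`fit_stump`, `compress_column`, `decompress_value`) and the exception-dictionary layout are assumptions made for this example.

```python
# Illustrative sketch (assumed names, not SPARTAN's actual code): compress a
# numeric column by predicting it from a correlated column, keeping explicit
# values only where the prediction exceeds the prescribed error tolerance.

def fit_stump(xs, ys):
    """Learn a depth-1 regression tree: one threshold on x, a mean on each side."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave both sides non-empty
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (ml if x <= t else mr)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def compress_column(xs, ys, tol):
    """Fit a predictor for ys from xs; rows predicted within tol are dropped.

    Returns (predictor, exceptions) where exceptions maps row index -> true
    value for the rows the model cannot reproduce within the tolerance.
    """
    predict = fit_stump(xs, ys)
    exceptions = {i: y for i, (x, y) in enumerate(zip(xs, ys))
                  if abs(predict(x) - y) > tol}
    return predict, exceptions

def decompress_value(predict, exceptions, i, x):
    """Reconstruct row i: use the stored exception if present, else predict."""
    return exceptions.get(i, predict(x))
```

With a tight tolerance, only the rows the stump cannot fit are materialized; every other value is reconstructed (approximately, within `tol`) from the predictor, which is where the compression gain comes from.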