SPARTAN: A model-based semantic compression system for massive data tables

Babu Shivnath, Garofalakis Minos, Rastogi Rajeev

Full record

URI:

http://purl.tuc.gr/dl/dias/5507F55A-78BD-4114-A824-3CB697C32ED1

Year

2001

Type of Item

Conference Full Paper

License

Details

Bibliographic Citation

S. Babu, M. Garofalakis and R. Rastogi, "SPARTAN: A model-based semantic compression system for massive data tables", in International Conference on Management of Data (SIGMOD 2001), May 2001, pp. 283-294.

Appears in Collections

Conference Publications in Community School of Electrical and Computer Engineering

Conference Publications in Community Software Technology and Network Applications Laboratory

Summary

While a variety of lossy compression schemes have been developed for certainforms of digital data (e.g., images, audio, video), the area of lossycompression techniques for arbitrary data tables has been left relatively unexplored.Nevertheless, such techniques are clearly motivated by the everincreasingdata collection rates of modern enterprises and the need for effective,guaranteed-quality approximate answers to queries over massiverelational data sets. In this paper, we propose SPARTAN , a system thattakes advantage of attribute semantics and data-mining models to performlossy compression of massive data tables. SPARTAN is based on thenovel idea of exploiting predictive data correlations and prescribed errortolerances for individual attributes to construct concise and accurate Classificationand Regression Tree (CaRT) models for entire columns of a table.More precisely, SPARTAN selects a certain subset of attributes forwhich no values are explicitly stored in the compressed table; instead, conciseCaRTs that predict these values (within the prescribed error bounds)are maintained. To restrict the huge search space and construction costof possible CaRT predictors, SPARTAN employs sophisticated learningtechniques and novel combinatorial optimization algorithms. Our experimentationwith several real-life data sets offers convincing evidence of theeffectiveness of SPARTAN ’s model-based approach – SPARTAN isable to consistently yield substantially better compression ratios than existingsemantic or syntactic compression tools (e.g., gzip) while utilizing onlysmall data samples for model inference.

Search

Browse

My Space

SPARTAN: A model-based semantic compression system for massive data tables

Babu Shivnath, Garofalakis Minos, Rastogi Rajeev

Summary

Services

Export

Share

Statistics

Metadata & Content in a METS Package:

Metadata in Format: