Institutional Repository
Technical University of Crete

BAYESSTORE: Managing large, uncertain data repositories with probabilistic graphical models

Wang Daisy Zhe, Michelakis Eirinaios, Garofalakis Minos, Hellerstein, Joseph, 1952-


Year: 2008
Type of Item: Conference Full Paper
Bibliographic Citation: D.Z. Wang, E. Michelakis, M. Garofalakis and J.M. Hellerstein, "BAYESSTORE: managing large, uncertain data repositories with probabilistic graphical models", in 34th International Conference on Very Large Data Bases, 2008.

Several real-world applications need to effectively manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings for a variety of reasons, including motion prediction and human behavior modeling. Such probabilistic data analyses require sophisticated machine-learning tools that can effectively model the complex spatio/temporal correlation patterns present in uncertain sensory data. Unfortunately, to date, most existing approaches to probabilistic database systems have relied on somewhat simplistic models of uncertainty that can be easily mapped onto existing relational architectures: Probabilistic information is typically associated with individual data tuples, with only limited or no support for effectively capturing and reasoning about complex data correlations. In this paper, we introduce BAYESSTORE, a novel probabilistic data management architecture built on the principle of handling statistical models and probabilistic inference tools as first-class citizens of the database system. Adopting a machine-learning view, BAYESSTORE employs concise statistical relational models to effectively encode the correlation patterns between uncertain data, and promotes probabilistic inference and statistical model manipulation as part of the standard DBMS operator repertoire to support efficient and sound query processing. We present BAYESSTORE’s uncertainty model based on a novel, first-order statistical model, and we redefine traditional query processing operators, to manipulate the data and the probabilistic models of the database in an efficient manner. Finally, we validate our approach, by demonstrating the value of exploiting data correlations during query processing, and by evaluating a