Institutional Repository
Technical University of Crete




Hybrid in-database inference for declarative information extraction

Wang Daisy Zhe, Franklin Michael J., Garofalakis Minos, Hellerstein Joseph (1952-), Wick Michael L.


Year: 2011
Type of Item: Conference Full Paper
Bibliographic Citation: D. Z. Wang, M. J. Franklin, M. Garofalakis, J. M. Hellerstein and M. L. Wick, "Hybrid in-database inference for declarative information extraction", in ACM SIGMOD International Conference on Management of Data, 2011, pp. 517-528. doi: 10.1145/1989323.1989378


In the database community, work on information extraction (IE) has centered on two themes: how to effectively manage IE tasks, and how to manage the uncertainties that arise in the IE process in a scalable manner. Recent work has proposed a declarative IE system based on a probabilistic database (PDB) that supports a leading statistical IE model, together with an associated inference algorithm to answer top-k-style queries over the probabilistic IE outcome. Still, the broader problem of effectively supporting general probabilistic inference inside a PDB-based declarative IE system remains open. In this paper, we explore in-database implementations of a wide variety of inference algorithms suited to IE, including two Markov chain Monte Carlo (MCMC) algorithms and the Viterbi and sum-product algorithms. We describe rules for choosing appropriate inference algorithms based on the model, the query, and the text, considering the trade-off between accuracy and runtime. Based on these rules, we describe a hybrid approach that optimizes the execution of a single probabilistic IE query by employing different inference algorithms for different records. We show that our techniques can achieve up to 10-fold speedups compared to the non-hybrid solutions proposed in the literature.
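To make the algorithm families named in the abstract concrete, here is a minimal Python sketch over a toy linear-chain model. It is an illustration under stated assumptions, not the authors' in-database implementation: the representation (`emit`, `trans` as log-scores) and every function name are hypothetical. Viterbi and sum-product (forward-backward) are exact on a chain and run in O(nL²) for n tokens and L labels. Gibbs sampling stands in as one example of MCMC: it is approximate, its cost grows with the number of sweeps, and it only needs each label's local neighbourhood. `choose_inference` is a deliberately simplified, hypothetical per-record rule and does not reproduce the paper's rules.

```python
import math
import random

# Toy linear-chain model (hypothetical representation, not the paper's schema):
#   emit[t][y]  = log-score of label y at token t
#   trans[a][b] = log-score of label a followed by label b


def viterbi(emit, trans):
    """Exact top-1 (MAP) labeling of a linear chain by dynamic programming."""
    n, L = len(emit), len(emit[0])
    score = [list(emit[0])]  # best score of any prefix ending in each label
    back = []                # back-pointers for recovering the best path
    for t in range(1, n):
        row, ptr = [], []
        for y in range(L):
            best = max(range(L), key=lambda a: score[-1][a] + trans[a][y])
            row.append(score[-1][best] + trans[best][y] + emit[t][y])
            ptr.append(best)
        score.append(row)
        back.append(ptr)
    y = max(range(L), key=lambda b: score[-1][b])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sum_product_marginals(emit, trans):
    """Exact per-token marginals P(y_t = v) via sum-product (forward-backward)."""
    n, L = len(emit), len(emit[0])
    alpha = [list(emit[0])]  # forward messages in log space
    for t in range(1, n):
        alpha.append([emit[t][y] + logsumexp([alpha[-1][a] + trans[a][y] for a in range(L)])
                      for y in range(L)])
    beta = [[0.0] * L for _ in range(n)]  # backward messages in log space
    for t in range(n - 2, -1, -1):
        beta[t] = [logsumexp([trans[y][b] + emit[t + 1][b] + beta[t + 1][b] for b in range(L)])
                   for y in range(L)]
    log_z = logsumexp(alpha[-1])  # log partition function
    return [[math.exp(alpha[t][y] + beta[t][y] - log_z) for y in range(L)]
            for t in range(n)]


def gibbs_marginals(emit, trans, sweeps=2000, seed=0):
    """Approximate marginals by Gibbs sampling, one kind of MCMC (no burn-in, for brevity)."""
    rng = random.Random(seed)
    n, L = len(emit), len(emit[0])
    y = viterbi(emit, trans)  # start the chain from the MAP labeling
    counts = [[0] * L for _ in range(n)]
    for _ in range(sweeps):
        for t in range(n):
            # Resample y[t] given only its neighbours (its Markov blanket).
            logits = []
            for v in range(L):
                s = emit[t][v]
                if t > 0:
                    s += trans[y[t - 1]][v]
                if t < n - 1:
                    s += trans[v][y[t + 1]]
                logits.append(s)
            m = max(logits)
            y[t] = rng.choices(range(L), weights=[math.exp(s - m) for s in logits])[0]
            counts[t][y[t]] += 1
    return [[c / sweeps for c in row] for row in counts]


def choose_inference(query, has_cycles):
    """Hypothetical per-record rule of thumb; NOT the rules given in the paper."""
    if has_cycles:  # exact chain algorithms no longer apply, so fall back to sampling
        return "mcmc"
    return "viterbi" if query == "top1" else "sum-product"
```

For example, `choose_inference("marginal", has_cycles=False)` returns `"sum-product"`. Applying such a rule record by record, rather than once per query, is the intuition behind a hybrid plan: cheap exact inference where the structure allows it, sampling only where it does not.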