Institutional Repository
Technical University of Crete

Probabilistic declarative information extraction

Wang Daisy Zhe, Michelakis Eirinaios, Franklin Michael J., Garofalakis Minos, Hellerstein Joseph M.

Year: 2010
Type of Item: Conference Full Paper
Bibliographic Citation: D. Z. Wang, E. Michelakis, M. J. Franklin, M. Garofalakis and J. M. Hellerstein, "Probabilistic declarative information extraction", in 26th IEEE International Conference on Data Engineering, 2010.

Unstructured text represents a large fraction of the world's data. It often contains snippets of structured information (e.g., people's names and zip codes). Information Extraction (IE) techniques identify such structured information in text. In recent years, database research has pursued IE on two fronts: declarative languages and systems for managing IE tasks, and probabilistic databases for querying the output of IE. In this paper, we make the first step to merge these two directions, without loss of statistical robustness, by implementing a state-of-the-art statistical IE model, Conditional Random Fields (CRF), in the setting of a Probabilistic Database that treats statistical models as first-class data objects. We show that the Viterbi algorithm for CRF inference can be specified declaratively in recursive SQL. We also show the performance benefits relative to a standalone open-source Viterbi implementation. This work opens up the optimization opportunities for queries involving both inference and relational operators over IE models.
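
For readers unfamiliar with the Viterbi algorithm mentioned in the abstract, the following is a minimal Python sketch of Viterbi decoding over a linear-chain model. It is not the paper's recursive SQL formulation, and it is not taken from any of the systems the abstract refers to. The function name `viterbi`, the dictionary-based `emit` and `trans` score tables, and the toy scores are all assumptions made for illustration; in a real CRF these scores would come from learned feature weights.

```python
from typing import Dict, List, Tuple


def viterbi(
    tokens: List[str],
    labels: List[str],
    emit: Dict[Tuple[str, str], float],
    trans: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Return (best_score, best_label_sequence) for a linear-chain model.

    emit[(token, label)] and trans[(prev_label, label)] are additive
    (log-space) scores. Missing entries default to 0.0. Both tables are
    hypothetical stand-ins for learned CRF feature weights.
    """
    if not tokens:
        return 0.0, []

    # score[y] = best score of any labeling of the tokens seen so far
    # that ends in label y.
    score = {y: emit.get((tokens[0], y), 0.0) for y in labels}
    backpointers: List[Dict[str, str]] = []

    for tok in tokens[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Pick the previous label that maximizes the running score.
            best_prev = max(
                labels, key=lambda p: score[p] + trans.get((p, y), 0.0)
            )
            new_score[y] = (
                score[best_prev]
                + trans.get((best_prev, y), 0.0)
                + emit.get((tok, y), 0.0)
            )
            pointers[y] = best_prev
        backpointers.append(pointers)
        score = new_score

    # Follow the back-pointers from the best final label to recover the path.
    last = max(labels, key=lambda y: score[y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


if __name__ == "__main__":
    # Toy scores, invented for this example only.
    toy_emit = {
        ("Daisy", "NAME"): 2.0,
        ("lives", "OTHER"): 1.0,
        ("in", "OTHER"): 1.0,
        ("94720", "ZIP"): 3.0,
    }
    toy_trans = {("NAME", "NAME"): 0.5, ("OTHER", "ZIP"): 0.5}
    print(viterbi(["Daisy", "lives", "in", "94720"],
                  ["NAME", "ZIP", "OTHER"], toy_emit, toy_trans))
```

Each step of the loop depends only on the scores from the previous token position, which is the kind of recurrence that a recursive query can express; the sketch above makes that dependency explicit without claiming to mirror the paper's SQL.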