The work with title Probabilistic declarative information extraction by Wang Daisy Zhe, Michelakis Eirinaios, Franklin Michael J., Garofalakis Minos, Hellerstein, Joseph, 1952- is licensed under Creative Commons Attribution 4.0 International
Bibliographic Citation
D. Z. Wang, E. Michelakis, M. J. Franklin, M. Garofalakis and J. M. Hellerstein, "Probabilistic declarative information extraction", in 26th IEEE International Conference on Data Engineering, 2010.
Unstructured text represents a large fraction of the world’s data. It often contains snippets of structured information (e.g., people’s names and zip codes). Information Extraction (IE) techniques identify such structured information in text. In recent years, database research has pursued IE on two fronts: declarative languages and systems for managing IE tasks, and probabilistic databases for querying the output of IE. In this paper, we make the first step to merge these two directions, without loss of statistical robustness, by implementing a state-of-the-art statistical IE model, Conditional Random Fields (CRF), in the setting of a Probabilistic Database that treats statistical models as first-class data objects. We show that the Viterbi algorithm for CRF inference can be specified declaratively in recursive SQL. We also show the performance benefits relative to a standalone open-source Viterbi implementation. This work opens up the optimization opportunities for queries involving both inference and relational operators over IE models.
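For readers unfamiliar with Viterbi inference, the following is a minimal Python sketch of the dynamic program over a linear-chain model. It is not the recursive SQL formulation described in the paper, nor the authors' code. The function name `viterbi`, the dictionary-based score tables, and the toy scores in the usage note are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Score = Dict[Tuple[str, str], float]


def viterbi(tokens: List[str], labels: List[str],
            emit: Score, trans: Score) -> Tuple[float, List[str]]:
    """Return the highest-scoring label sequence and its score.

    emit[(token, label)] and trans[(prev_label, label)] are additive
    (log-space style) scores; missing entries count as 0.0.
    These tables are hypothetical stand-ins for CRF feature weights.
    """
    if not tokens:
        return 0.0, []

    # best[i][y]: best score of any labeling of tokens[0..i] that ends in y
    best = [{y: emit.get((tokens[0], y), 0.0) for y in labels}]
    back = []  # back[i-1][y]: label chosen at position i-1 when position i is y

    for i in range(1, len(tokens)):
        row, ptr = {}, {}
        for y in labels:
            # Pick the predecessor label that maximizes the running score.
            prev = max(labels,
                       key=lambda p: best[i - 1][p] + trans.get((p, y), 0.0))
            row[y] = (best[i - 1][prev]
                      + trans.get((prev, y), 0.0)
                      + emit.get((tokens[i], y), 0.0))
            ptr[y] = prev
        best.append(row)
        back.append(ptr)

    # Trace back from the best final label to recover the full sequence.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return best[-1][last], path
```

As a usage illustration with made-up scores, calling `viterbi(["Daisy", "lives", "in", "94720"], ["NAME", "ZIP", "OTHER"], emit, {})` with `emit` rewarding `("Daisy", "NAME")`, `("94720", "ZIP")`, and `OTHER` for the two middle tokens labels the sentence as NAME, OTHER, OTHER, ZIP. The recurrence over positions is the part that lends itself to a recursive query: each step depends only on the previous row of scores.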