URI | http://purl.tuc.gr/dl/dias/C8A10160-E770-48D3-8503-C457D55AADE8 | - |
Identifier | http://db.cs.berkeley.edu/papers/icde10-ie.pdf | - |
Language | en | - |
Extent | 4 pages | en |
Title | Probabilistic declarative information extraction | en |
Creator | Wang Daisy Zhe | en |
Creator | Michelakis Eirinaios | en |
Creator | Franklin Michael J. | en |
Creator | Garofalakis Minos | en |
Creator | Γαροφαλακης Μινως | el |
Creator | Hellerstein, Joseph, 1952- | en |
Content Summary | Unstructured text represents a large fraction of the world’s data. It often contains snippets of structured information (e.g., people’s names and zip codes). Information Extraction (IE) techniques identify such structured information in text. In recent years, database research has pursued IE on two fronts: declarative languages and systems for managing IE tasks, and probabilistic databases for querying the output of IE. In this paper, we take the first step toward merging these two directions, without loss of statistical robustness, by implementing a state-of-the-art statistical IE model, Conditional Random Fields (CRF), in the setting of a Probabilistic Database that treats statistical models as first-class data objects. We show that the Viterbi algorithm for CRF inference can be specified declaratively in recursive SQL. We also demonstrate performance benefits relative to a standalone open-source Viterbi implementation. This work opens up optimization opportunities for queries involving both inference and relational operators over IE models. | en |
Type of Item | Conference Full Paper | en |
License | http://creativecommons.org/licenses/by/4.0/ | en |
Date of Item | 2015-11-30 | - |
Date of Publication | 2010 | - |
Subject | Information systems | en |
Subject | Databases | en |
Bibliographic Citation | D. Z. Wang, E. Michelakis, M. J. Franklin, M. Garofalakis and J. M. Hellerstein, "Probabilistic declarative information extraction", in 26th IEEE International Conference on Data Engineering, 2010. | en |
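The summary refers to the Viterbi algorithm for CRF inference. As a rough illustration of the dynamic program involved, here is a minimal in-memory Python sketch for a linear-chain CRF. It assumes the model has already been reduced to per-token label scores (`emit`) and label-to-label transition scores (`trans`), both in log space. The function name, the labels, and all scores below are hypothetical. This is not the paper's recursive-SQL formulation, which evaluates the same recurrence declaratively over a model stored in the database.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    emit: Sequence[Dict[str, float]],
    trans: Dict[Tuple[str, str], float],
    labels: Sequence[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring label sequence and its total score.

    Hypothetical sketch: emit[i][y] is the (log-space) score of label y at
    token i; trans[(a, b)] is the score of label a followed by label b.
    """
    if not emit:
        return [], 0.0
    # V[i][y]: best total score of any labeling of tokens 0..i ending in y.
    V: List[Dict[str, float]] = [{y: emit[0][y] for y in labels}]
    # back[i][y]: label at position i-1 on that best-scoring path.
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(emit)):
        V.append({})
        back.append({})
        for y in labels:
            # Extend each predecessor's best path by one transition into y.
            prev, score = max(
                ((yp, V[i - 1][yp] + trans[(yp, y)]) for yp in labels),
                key=lambda pair: pair[1],
            )
            V[i][y] = score + emit[i][y]
            back[i][y] = prev
    # Choose the best final label, then follow back-pointers to the start.
    last = max(labels, key=lambda y: V[-1][y])
    path = [last]
    for i in range(len(emit) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, V[-1][last]


# Toy example with made-up scores for the tokens "Joe", "at", "94720".
LABELS = ["PER", "O", "ZIP"]
EMIT = [
    {"PER": 2.0, "O": 0.5, "ZIP": 0.0},  # "Joe"
    {"PER": 0.6, "O": 0.4, "ZIP": 0.0},  # "at" (locally ambiguous)
    {"PER": 0.0, "O": 0.2, "ZIP": 2.5},  # "94720"
]
TRANS = {(a, b): 0.0 for a in LABELS for b in LABELS}
TRANS[("PER", "O")] = 0.3
TRANS[("O", "ZIP")] = 0.5

if __name__ == "__main__":
    print(viterbi(EMIT, TRANS, LABELS))
```

In this toy input, picking the best label for each token in isolation would tag "at" as `PER`. The transition scores make `PER, O, ZIP` the best joint labeling, and that joint choice is exactly what the recurrence over `V` captures.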