URI | http://purl.tuc.gr/dl/dias/9AC9C1B2-435D-4E49-971E-4D808FF2D7C6 | - |
Identifier | http://db.cs.berkeley.edu/papers/vldb10-qpie.pdf | - |
Identifier | https://doi.org/10.14778/1920841.1920974 | - |
Language | en | - |
Extent | 11 pages | el |
Title | Querying probabilistic information extraction | en |
Creator | Wang Daisy Zhe | en |
Creator | Franklin Michael J. | en |
Creator | Garofalakis Minos | en |
Creator | Γαροφαλακης Μινως | el |
Creator | Hellerstein, Joseph, 1952- | en |
Publisher | Association for Computing Machinery | en |
Abstract | Recently, there has been increasing interest in extending relational
query processing to include data obtained from unstructured sources.
A common approach is to use stand-alone Information Extraction
(IE) techniques to identify and label entities within blocks of text;
the resulting entities are then imported into a standard database and
processed using relational queries. This two-part approach, however,
suffers from two main drawbacks. First, IE is inherently probabilistic,
but traditional query processing does not properly handle
probabilistic data, resulting in reduced answer quality. Second,
performance inefficiencies arise due to the separation of IE from
query processing. In this paper, we address these two problems by
building on an in-database implementation of a leading IE model—
Conditional Random Fields using the Viterbi inference algorithm.
We develop two different query approaches on top of this implementation.
The first uses deterministic queries over maximum-likelihood
extractions, with optimizations to push the relational operators
into the Viterbi algorithm. The second extends the Viterbi
algorithm to produce a set of possible extraction “worlds”, from
which we compute top-k probabilistic query answers. We describe
these approaches and explore the trade-offs of efficiency and effectiveness
between them using two datasets. | en |
Type | Πλήρης Δημοσίευση σε Συνέδριο | el |
Type | Conference Full Paper | en |
License | http://creativecommons.org/licenses/by/4.0/ | en |
Date | 2015-11-30 | - |
Date of Publication | 2010 | - |
Subject | Database management | en |
Bibliographic Citation | D. Z. Wang, M. J. Franklin, M. Garofalakis and J. M. Hellerstein, "Querying probabilistic information extraction", in 36th International Conference on Very Large Data Bases, 2010. | en |