URI | http://purl.tuc.gr/dl/dias/AA9284C5-22A7-459D-BB9C-3DBF79DBEF5E | - |
Αναγνωριστικό | http://dl.acm.org/citation.cfm?doid=773153.773168 | - |
Γλώσσα | en | - |
Μέγεθος | 12 pages | en |
Τίτλος | Correlating XML data streams using tree-edit distance embeddings | en |
Δημιουργός | Garofalakis Minos | en |
Δημιουργός | Γαροφαλακης Μινως | el |
Δημιουργός | Kumar Amit | en |
Εκδότης | Association for Computing Machinery | en |
Περίληψη | We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing an upper bound of O(log2n log*n) on the distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to: (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and, (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model. | en |
Τύπος | Δημοσίευση σε Συνέδριο | el |
Τύπος | Conference Publication | en |
Άδεια Χρήσης | http://creativecommons.org/licenses/by/4.0/ | en |
Ημερομηνία | 2015-12-01 | - |
Ημερομηνία Δημοσίευσης | 2003 | - |
Θεματική Κατηγορία | Database management | en |
Βιβλιογραφική Αναφορά | M. Garofalakis and A. Kumar, "Correlating XML data streams using tree-edit distance embeddings", in 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, June 2003, pp. 143-154. | en |