URI | http://purl.tuc.gr/dl/dias/12DCEC49-3FB5-4CFB-9587-A761D5EFB408 | - |
Αναγνωριστικό | http://dl.acm.org/citation.cfm?id=1061326&dl=ACM&coll=DL&CFID=723856831&CFTOKEN=21128548 | - |
Αναγνωριστικό | https://doi.org/10.1145/1061318.1061326 | - |
Γλώσσα | en | - |
Μέγεθος | 54 pages | en |
Τίτλος | XML stream processing using tree-edit distance embeddings | en |
Δημιουργός | Garofalakis Minos | en |
Δημιουργός | Γαροφαλακης Μινως | el |
Δημιουργός | Kumar Amit | en |
Εκδότης | Association for Computing Machinery | en |
Περίληψη | We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing a (worst-case) upper bound of O(log2n log*n) on the distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. Experimental results from an empirical study with both synthetic and real-life XML data trees validate our approach, demonstrating that the average-case behavior of our embedding techniques is much better than what would be predicted from our theoretical worst-case distortion bounds. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model. | en |
Τύπος | Peer-Reviewed Journal Publication | en |
Τύπος | Δημοσίευση σε Περιοδικό με Κριτές | el |
Άδεια Χρήσης | http://creativecommons.org/licenses/by/4.0/ | en |
Ημερομηνία | 2015-10-29 | - |
Ημερομηνία Δημοσίευσης | 2005 | - |
Θεματική Κατηγορία | XML | en |
Βιβλιογραφική Αναφορά | M. Garofalakis and A. Kumar, "XML stream processing using tree-edit distance embeddings", ACM Trans. Dat. Syst., vol. 30, no. 1, pp. 279-332, Mar. 2005. doi:10.1145/1061318.1061326 | en |