URI | http://purl.tuc.gr/dl/dias/AA9284C5-22A7-459D-BB9C-3DBF79DBEF5E | - |
Identifier | http://dl.acm.org/citation.cfm?doid=773153.773168 | - |
Language | en | - |
Extent | 12 pages | en |
Title | Correlating XML data streams using tree-edit distance embeddings | en |
Creator | Garofalakis Minos | en |
Creator | Γαροφαλακης Μινως | el |
Creator | Kumar Amit | en |
Publisher | Association for Computing Machinery | en |
Content Summary | We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing an upper bound of O(log2n log*n) on the distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to: (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and, (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model. | en |
Type of Item | Δημοσίευση σε Συνέδριο | el |
Type of Item | Conference Publication | en |
License | http://creativecommons.org/licenses/by/4.0/ | en |
Date of Item | 2015-12-01 | - |
Date of Publication | 2003 | - |
Subject | Database management | en |
Bibliographic Citation | M. Garofalakis and A. Kumar, "Correlating XML data streams using tree-edit distance embeddings", in 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, June 2003, pp. 143-154. | en |