Institutional Repository
Technical University of Crete
EN  |  EL

Search

Browse

My Space

Correlating XML data streams using tree-edit distance embeddings

Garofalakis Minos, Kumar Amit

Simple record


URIhttp://purl.tuc.gr/dl/dias/AA9284C5-22A7-459D-BB9C-3DBF79DBEF5E-
Identifierhttp://dl.acm.org/citation.cfm?doid=773153.773168-
Languageen-
Extent12 pagesen
TitleCorrelating XML data streams using tree-edit distance embeddingsen
CreatorGarofalakis Minosen
CreatorΓαροφαλακης Μινωςel
CreatorKumar Amiten
PublisherAssociation for Computing Machineryen
Content SummaryWe propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing an upper bound of O(log2n log*n) on the distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to: (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and, (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model.en
Type of ItemΔημοσίευση σε Συνέδριοel
Type of ItemConference Publicationen
Licensehttp://creativecommons.org/licenses/by/4.0/en
Date of Item2015-12-01-
Date of Publication2003-
SubjectDatabase managementen
Bibliographic CitationM. Garofalakis and A. Kumar, "Correlating XML data streams using tree-edit distance embeddings", in 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, June 2003, pp. 143-154.en

Services

Statistics