Institutional Repository
Technical University of Crete
EN  |  EL

Search

Browse

My Space

XSKETCH synopses for XML data graphs

Polyzotis, Neoklis, Garofalakis Minos

Full record


URI: http://purl.tuc.gr/dl/dias/AE9D14D7-1591-4766-B93B-42721B389108
Year 2006
Type of Item Peer-Reviewed Journal Publication
License
Details
Bibliographic Citation N. Polyzotis and M. Garofalakis, "XSKETCH synopses for XML data graphs", ACM Transactions on Database Systems, vol. 31, no. 3, pp. 1014-1063, Sept. 2006. doi: 10.1145/1166074.1166082 https://doi.org/10.1145/1166074.1166082
Appears in Collections

Summary

Effective support for XML query languages is becoming increasingly important with the emergenceof new applications that access large volumes of XML data. All existing proposals for queryingXML (e.g., XQuery) rely on a pattern-specification language that allows (1) path navigation andbranching through the label structure of the XML data graph, and (2) predicates on the values ofspecific path/branch nodes, in order to reach the desired data elements. Clearly, optimizing suchqueries requires approximating the result cardinality of the referenced paths and hence hinges onthe existence of concise synopsis structures that enable accurate compile-time selectivity estimatesfor complex path expressions over the base XML data. In this article, we introduce a novel approachto building and using statistical summaries of large XML data graphs for effective path-expressionselectivity estimation. Our proposed graph-synopsis model (termed XSKETCH) exploits localizedgraph stability and value-distribution summaries (e.g., histograms) to accurately approximate (inlimited space) the path and branching distribution, as well as the complex correlation patternsthat can exist between and across path structure and element values in the data graph. To the bestof our knowledge, ours is the first work to address this timely problem in the most general settingof graph-structured XML data with values, and complex (branching) path expressions.

Services

Statistics