Institutional Repository
Technical University of Crete
EN  |  EL



My Space

Tree pattern aggregation for scalable XML data dissemination

Chan Chee-Yong, Wenfei Fan, Felber Pascal, Garofalakis Minos, Rastogi Rajeev

Full record

Year 2002
Type of Item Conference Publication
Bibliographic Citation C-Y Chan, W. Fan, P. Felber, M. Garofalakis and R. Rastogi, "Tree pattern aggregation for scalable XML data dissemination", in 28th International Conference on Very Large Data Bases, August 2002, pp. 826-837.
Appears in Collections


With the rapid growth of XML-document traffic on theInternet, scalable content-based dissemination of XMLdocuments to a large, dynamic group of consumers hasbecome an important research challenge. To indicatethe type of content that they are interested in, dataconsumers typically specify their subscriptions usingsome XML pattern specification language (e.g., XPath).Given the large volume of subscribers, system scalabilityand efficiency mandate the ability to aggregate theset of consumer subscriptions to a smaller set of contentspecifications, so as to both reduce their storagespacerequirements as well as speed up the documentsubscriptionmatching process. In this paper, we providethe first systematic study of subscription aggregationwhere subscriptions are specified with tree patterns(an important subclass of XPath expressions). Themain challenge is to aggregate an input set of tree patternsinto a smaller set of generalized tree patterns suchthat: (1) a given space constraint on the total size of thesubscriptions is met, and (2) the loss in precision (dueto aggregation) during document filtering is minimized.We propose an efficient tree-pattern aggregation algorithmthat makes effective use of document-distributionstatistics in order to compute a precise set of aggregatetree patterns within the allotted space budget. As partof our solution, we also develop several novel algorithmsfor tree-pattern containment and minimization,as well as “least-upper-bound” computation for a set oftree patterns. These results are of interest in their ownright, and can prove useful in other domains, such asXML query optimization. Extensive results from a prototypeimplementation validate our approach.