Ιδρυματικό Αποθετήριο
Πολυτεχνείο Κρήτης
EN  |  EL

Αναζήτηση

Πλοήγηση

Ο Χώρος μου

Join-distinct aggregate estimation over update streams

Ganguly, Sumit, Garofalakis Minos, Kumar Amit, Rastogi Rajeev

Απλή Εγγραφή


URIhttp://purl.tuc.gr/dl/dias/5325F237-4135-4A25-ABF2-A2CFBC74B7CF-
Αναγνωριστικόhttp://cse.iitk.ac.in/users/sganguly/jd.pdf-
Γλώσσαen-
Μέγεθος12 pagesen
ΤίτλοςJoin-distinct aggregate estimation over update streamsen
ΔημιουργόςGanguly, Sumiten
ΔημιουργόςGarofalakis Minosen
ΔημιουργόςΓαροφαλακης Μινωςel
ΔημιουργόςKumar Amiten
ΔημιουργόςRastogi Rajeeven
ΕκδότηςAssociation for Computing Machineryen
ΠερίληψηThere is growing interest in algorithms for processing and querying continuous data streams (i.e., data that is seen only once in a fixed order) with limited memory resources. Providing (perhaps approximate) answers to queries over such streams is a crucial requirement for many application environments; examples include large IP network installations where performance data from different parts of the network needs to be continuously collected and analyzed. The ability to estimate the number of distinct (sub)tuples in the result of a join operation correlating two data streams (i.e., the cardinality of a projection with duplicate elimination over a join) is an important requirement for several data-analysis scenarios. For instance, to enable real-time traffic analysis and load balancing, a network-monitoring application may need to estimate the number of distinct (source, destination) IP-address pairs occurring in the stream of IP packets observed by router R1, where the source address is also seen in packets routed through a different router R2. Earlier work has presented solutions to the individual problems of distinct counting and join-size estimation (without duplicate elimination) over streams. These solutions, however, are fundamentally different and extending or combining them to handle our more complex “Join-Distinct” estimation problem is far from obvious. In this paper, we propose the first space-efficient algorithmic solution to the general Join-Distinct estimation problem over continuous data streams (our techniques can actually handle general update streams comprising tuple deletions as well as insertions). Our estimators are probabilistic in nature and rely on novel algorithms for building and combining a new class of hash-based synopses (termed “JD sketches”) for individual update streams. We demonstrate that our algorithms can provide low error, high-confidence Join-Distinct estimates using only small space and small processing time per update. In fact, we present lower bounds showing that the space usage of our estimators is within small factors of the best possible for the Join-Distinct problem. Preliminary experimental results verify the effectiveness of our approach.en
ΤύποςΠλήρης Δημοσίευση σε Συνέδριοel
ΤύποςConference Full Paperen
Άδεια Χρήσηςhttp://creativecommons.org/licenses/by/4.0/en
Ημερομηνία2015-12-01-
Ημερομηνία Δημοσίευσης2005-
Θεματική ΚατηγορίαManagement systemsen
Θεματική ΚατηγορίαQuery-processing algorithmsen
Βιβλιογραφική ΑναφοράS. Ganguly, M. Garofalakis, A. Kumar and R. Rastogi, "Join-distinct aggregate estimation over update streams", in 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, June 2005. en

Υπηρεσίες

Στατιστικά