Per-Flow Cardinality Estimation Based On Virtual LogLog Sketching
Flow cardinality estimation is the problem of estimating the number of distinct elements in a data flow, often with a stringent memory constraint. It has wide applications in network traffic measurement and in database systems. The virtual LogLog algorithm proposed recently by Xiao, Chen, Chen and Ling estimates the cardinalities of a large number of flows with a compact memory. The purpose of this thesis is to explore two new perspectives on the estimation process of this algorithm. Firstly, we propose and investigate a family of estimators that generalizes the original vHLL estimator and evaluate the performance of the vHLL estimator compared to other estimators in this family. Secondly, we propose an alternative solution to the estimation problem by deriving a maximum-likelihood estimator. Empirical evidence from both perspectives suggests the near-optimality of the vHLL estimator for per-flow estimation, analogous to the near-optimality of the HLL estimator for single-flow estimation.
READ FULL TEXT