Class CountMinSketch

java.lang.Object
org.apache.pekko.remote.artery.compress.CountMinSketch

public class CountMinSketch extends Object
INTERNAL API: Count-Min Sketch datastructure.

Not thread-safe.

An Improved Data Stream Summary: The Count-Min Sketch and its Applications https://web.archive.org/web/20060907232042/http://www.eecs.harvard.edu/~michaelm/CS222/countmin.pdf This implementation is mostly taken and adjusted from the Apache V2 licensed project `stream-lib`, located here: https://github.com/clearspring/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java

  • Constructor Details

    • CountMinSketch

      public CountMinSketch(int depth, int width, int seed)
  • Method Details

    • relativeError

      public double relativeError()
      Referred to as epsilon in the whitepaper
    • confidence

      public double confidence()
    • addObjectAndEstimateCount

      public long addObjectAndEstimateCount(Object item, long count)
      Similar to add, however we reuse the fact that the hask buckets have to be calculated for add already, and a separate estimateCount operation would have to calculate them again, so we do it all in one go.
    • size

      public long size()
    • estimateCount

      public long estimateCount(Object item)
      The estimate is correct within 'epsilon' * (total item count), with probability confidence.
    • toString

      public String toString()
      Overrides:
      toString in class Object