Counting Bloom Filter


Basic Concept

A Counting Bloom filter is a generalization of the Bloom filter. Given a sequence of elements, it tests whether the number of times a given element occurs (its count) is less than a given threshold. As with a Bloom filter, false positive matches are possible but false negatives are not. In other words, a query returns either "possibly greater than or equal to the threshold" or "definitely less than the threshold".

Algorithm description

  • Most parameters of a Counting Bloom filter, such as n and k, are defined as in a Bloom filter. m denotes the number of counters in the Counting Bloom filter; it is the counterpart of the m bits in a Bloom filter.
  • An empty Counting Bloom filter is an array of m counters, all initialized to 0.
  • As in a Bloom filter, k different hash functions must be defined. Each one maps (hashes) a set element to one of the m counter positions, producing a uniform random distribution. As before, k is a small constant, much smaller than m, while m is proportional to the number of elements to be added.
  • The main difference from a Bloom filter lies in how an element is added. To add an element, feed it to each of the k hash functions to obtain k array positions, and increment the counter at each of these positions by 1 (instead of setting a bit).
  • To query an element against a threshold θ (that is, to check whether its count is less than θ), feed it to each of the k hash functions to obtain k counter positions.
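  • If any of the counters at these positions is smaller than θ, the count of the element is definitely smaller than θ. Every time the element is added, all k of its counters are incremented, so each of them is at least the element's count; if the count were greater than or equal to θ, all the corresponding counters would also be greater than or equal to θ.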
  • If all of them are greater than or equal to θ, then either the count really is greater than or equal to θ, or the counters have reached θ by chance because other elements were hashed to the same positions.
  • When all the counters are greater than or equal to θ even though the count is less than θ, the result is a false positive. As with a Bloom filter, the false positive rate should be kept as low as possible.
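The add and threshold-query steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the class and method names (`CountingBloomFilter`, `add`, `possibly_at_least`) are hypothetical, and the k positions are derived from a single SHA-256 digest by double hashing, an assumed stand-in for k independent hash functions.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter with threshold queries (illustrative sketch)."""

    def __init__(self, m: int, k: int) -> None:
        if m <= 0 or k <= 0:
            raise ValueError("m and k must be positive")
        self.m = m                  # number of counters
        self.k = k                  # number of hash functions
        self.counters = [0] * m     # empty filter: all m counters start at 0

    def _positions(self, item: str) -> list[int]:
        # Assumed hashing scheme (not from the source): double hashing
        # (h1 + i * h2) mod m, with h1 and h2 taken from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, so never 0
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Adding an element: increment the counter at each of its k positions by 1.
        for pos in self._positions(item):
            self.counters[pos] += 1

    def possibly_at_least(self, item: str, theta: int) -> bool:
        # False -> the count is definitely less than theta.
        # True  -> the count is possibly >= theta (may be a false positive).
        return all(self.counters[pos] >= theta for pos in self._positions(item))


cbf = CountingBloomFilter(m=1000, k=4)
for word in ["apple", "apple", "apple", "banana"]:
    cbf.add(word)

print(cbf.possibly_at_least("apple", 3))   # True: "apple" was added 3 times
print(cbf.possibly_at_least("banana", 1))  # True: "banana" was added once
```

Because each `add` increments all k counters of the element, every counter is at least the element's true count, which is why a `False` answer is always correct. With θ = 1 the query reduces to the ordinary Bloom filter membership test ("possibly in the set" versus "definitely not in the set").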
Updated on 03-Jan-2020 05:56:52