Performance Metrics

There are three performance metrics for Bloom filters that can be traded off against one another: computation or execution time, which corresponds to the number k of hash functions; the size of the filter, which corresponds to the number m of bits; and the probability of error, which corresponds to the false positive rate f after n elements have been inserted:

$$f \approx \left(1 - e^{-kn/m}\right)^k$$

The Bloom filter (BF) accepts a tolerated error in exchange for faster lookups and better space efficiency. A query returns either true or false, so every result falls into one of four classes: true positive, false positive, true negative, or false negative. A false positive occurs when the filter returns true for an element that was never inserted; a false negative occurs when it returns false for an element that was inserted. Most errors of a Bloom filter are false positives: a standard Bloom filter never produces false negatives, although some variants (for example, those that support deletion) can. Both kinds of error cause overhead for the system that uses the filter. Internally, the Bloom filter stores information about its elements in a bit array. Because its answers are correct only with some probability, the Bloom filter is a probabilistic data structure.
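
To make the outcome classes concrete, here is a minimal sketch in Python. It assumes salted SHA-256 digests as the k hash functions and a `bytearray` as the bit array; the class and method names (`BloomFilter`, `add`, `might_contain`) are hypothetical and chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter: an m-bit array plus k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                 # number of bits in the array
        self.k = k                 # number of hash functions
        self.bits = bytearray(m)   # one byte per bit, for readability

    def _positions(self, item: str):
        # Assumed hashing scheme: salt SHA-256 with the hash index i
        # to obtain k bit positions in [0, m).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Insertion sets all k positions to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True means "possibly present" (a true or a false positive);
        # False means "definitely absent" (always a true negative).
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)

print(bf.might_contain("apple"))   # True: inserted items are always reported
print(bf.might_contain("durian"))  # usually False, but could be a false positive
```

Because insertion only ever sets bits to 1 and a lookup checks exactly the bits that insertion set, an inserted item can never be reported as absent in this sketch.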

Bloom Filter Size and Number of Hash Functions

If the Bloom filter is too small, all of its bits will soon be set to 1, and the filter will then return a false positive for every value that was never inserted. Choosing the size of the filter is therefore an important decision: a larger filter produces fewer false positives, and a smaller one produces more.
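
The saturation effect can be estimated without building a filter. The sketch below assumes each hash picks a bit uniformly at random, so after n insertions a given bit is still 0 with probability (1 - 1/m)^(kn); the helper names `expected_fill` and `approx_false_positive_rate` are hypothetical, and the false positive estimate uses the usual approximation that the k probed bits are independent.

```python
def expected_fill(m: int, k: int, n: int) -> float:
    # Probability that a given bit is 1 after n insertions with k hashes each,
    # assuming every hash selects a bit uniformly at random.
    return 1.0 - (1.0 - 1.0 / m) ** (k * n)


def approx_false_positive_rate(m: int, k: int, n: int) -> float:
    # A query for an absent item is a false positive when all k probed bits are 1
    # (approximation: the k bits are treated as independent).
    return expected_fill(m, k, n) ** k


# Same workload (k = 3 hashes, n = 100 items), two filter sizes.
for m in (64, 6400):
    fill = expected_fill(m, k=3, n=100)
    fp = approx_false_positive_rate(m, k=3, n=100)
    print(f"m={m:5d}: {fill:.1%} of bits set, false positive rate ~ {fp:.5f}")
```

With only 64 bits, roughly 99% of the bits are expected to be set, so almost every query for an absent item comes back true; with 6400 bits the estimated rate drops below 0.1%.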

We can therefore conclude that the size of the Bloom filter is determined by the false positive rate we are willing to tolerate, together with the number of elements it must hold.

The other important parameter is the number of hash functions. The more hash functions we use, the slower the Bloom filter becomes, because every insertion and lookup must compute all of them, and the quicker it fills up. If we use too few, however, we suffer from many false positives.
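
This trade-off means the false positive rate is smallest at an intermediate number of hash functions. The sketch below sweeps k for an illustrative filter of m = 1000 bits holding n = 100 elements, using the standard approximation p ≈ (1 - e^(-kn/m))^k; the values of m and n are arbitrary choices, not taken from the text.

```python
import math


def false_positive_rate(m: int, k: int, n: int) -> float:
    # Standard approximation: p ~ (1 - e^(-kn/m))^k
    return (1.0 - math.exp(-k * n / m)) ** k


m, n = 1000, 100  # illustrative values: 1000 bits, 100 inserted elements
rates = {k: false_positive_rate(m, k, n) for k in range(1, 16)}
best_k = min(rates, key=rates.get)

for k, p in rates.items():
    print(f"k={k:2d}  p~{p:.5f}")
print("best k:", best_k)  # 7, close to (m/n) * ln 2 ~ 6.93
```

Too few hash functions leave many single-bit collisions undetected, while too many set so many bits that the array saturates; the minimum lies between the two.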

We can compute the false positive error rate, p, from the size of the filter, m, the number of hash functions, k, and the number of elements inserted, n, with the (approximate) formula

$$p \approx \left(1 - e^{-kn/m}\right)^k$$

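One way to check this formula is to build a filter, query items that were never inserted, and compare the measured false positive rate with the prediction. The sketch below assumes salted SHA-256 as the hash functions (the helper `positions` is hypothetical) and uses illustrative values m = 10,000, k = 7, n = 1,000.

```python
import hashlib
import math


def positions(item: str, m: int, k: int):
    # Hypothetical helper: k bit positions from salted SHA-256 (assumed hash choice).
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % m


m, k, n = 10_000, 7, 1_000
bits = bytearray(m)
for j in range(n):
    for pos in positions(f"member-{j}", m, k):
        bits[pos] = 1

# Query items that were never inserted; every True answer is a false positive.
trials = 20_000
false_positives = sum(
    all(bits[pos] for pos in positions(f"other-{j}", m, k)) for j in range(trials)
)

measured = false_positives / trials
predicted = (1.0 - math.exp(-k * n / m)) ** k
print(f"measured p = {measured:.4f}, predicted p = {predicted:.4f}")
```

The predicted rate here is about 0.0082; the measured value depends on the particular hashes and sample, but it should land close to the prediction.
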
In practice, we mostly need to determine m and k. If we fix the tolerated error rate p and the expected number of elements n ourselves, we can calculate these parameters with the following formulas:

$$m = -\frac{n \ln p}{(\ln 2)^2}$$

$$k = \frac{m}{n} \ln 2$$
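
Applied directly, the two formulas give a sizing helper. This is a sketch under stated assumptions: the function name `bloom_parameters` is hypothetical, m is rounded up to a whole number of bits, and k is rounded to the nearest integer (at least 1), since a filter cannot use a fractional number of hash functions.

```python
import math


def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Hypothetical helper: (m, k) for n expected elements and target rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)  # bits, rounded up
    k = max(1, round(m / n * math.log(2)))               # hash functions, at least 1
    return m, k


print(bloom_parameters(1_000, 0.01))  # (9586, 7)
```

For 1,000 elements and a 1% target, this yields about 9.6 bits per element and 7 hash functions. Because k is rounded, the achieved false positive rate is close to, but not exactly, the requested p.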