Golang program to implement a Bloom filter


Bloom filters are space-efficient data structures that are widely used for membership testing. In this article, we explore how to create a Bloom filter in Go. We write two examples of the implementation: the first packs the bits into a []uint64 array, and the second uses a []bool array to record element presence. Both examples derive their hash functions from the FNV-1a algorithm.

Explanation

A Bloom filter is a probabilistic data structure that checks whether an element may be in a set. It uses multiple hash functions and a bit array, which makes it highly memory-efficient. The drawback is that false positives can occur: the filter might indicate that an element is present even though it was never added. False negatives do not occur, so an element that was added is always reported as present.
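How often false positives occur depends on how full the filter is. As a rough guide, a commonly used approximation for the false-positive probability after adding n items to a filter with m bits and k hash functions is (1 - e^(-kn/m))^k, which assumes the hash functions are independent and uniform. The sketch below evaluates that estimate; the name falsePositiveRate is a hypothetical helper introduced here for illustration, not part of the examples in this article.

```go
package main

import (
   "fmt"
   "math"
)

// falsePositiveRate (hypothetical helper) estimates the chance that a Bloom
// filter with m bits and k hash functions reports a false positive after n
// distinct items were added. It uses the approximation (1 - e^(-k*n/m))^k,
// which assumes independent, uniform hash functions and positive m and k.
func falsePositiveRate(m, k, n int) float64 {
   exponent := -float64(k) * float64(n) / float64(m)
   return math.Pow(1-math.Exp(exponent), float64(k))
}

func main() {
   // The estimate grows as more items are added to a filter of fixed size.
   for _, n := range []int{10, 50, 100} {
      fmt.Printf("m=1000 k=3 n=%d -> estimated rate %.4f\n", n, falsePositiveRate(1000, 3, n))
   }
}
```

The estimate is only as good as its assumptions, so it is best read as a planning figure rather than a guarantee.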

Syntax

func NewBloomFilter(m int, k int) *BloomFilter

This declares the function NewBloomFilter, which creates a new Bloom filter instance from two parameters: the integer `m`, the size of the bit array in bits, and the integer `k`, the number of hash functions.

func (bf *BloomFilter) Contains(item []byte) bool

This declares the method Contains on a Bloom filter instance (`bf`). It takes a byte slice (`item`), computes the hash values of the input with the filter's hash functions, and returns a boolean: true if the item is possibly present, false if it is definitely absent.

Algorithm

  • Start by initializing a bit array of size 'm' with all bits set to 0.

  • Choose the number of hash functions 'k', ideally the optimum for the expected number of elements, and give each function a different seed.

  • To add an element, hash it with all the selected functions, and set the corresponding bits in the array to 1.

  • To check an element's membership, hash it using the same selected functions and check if all corresponding bits are set to 1.

  • If any of the bits are not set, the element is definitely not in the set; otherwise, it's probably in the set.
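The steps above leave open how large 'm' should be and how many hash functions are optimal. A minimal sketch, assuming the standard sizing formulas m = -n·ln(p) / (ln 2)² and k = (m/n)·ln 2 for n expected elements and a target false-positive rate p, is shown below. The name optimalParams is a hypothetical helper used only for illustration.

```go
package main

import (
   "fmt"
   "math"
)

// optimalParams (hypothetical helper) returns a bit-array size m and a hash
// function count k for n expected elements and a target false-positive
// rate p (0 < p < 1), using the standard Bloom filter sizing formulas.
func optimalParams(n int, p float64) (m, k int) {
   bits := -float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)
   m = int(math.Ceil(bits))
   k = int(math.Round(float64(m) / float64(n) * math.Ln2))
   if k < 1 {
      k = 1 // always use at least one hash function
   }
   return m, k
}

func main() {
   m, k := optimalParams(1000, 0.01)
   fmt.Printf("n=1000, p=0.01 -> m=%d bits, k=%d hash functions\n", m, k)
}
```

The resulting values could be passed to a constructor such as NewBloomFilter(m, k), bearing in mind that the formulas assume independent hash functions.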

Example 1

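In this example, we create a Bloom filter in Go backed by a []uint64 bit array, which packs 64 bits into each word to save memory. The k hash functions are derived from the FNV-1a algorithm by adding a different seed to the hash value. The NewBloomFilter function initializes the filter, the Add function marks an element as present by setting bits, and the Contains function checks for element membership. Because each seed is simply added to the same FNV-1a value, the k positions are consecutive offsets of one hash rather than independent hashes; this keeps the example short but makes false positives more likely than with independent hash functions.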

package main

import (
   "fmt"
   "hash/fnv"
)

// BloomFilter stores its bits packed into 64-bit words.
type BloomFilter struct {
   bitArray []uint64
   hashFunc []func([]byte) uint32
}

// NewBloomFilter creates a filter with m bits and k hash functions.
func NewBloomFilter(m int, k int) *BloomFilter {
   filter := BloomFilter{
      bitArray: make([]uint64, (m+63)/64), // round up to whole 64-bit words
      hashFunc: make([]func([]byte) uint32, k),
   }
   for i := 0; i < k; i++ {
      seed := uint32(i)
      // Each function offsets the same FNV-1a hash by its seed and
      // reduces the result to a bit position in [0, m).
      filter.hashFunc[i] = func(data []byte) uint32 {
         hash := fnv.New32a()
         hash.Write(data)
         return (hash.Sum32() + seed) % uint32(m)
      }
   }
   return &filter
}

// Add sets the bit selected by every hash function.
func (bf *BloomFilter) Add(item []byte) {
   for _, hashFn := range bf.hashFunc {
      index := hashFn(item)
      // index/64 picks the word, index%64 picks the bit inside it.
      bf.bitArray[index/64] |= 1 << (index % 64)
   }
}

// Contains reports whether item is possibly in the set. A false result
// means the item was definitely never added.
func (bf *BloomFilter) Contains(item []byte) bool {
   for _, hashFn := range bf.hashFunc {
      index := hashFn(item)
      if bf.bitArray[index/64]&(1<<(index%64)) == 0 {
         return false
      }
   }
   return true
}

func main() {
   filter := NewBloomFilter(160, 3)
   fmt.Println("Adding 'apple'")
   filter.Add([]byte("apple"))
   fmt.Println("\nFinding 'apple': ", filter.Contains([]byte("apple")))
   fmt.Println("Finding 'banana': ", filter.Contains([]byte("banana")))
}

Output

Adding 'apple'
 
Finding 'apple':  true
Finding 'banana':  false
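In the listing above, every hash function returns the same FNV-1a value shifted by a small seed, so the k bit positions sit next to each other. One common alternative is double hashing, where position i is computed as (h1 + i·h2) mod m from two base hash values. The sketch below is one possible way to do this, assuming the two halves of a single 64-bit FNV-1a hash serve as h1 and h2; the name indexes is a hypothetical helper and is not part of the example above.

```go
package main

import (
   "fmt"
   "hash/fnv"
)

// indexes (hypothetical helper) derives k bit positions in [0, m) for item
// using double hashing: position i = (h1 + i*h2) mod m.
func indexes(item []byte, m, k uint32) []uint32 {
   h := fnv.New64a()
   h.Write(item)
   sum := h.Sum64()

   h1 := uint32(sum)         // low 32 bits of the 64-bit hash
   h2 := uint32(sum>>32) | 1 // high 32 bits, forced odd so it is never zero

   positions := make([]uint32, k)
   for i := uint32(0); i < k; i++ {
      // Compute in 64 bits so the multiplication cannot wrap around.
      positions[i] = uint32((uint64(h1) + uint64(i)*uint64(h2)) % uint64(m))
   }
   return positions
}

func main() {
   fmt.Println("positions for 'apple': ", indexes([]byte("apple"), 160, 3))
   fmt.Println("positions for 'banana':", indexes([]byte("banana"), 160, 3))
}
```

Add and Contains could loop over the returned positions instead of calling a slice of hash functions. Whether this is worth the extra code depends on how low the false-positive rate needs to be.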

Example 2

In this example, we create a Bloom filter in Go using a []bool array, where each position is a directly indexable boolean. We again derive the hash functions from FNV-1a to distribute elements across the array. The NewBloomFilter function initializes the filter, and the Add function sets the corresponding array positions to true for each added element. Lastly, the Contains function checks membership by reading those array positions.

package main

import (
   "fmt"
   "hash/fnv"
)

// BloomFilter stores one bool per position instead of packed bits.
type BloomFilter struct {
   bitArray []bool
   hashFunc []func([]byte) uint32
}

// NewBloomFilter creates a filter with m positions and k hash functions.
func NewBloomFilter(m int, k int) *BloomFilter {
   filter := BloomFilter{
      bitArray: make([]bool, m),
      hashFunc: make([]func([]byte) uint32, k),
   }
   for i := 0; i < k; i++ {
      seed := uint32(i)
      // Each function offsets the same FNV-1a hash by its seed and
      // reduces the result to a position in [0, m).
      filter.hashFunc[i] = func(data []byte) uint32 {
         hash := fnv.New32a()
         hash.Write(data)
         return (hash.Sum32() + seed) % uint32(m)
      }
   }
   return &filter
}

// Add marks the position selected by every hash function.
func (bf *BloomFilter) Add(item []byte) {
   for _, hashFn := range bf.hashFunc {
      index := hashFn(item)
      bf.bitArray[index] = true
   }
}

// Contains reports whether item is possibly in the set. A false result
// means the item was definitely never added.
func (bf *BloomFilter) Contains(item []byte) bool {
   for _, hashFn := range bf.hashFunc {
      index := hashFn(item)
      if !bf.bitArray[index] {
         return false
      }
   }
   return true
}

func main() {
   filter := NewBloomFilter(20, 3)
   fmt.Println("Adding 'apple'")
   filter.Add([]byte("apple"))
   fmt.Println("\nFinding 'apple': ", filter.Contains([]byte("apple")))
   fmt.Println("Finding 'banana': ", filter.Contains([]byte("banana")))
}

Output

Adding 'apple'
 
Finding 'apple':  true
Finding 'banana':  false
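The two examples differ mainly in how much memory the backing array needs. As a rough illustration, the sketch below compares the backing-array sizes for the same number of positions, assuming the standard Go compiler, which stores each bool in one byte. The names boolBytes and wordBytes are hypothetical helpers introduced only for this comparison, and slice headers are ignored.

```go
package main

import (
   "fmt"
   "unsafe"
)

// boolBytes (hypothetical helper) returns the backing-array size in bytes
// of a []bool that holds m flags.
func boolBytes(m int) int {
   var b bool
   return m * int(unsafe.Sizeof(b))
}

// wordBytes (hypothetical helper) returns the backing-array size in bytes
// of a []uint64 that packs m bits, 64 per word, rounded up to whole words.
func wordBytes(m int) int {
   var w uint64
   return (m + 63) / 64 * int(unsafe.Sizeof(w))
}

func main() {
   for _, m := range []int{20, 160, 1000000} {
      fmt.Printf("m=%d: []bool %d bytes, []uint64 %d bytes\n", m, boolBytes(m), wordBytes(m))
   }
}
```

For small filters the difference hardly matters, but for large ones the packed representation needs roughly one eighth of the memory.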

Real Life Implementation

  • Spam Email Filtering: Bloom filters allow a quick check of whether an incoming email is likely to be spam. Some email systems maintain a Bloom filter of known spam keywords or phrases. When a new email arrives, the system queries the Bloom filter for the keywords in the message. If these keywords are found, the email is flagged as probable spam.

  • DNA Sequence Matching: In bioinformatics, Bloom filters enable efficient searches of large datasets for DNA sequence matching. They are mostly applied to identify probable matches for a given DNA sequence within a large genomic database.

Conclusion

A Bloom filter is a space-efficient data structure that enables rapid membership testing in sets, at the cost of occasional false positives. In this article, we explored two ways to create a Bloom filter in Go. The first example packs bits into a []uint64 array, which keeps memory use low at the price of some bit manipulation. The second example uses a []bool array, which consumes more memory but keeps the code simpler. Both implementations show how hash functions and a bit array work together to test membership without storing the elements themselves.

Updated on: 18-Oct-2023
