w3resource

Python Bloom Filter implementation

Write a Python program that implements a Bloom filter for probabilistic data structures.

The problem involves implementing a Bloom filter, a probabilistic data structure that efficiently tests whether an element is part of a set. Unlike traditional sets, a Bloom filter allows for false positives but guarantees no false negatives, making it highly space-efficient. This is achieved by using multiple hash functions to set bits in a bit array, and checking membership involves verifying if all corresponding bits are set. The challenge includes selecting appropriate hash functions and managing the bit array to minimize the probability of false positives.

Sample Solution:

Python Code :

import hashlib

class BloomFilter:
    def __init__(self, size, hash_count):
        self.size = size  # Size of the bit array
        self.hash_count = hash_count  # Number of hash functions
        self.bit_array = [0] * size  # Bit array to store elements

    def _hashes(self, item):
        """Generate hash_count hashes for the item using different hash functions."""
        hashes = []
        for i in range(self.hash_count):
            # Create a unique hash for each iteration
            hash_result = int(hashlib.md5((str(item) + str(i)).encode()).hexdigest(), 16) % self.size
            hashes.append(hash_result)
        return hashes

    def add(self, item):
        """Add an item to the Bloom filter."""
        hashes = self._hashes(item)
        for hash_value in hashes:
            self.bit_array[hash_value] = 1

    def check(self, item):
        """Check if an item is possibly in the Bloom filter."""
        hashes = self._hashes(item)
        return all(self.bit_array[hash_value] == 1 for hash_value in hashes)

# Example usage
size = 1000  # Size of the bit array
hash_count = 10  # Number of hash functions

bloom_filter = BloomFilter(size, hash_count)

# Add items to the Bloom filter
bloom_filter.add("Red")
bloom_filter.add("Green")
bloom_filter.add("Blue")
bloom_filter.add("Orange")

# Check for item membership
print("Red in filter:", bloom_filter.check("Red"))  # Should be True
print("Green in filter:", bloom_filter.check("Green"))  # Should be True
print("Orange in filter:", bloom_filter.check("Orange"))  # Should be True
print("Black in filter:", bloom_filter.check("Black"))  # Should be False (most likely)

Output:

Red in filter: True
Green in filter: True
Orange in filter: True
Black in filter: False

Explanation:

  • Importing libraries:
    The "hashlib" library is imported to use hash functions, specifically 'md5'.
  • BloomFilter Class Definition:
  • Initialization:
    The "init()" method initializes the Bloom filter with a specified 'size' (for the bit array) and 'hash_count' (number of hash functions).
  • A bit array of the given size is created and initialized to 0.
  • Hash function generation:
    The "_hashes()" method generates multiple hash values for a given item. It does this by concatenating the item with an index, hashing the result using 'md5', and then taking the modulo with the bit array size to ensure the hash values fit within the bit array's bounds.
  • Adding Items:
    The add method hashes the item to get multiple hash values and sets the corresponding positions in the bit array to 1.
  • Checking membership:
    The "check()" method generates hash values for the item and checks if all the corresponding positions in the bit array are set to 1. If so, it returns 'True', indicating the item is likely in the filter; otherwise, it returns 'False'.
  • Example usage:
    A Bloom filter instance is created with a bit array size of 1000 and 10 hash functions.
  • Items "Red", "Green", "Blue", and "Orange" are added to the Bloom filter.
  • The membership of these items is checked and printed. "Red", "Green", and "Orange" should return 'True', while "Black" (not added) should return 'False' (most likely).

Python Code Editor :

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Python Custom JSON Encoder and Decoder.
Next: Python LRU Cache Implementation.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Follow us on Facebook and Twitter for latest update.