Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Find Duplicates of array using bit array in Python
Suppose we have an array of n different numbers; n can be 32,000 at max. The array may have duplicate entries and we do not know what is the value of n. Now if we have only 4-Kilobytes of memory, how would display all duplicates in the array?
So, if the input is like [2, 6, 2, 11, 13, 11], then the output will be [2,11] as 2 and 11 appear more than once in given array.
How Bit Array Works
A bit array uses individual bits to track whether a number has been seen before. Each bit position represents a specific number, making it memory-efficient for tracking duplicates.
Algorithm Steps
To solve this, we will follow these steps ?
Create a bit array data structure with methods to get and set bit values
For each number in the input array, check if its bit is already set
If the bit is set, the number is a duplicate
If not set, mark the bit to indicate we've seen this number
Implementation
Let us see the following implementation to get better understanding ?
class BitArray:
def __init__(self, n):
# Create array of integers to store bits
# Each integer stores 32 bits, so we need (n >> 5) + 1 integers
self.arr = [0] * ((n >> 5) + 1)
def get_val(self, pos):
# Find which integer contains our bit
index = pos >> 5 # Divide by 32
bit_no = pos & 31 # Get remainder (pos % 32)
# Check if bit is set
return (self.arr[index] & (1 << bit_no)) != 0
def set_val(self, pos):
# Find which integer contains our bit
index = pos >> 5
bit_no = pos & 31
# Set the bit using OR operation
self.arr[index] |= (1 << bit_no)
def find_duplicates(input_arr):
# Create bit array for numbers up to 32000
bit_arr = BitArray(32000)
duplicates = []
for num in input_arr:
if bit_arr.get_val(num):
# Number already seen, it's a duplicate
duplicates.append(num)
else:
# First time seeing this number, mark it
bit_arr.set_val(num)
return duplicates
# Test the function
arr = [2, 6, 2, 11, 13, 11]
result = find_duplicates(arr)
print("Duplicates:", result)
Duplicates: [2, 11]
How It Works
The bit array uses bitwise operations for efficient memory usage:
pos >> 5is equivalent topos // 32to find the array indexpos & 31is equivalent topos % 32to find the bit position1 creates a mask with only the target bit setarr[index] |= masksets the bit using OR operationarr[index] & maskchecks if the bit is set
Memory Efficiency
This approach uses only 4KB of memory (32,000 bits รท 8 bits per byte = 4,000 bytes) to track all possible numbers, making it suitable for the given memory constraint.
Example with Step-by-Step Process
def find_duplicates_verbose(input_arr):
bit_arr = BitArray(32000)
duplicates = []
for i, num in enumerate(input_arr):
print(f"Processing element {i}: {num}")
if bit_arr.get_val(num):
print(f" ? {num} already seen, adding to duplicates")
duplicates.append(num)
else:
print(f" ? {num} not seen before, marking as seen")
bit_arr.set_val(num)
return duplicates
# Test with verbose output
arr = [2, 6, 2, 11, 13, 11]
result = find_duplicates_verbose(arr)
print(f"\nFinal duplicates: {result}")
Processing element 0: 2 ? 2 not seen before, marking as seen Processing element 1: 6 ? 6 not seen before, marking as seen Processing element 2: 2 ? 2 already seen, adding to duplicates Processing element 3: 11 ? 11 not seen before, marking as seen Processing element 4: 13 ? 13 not seen before, marking as seen Processing element 5: 11 ? 11 already seen, adding to duplicates Final duplicates: [2, 11]
Conclusion
The bit array approach efficiently finds duplicates using minimal memory by representing each number as a single bit. This technique is ideal when working with memory constraints and large ranges of possible values.
