Python Program to Check for Almost Similar Strings
Strings in Python are sequences of characters used to represent textual data, enclosed in quotes. Checking whether two strings are almost similar means measuring how closely they match, which is useful for tasks such as spell checking and approximate string matching. General-purpose techniques for this include Levenshtein distance and fuzzy matching algorithms.
In this article, we will learn different approaches to check if two strings are almost similar based on character frequency differences.
What Are Almost Similar Strings?
Two strings are considered almost similar if, for every character, the absolute difference between its frequency in the two strings does not exceed a given threshold k. For example, if 'a' appears 4 times in string1 and 2 times in string2, the difference is 2. With k=2, that character is within the threshold, and the strings are almost similar as long as every other character is within it too.
Example Demonstration
Let's understand with an example:
# Input strings and threshold
string1 = "aazmdaa"
string2 = "aqqaccd"
k = 2
print(f"String 1: {string1}")
print(f"String 2: {string2}")
print(f"Threshold k: {k}")
String 1: aazmdaa
String 2: aqqaccd
Threshold k: 2
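In this example, 'a' occurs 4 times in string1 and 2 times in string2, a difference of 2. Every other character differs by at most 2 as well ('q' and 'c' by 2, 'z' and 'm' by 1, 'd' by 0), so no difference exceeds the threshold k=2 and the strings are almost similar.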
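To see every character's contribution rather than only that of 'a', the comparison can be broken down per character. The following is a minimal sketch; `frequency_differences` is a hypothetical helper name introduced for illustration, and it compares characters exactly as given (no case folding).

```python
from collections import Counter

def frequency_differences(str1, str2):
    """Map each character found in either string to its absolute count difference."""
    freq1 = Counter(str1)
    freq2 = Counter(str2)
    # A Counter returns 0 for a missing key, so characters that appear
    # in only one string are handled without a KeyError.
    return {c: abs(freq1[c] - freq2[c]) for c in sorted(set(str1) | set(str2))}

diffs = frequency_differences("aazmdaa", "aqqaccd")
print(diffs)
# {'a': 2, 'c': 2, 'd': 0, 'm': 1, 'q': 2, 'z': 1}

# The strings are almost similar when the largest difference is at most k
print(max(diffs.values(), default=0) <= 2)
# True
```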
Method 1: Using Dictionary Comprehension and abs() Function
This approach manually counts character frequencies using dictionary comprehension and compares them:
from string import ascii_lowercase

def find_frequency(input_string):
    """Returns frequency of each character in the string"""
    # Initialize all lowercase letters with frequency 0
    frequency = {c: 0 for c in ascii_lowercase}
    # Count frequency of each character
    for c in input_string.lower():
        if c in frequency:
            frequency[c] += 1
    return frequency

def are_almost_similar_v1(str1, str2, k):
    """Check if strings are almost similar using dictionary approach"""
    freq1 = find_frequency(str1)
    freq2 = find_frequency(str2)
    # Check if any character frequency difference exceeds k
    for c in ascii_lowercase:
        if abs(freq1[c] - freq2[c]) > k:
            return False
    return True

# Test the function
string1 = "aazmdaa"
string2 = "aqqaccd"
k = 2
result = are_almost_similar_v1(string1, string2, k)
print(f"Are strings almost similar? {result}")
Are strings almost similar? True
Method 2: Using Counter() and max() Functions
This method uses Python's Counter class for efficient frequency counting:
from collections import Counter

def are_almost_similar_v2(str1, str2, k):
    """Check if strings are almost similar using Counter"""
    # Get character frequencies
    freq1 = Counter(str1.lower())
    freq2 = Counter(str2.lower())
    # Find maximum difference in either direction
    diff1 = freq1 - freq2  # Characters more frequent in str1
    diff2 = freq2 - freq1  # Characters more frequent in str2
    # Check if any difference exceeds k
    max_diff1 = max(diff1.values()) if diff1 else 0
    max_diff2 = max(diff2.values()) if diff2 else 0
    return max_diff1 <= k and max_diff2 <= k

# Test the function
string1 = "aazmdaa"
string2 = "aqqaccd"
k = 2
result = are_almost_similar_v2(string1, string2, k)
print(f"Are strings almost similar? {result}")

# Let's see the frequency differences
freq1 = Counter(string1.lower())
freq2 = Counter(string2.lower())
print(f"String1 frequencies: {dict(freq1)}")
print(f"String2 frequencies: {dict(freq2)}")
Are strings almost similar? True
String1 frequencies: {'a': 4, 'z': 1, 'm': 1, 'd': 1}
String2 frequencies: {'a': 2, 'q': 2, 'c': 2, 'd': 1}
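Method 2 needs two subtractions because the `-` operator on Counter objects keeps only positive results, so each subtraction sees the differences in one direction only. As an alternative sketch, `Counter.subtract()` keeps zero and negative counts, which lets a single loop with abs() cover both directions. The function name `are_almost_similar_subtract` is made up for this illustration.

```python
from collections import Counter

freq1 = Counter("aazmdaa")
freq2 = Counter("aqqaccd")

# The - operator drops zero and negative counts
print(dict(freq1 - freq2))  # {'a': 2, 'z': 1, 'm': 1}
print(dict(freq2 - freq1))  # {'q': 2, 'c': 2}

def are_almost_similar_subtract(str1, str2, k):
    """Variant using subtract(), which keeps negative counts."""
    diff = Counter(str1.lower())
    diff.subtract(Counter(str2.lower()))  # in place; counts may become negative
    # abs() handles surpluses in either string with one check
    return all(abs(count) <= k for count in diff.values())

print(are_almost_similar_subtract("aazmdaa", "aqqaccd", 2))  # True
print(are_almost_similar_subtract("aazmdaa", "aqqaccd", 1))  # False
```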
Comparison of Methods
| Method | Time Complexity | Space Complexity | Advantages |
|---|---|---|---|
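| Dictionary Comprehension | O(n + m) | O(1) | Fixed space for 26 letters; ignores all other characters |
| Counter | O(n + m) | O(d) | More concise, handles any characters |
Here n and m are the lengths of the two strings, and d is the number of distinct characters they contain (not to be confused with the threshold k).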
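Because only the Counter method handles any characters, the two methods can give different answers for the same input. The sketch below re-expresses both behaviours compactly to show this; `similar_letters_only` and `similar_all_chars` are illustrative names standing in for Method 1 and Method 2, not functions from the listings above.

```python
from collections import Counter
from string import ascii_lowercase

def similar_letters_only(str1, str2, k):
    """Like Method 1: compare only the letters a-z, ignoring case."""
    freq1, freq2 = Counter(str1.lower()), Counter(str2.lower())
    return all(abs(freq1[c] - freq2[c]) <= k for c in ascii_lowercase)

def similar_all_chars(str1, str2, k):
    """Like Method 2: compare every character, including spaces and digits."""
    freq1, freq2 = Counter(str1.lower()), Counter(str2.lower())
    return all(abs(freq1[c] - freq2[c]) <= k for c in freq1.keys() | freq2.keys())

# "a b c d" contains three spaces; "abcd" contains none
print(similar_letters_only("a b c d", "abcd", 2))  # True: spaces are ignored
print(similar_all_chars("a b c d", "abcd", 2))     # False: the space count differs by 3
```

If the input may contain spaces, digits, or punctuation, it is worth deciding up front which of the two behaviours is wanted.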
Testing with Different Examples
Let's test the Counter-based method (Method 2) with different string combinations:
from collections import Counter

def test_similarity(str1, str2, k):
    """Run the Counter-based check (Method 2) and print the result"""
    freq1 = Counter(str1.lower())
    freq2 = Counter(str2.lower())
    # Method 2 implementation
    diff1 = freq1 - freq2
    diff2 = freq2 - freq1
    max_diff1 = max(diff1.values()) if diff1 else 0
    max_diff2 = max(diff2.values()) if diff2 else 0
    result = max_diff1 <= k and max_diff2 <= k
    print(f"'{str1}' vs '{str2}' with k={k}: {result}")
    return result

# Test cases
test_cases = [
    ("hello", "hallo", 1),    # True: 'e' and 'a' each differ by 1
    ("python", "java", 2),    # True: the largest difference is 2 ('a' in "java")
    ("python", "java", 1),    # False: 'a' differs by 2, which exceeds k=1
    ("abc", "def", 1),        # True: every letter differs by exactly 1
    ("listen", "silent", 0),  # True (anagrams)
]
for str1, str2, k in test_cases:
    test_similarity(str1, str2, k)
'hello' vs 'hallo' with k=1: True
'python' vs 'java' with k=2: True
'python' vs 'java' with k=1: False
'abc' vs 'def' with k=1: True
'listen' vs 'silent' with k=0: True
Note that "python" and "java", like "abc" and "def", share no letters yet still count as almost similar when no single character's frequency differs by more than k.
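A related way to read these test cases is to compute the smallest threshold at which a pair of strings becomes almost similar, which is simply the largest per-character frequency difference. This is a minimal sketch, assuming the same case-insensitive counting as Method 2; `smallest_threshold` is a hypothetical helper, not part of the methods above.

```python
from collections import Counter

def smallest_threshold(str1, str2):
    """Return the smallest k for which the two strings are almost similar."""
    freq1, freq2 = Counter(str1.lower()), Counter(str2.lower())
    all_chars = freq1.keys() | freq2.keys()
    # default=0 covers the case where both strings are empty
    return max((abs(freq1[c] - freq2[c]) for c in all_chars), default=0)

pairs = [("hello", "hallo"), ("python", "java"), ("abc", "def"), ("listen", "silent")]
for str1, str2 in pairs:
    print(f"'{str1}' vs '{str2}': smallest k = {smallest_threshold(str1, str2)}")
# 'hello' vs 'hallo': smallest k = 1
# 'python' vs 'java': smallest k = 2
# 'abc' vs 'def': smallest k = 1
# 'listen' vs 'silent': smallest k = 0
```

A pair is almost similar for a given k exactly when `smallest_threshold(str1, str2) <= k`.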
Conclusion
Both methods check for almost similar strings by comparing character frequencies. The Counter approach is more concise and Pythonic and counts every character, while the dictionary method counts only the 26 lowercase letters and ignores everything else. Choose Counter for simplicity and general input, and the dictionary approach when you want to restrict the comparison to a fixed alphabet.
