Python Program to Check for Almost Similar Strings

A string in Python is a sequence of characters written between quotes. Checking whether two strings are almost similar means measuring how closely they match rather than testing for exact equality, which is useful for tasks such as spell checking and approximate string matching. Common similarity measures include Levenshtein distance and other fuzzy matching algorithms.

In this article, we will learn different approaches to check if two strings are almost similar based on character frequency differences.

What Are Almost Similar Strings?

Two strings are considered almost similar if, for every character, the absolute difference between its frequency in the first string and its frequency in the second string does not exceed a given threshold k. For example, if 'a' appears 4 times in string1 and 2 times in string2, the difference for 'a' is 2, which is allowed when k = 2. The strings are almost similar only if every other character also stays within the threshold.

Example Demonstration

Let's understand with an example:

# Input strings and threshold
string1 = "aazmdaa"
string2 = "aqqaccd"
k = 2

print(f"String 1: {string1}")
print(f"String 2: {string2}")
print(f"Threshold k: {k}")

Output:

String 1: aazmdaa
String 2: aqqaccd
Threshold k: 2

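In this example, 'a' occurs 4 times in string1 and 2 times in string2, so its difference is 4 - 2 = 2. The characters 'q' and 'c' also differ by 2, 'z' and 'm' differ by 1, and 'd' occurs once in each string. No difference exceeds the threshold k = 2, so the strings are almost similar.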
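
The per-character comparison described above can be sketched as follows. This is a minimal illustration using `collections.Counter`; the variable name `differences` is chosen here for the example and is not part of the methods that follow.

```python
from collections import Counter

string1 = "aazmdaa"
string2 = "aqqaccd"
k = 2

freq1 = Counter(string1)
freq2 = Counter(string2)

# Absolute frequency difference for every character seen in either string.
# A Counter returns 0 for a character it has not seen, so no KeyError occurs.
differences = {
    ch: abs(freq1[ch] - freq2[ch])
    for ch in sorted(set(string1) | set(string2))
}

for ch, diff in differences.items():
    print(f"{ch!r}: {freq1[ch]} vs {freq2[ch]} -> difference {diff}")

print(f"Largest difference: {max(differences.values())} (threshold k={k})")
```

The strings are almost similar because the largest difference printed on the last line is not greater than k.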

Method 1: Using Dictionary Comprehension and abs() Function

This approach manually counts character frequencies using dictionary comprehension and compares them:

from string import ascii_lowercase

def find_frequency(input_string):
    """Return the frequency of each letter a-z in the string (other characters are ignored)"""
    # Initialize all lowercase letters with frequency 0
    frequency = {c: 0 for c in ascii_lowercase}
    
    # Count frequency of each character
    for c in input_string.lower():
        if c in frequency:
            frequency[c] += 1
    
    return frequency

def are_almost_similar_v1(str1, str2, k):
    """Check if strings are almost similar using dictionary approach"""
    freq1 = find_frequency(str1)
    freq2 = find_frequency(str2)
    
    # Check if any character frequency difference exceeds k
    for c in ascii_lowercase:
        if abs(freq1[c] - freq2[c]) > k:
            return False
    
    return True

# Test the function
string1 = "aazmdaa"
string2 = "aqqaccd"
k = 2

result = are_almost_similar_v1(string1, string2, k)
print(f"Are strings almost similar? {result}")

Output:

Are strings almost similar? True

Method 2: Using Counter() and max() Functions

This method uses Python's Counter class for efficient frequency counting:

from collections import Counter

def are_almost_similar_v2(str1, str2, k):
    """Check if strings are almost similar using Counter"""
    # Get character frequencies
    freq1 = Counter(str1.lower())
    freq2 = Counter(str2.lower())
    
    # Find maximum difference in either direction
    diff1 = freq1 - freq2  # Characters more frequent in str1
    diff2 = freq2 - freq1  # Characters more frequent in str2
    
    # Check if any difference exceeds k
    max_diff1 = max(diff1.values()) if diff1 else 0
    max_diff2 = max(diff2.values()) if diff2 else 0
    
    return max_diff1 <= k and max_diff2 <= k

# Test the function
string1 = "aazmdaa"
string2 = "aqqaccd"
k = 2

result = are_almost_similar_v2(string1, string2, k)
print(f"Are strings almost similar? {result}")

# Let's see the frequency differences
freq1 = Counter(string1.lower())
freq2 = Counter(string2.lower())
print(f"String1 frequencies: {dict(freq1)}")
print(f"String2 frequencies: {dict(freq2)}")

Output:

Are strings almost similar? True
String1 frequencies: {'a': 4, 'z': 1, 'm': 1, 'd': 1}
String2 frequencies: {'a': 2, 'q': 2, 'c': 2, 'd': 1}
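
The two Counter subtractions can also be folded into a single pass over every character seen in either string. The sketch below is one possible variant, not a replacement for the method above; the function name `are_almost_similar_compact` is hypothetical.

```python
from collections import Counter

def are_almost_similar_compact(str1, str2, k):
    """Hypothetical compact variant: compare counts over the union of keys."""
    freq1 = Counter(str1.lower())
    freq2 = Counter(str2.lower())
    # freq1.keys() | freq2.keys() is every character seen in either string;
    # a Counter returns 0 for a character it has not seen.
    return all(
        abs(freq1[ch] - freq2[ch]) <= k
        for ch in freq1.keys() | freq2.keys()
    )

print(are_almost_similar_compact("aazmdaa", "aqqaccd", 2))
print(are_almost_similar_compact("aazmdaa", "aqqaccd", 1))
```

With k = 2 the call returns True, and with k = 1 it returns False because 'a', 'q' and 'c' each differ by 2.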

Comparison of Methods

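Method                   | Time Complexity | Space Complexity | Advantages
Dictionary Comprehension | O(n + m)        | O(1)             | Fixed space for 26 letters
Counter                  | O(n + m)        | O(u)             | More concise, handles any characters

Here n and m are the lengths of the two strings and u is the number of distinct characters they contain (not to be confused with the threshold k).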
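
One practical consequence of this difference is that the two methods can disagree when the strings contain characters other than the letters a-z. The sketch below uses two hypothetical helpers, `letters_only_similar` and `all_chars_similar`, written as compact stand-ins for Method 1 and Method 2 respectively.

```python
from collections import Counter
from string import ascii_lowercase

def letters_only_similar(str1, str2, k):
    """Stand-in for Method 1: only the letters a-z are counted."""
    s1, s2 = str1.lower(), str2.lower()
    return all(abs(s1.count(c) - s2.count(c)) <= k for c in ascii_lowercase)

def all_chars_similar(str1, str2, k):
    """Stand-in for Method 2: every character is counted."""
    freq1, freq2 = Counter(str1.lower()), Counter(str2.lower())
    return all(
        abs(freq1[c] - freq2[c]) <= k
        for c in freq1.keys() | freq2.keys()
    )

a, b, k = "wow!!!", "wow", 2
print(letters_only_similar(a, b, k))  # '!' is ignored, letters match
print(all_chars_similar(a, b, k))     # '!' occurs 3 times vs 0 times
```

The first call returns True and the second returns False, so the choice of method matters whenever punctuation, digits or spaces should count toward similarity.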

Testing with Different Examples

Let's test the Counter-based method with different string combinations:

from collections import Counter

def test_similarity(str1, str2, k):
    """Run the Counter-based check (Method 2) on the given strings and print the result"""
    freq1 = Counter(str1.lower())
    freq2 = Counter(str2.lower())
    
    # Method 2 implementation
    diff1 = freq1 - freq2
    diff2 = freq2 - freq1
    
    max_diff1 = max(diff1.values()) if diff1 else 0
    max_diff2 = max(diff2.values()) if diff2 else 0
    
    result = max_diff1 <= k and max_diff2 <= k
    
    print(f"'{str1}' vs '{str2}' with k={k}: {result}")
    return result

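# Test cases
test_cases = [
    ("hello", "hallo", 1),   # True: 'e' and 'a' each differ by 1
    ("python", "java", 2),   # True: the largest difference is 2 ('a')
    ("abc", "def", 1),       # True: every character differs by exactly 1
    ("listen", "silent", 0)  # True: anagrams have identical frequencies
]

for str1, str2, k in test_cases:
    test_similarity(str1, str2, k)

Output:

'hello' vs 'hallo' with k=1: True
'python' vs 'java' with k=2: True
'abc' vs 'def' with k=1: True
'listen' vs 'silent' with k=0: True

Note that strings with no characters in common, such as 'abc' and 'def', still count as almost similar as long as no single character's frequency differs by more than k.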
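
To see how the threshold decides each case, it can help to compute the largest per-character difference for a pair of strings: the pair is almost similar exactly when k is at least that value. The helper `min_threshold` below is a hypothetical name introduced for this sketch.

```python
from collections import Counter

def min_threshold(str1, str2):
    """Hypothetical helper: smallest k for which the strings are almost similar."""
    freq1 = Counter(str1.lower())
    freq2 = Counter(str2.lower())
    # Largest absolute frequency difference over all characters in either string;
    # default=0 covers the case where both strings are empty.
    return max(
        (abs(freq1[ch] - freq2[ch]) for ch in freq1.keys() | freq2.keys()),
        default=0,
    )

pairs = [("hello", "hallo"), ("python", "java"), ("abc", "def"), ("listen", "silent")]
for s1, s2 in pairs:
    print(f"'{s1}' vs '{s2}': smallest k = {min_threshold(s1, s2)}")
```

For these pairs the smallest thresholds are 1, 2, 1 and 0, so lowering k below those values would make the corresponding check return False.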

Conclusion

Both methods check for almost similar strings by comparing character frequencies. The Counter approach is more concise and counts every character, while the dictionary method counts only the lowercase letters a-z and ignores everything else. Choose Counter for simplicity and general input, and the dictionary approach when you want to restrict the comparison to a fixed alphabet.

Updated on: 2026-03-27T12:52:15+05:30
