Python - Percentage similarity of lists

Measuring the similarity between two lists is a common task in Python applications like data analysis, text processing, and recommendation systems. This article explores two methods to calculate percentage similarity between lists based on shared elements.

List similarity is determined by analyzing overlap or shared elements between lists. This provides a numerical assessment of how much two lists have in common, expressed as a percentage. The choice of method depends on whether you need to handle duplicates and how you want to calculate the similarity ratio.

Method 1: Using Set Intersection

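This approach uses Python's set data structure to find the elements common to both lists. Converting to sets discards duplicates, so only unique shared elements are counted. The denominator, however, is the average of the original list lengths, so lists that contain duplicates score lower than their unique overlap alone would suggest: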

def intersection_similarity(list1, list2):
    # Convert to sets and keep only the elements present in both
    common_elements = set(list1) & set(list2)

    # Average of the original list lengths (duplicates included)
    avg_length = (len(list1) + len(list2)) / 2
    if avg_length == 0:
        # Assumption: two empty lists score 0 instead of raising ZeroDivisionError
        return 0.0

    return (len(common_elements) / avg_length) * 100

# Example lists
first_list = [11, 22, 33, 44, 55]
second_list = [44, 55, 66, 77, 88]

# Calculate similarity
result = intersection_similarity(first_list, second_list)
print(f"Intersection-based similarity: {result:.2f}%")
Intersection-based similarity: 40.00%
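
To make the arithmetic behind that 40% explicit, the following sketch prints each intermediate value for the same two lists. The variable names are illustrative only.

```python
first_list = [11, 22, 33, 44, 55]
second_list = [44, 55, 66, 77, 88]

# Unique elements present in both lists
common = set(first_list) & set(second_list)

# Denominator: mean of the two original list lengths
avg_length = (len(first_list) + len(second_list)) / 2

similarity = len(common) / avg_length * 100

print(sorted(common))         # [44, 55]
print(avg_length)             # 5.0
print(f"{similarity:.2f}%")   # 40.00%
```

Two shared elements divided by an average length of five gives 0.4, or 40%.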

Method 2: Element-by-Element Comparison

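This approach iterates through the first list and checks whether each element exists in the second. Every occurrence in the first list is counted, so duplicates there add to the score, and the result depends on which list is passed first: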

def element_similarity(list1, list2):
    # Assumption: an empty first list scores 0 instead of raising ZeroDivisionError
    if not list1:
        return 0.0

    # Count every occurrence in list1 that also appears in list2
    common_count = 0
    for element in list1:
        if element in list2:
            common_count += 1

    # Calculate similarity based on first list length
    return (common_count / len(list1)) * 100

# Example lists
first_list = [11, 22, 33, 44, 55]
second_list = [44, 55, 66, 77, 88]

# Calculate similarity
result = element_similarity(first_list, second_list)
print(f"Element-based similarity: {result:.2f}%")
Element-based similarity: 40.00%
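
Because the denominator is the length of the first list only, swapping the arguments can change the result. The sketch below restates the function compactly and calls it in both directions; `short_list` and `long_list` are made-up inputs for illustration.

```python
def element_similarity(list1, list2):
    # Same logic as above: share of list1's items that also appear in list2
    common_count = sum(1 for element in list1 if element in list2)
    return common_count / len(list1) * 100

short_list = [1, 2]
long_list = [1, 2, 3, 4]

forward = element_similarity(short_list, long_list)
backward = element_similarity(long_list, short_list)

print(f"short vs long: {forward:.2f}%")   # 100.00%
print(f"long vs short: {backward:.2f}%")  # 50.00%
```

If a symmetric score is needed, the set-based method is the safer choice of the two.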

Handling Lists with Duplicates

The difference between the methods becomes apparent when the lists contain duplicate elements:

# Lists with duplicates
list_a = [1, 2, 2, 3, 3, 3]
list_b = [2, 3, 4, 5]

print("List A:", list_a)
print("List B:", list_b)
print()

# Method 1: Set intersection
intersection_result = intersection_similarity(list_a, list_b)
print(f"Intersection similarity: {intersection_result:.2f}%")

# Method 2: Element comparison  
element_result = element_similarity(list_a, list_b)
print(f"Element similarity: {element_result:.2f}%")
List A: [1, 2, 2, 3, 3, 3]
List B: [2, 3, 4, 5]

Intersection similarity: 40.00%
Element similarity: 83.33%
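
Neither method treats duplicates symmetrically: the set-based score for `list_a` compared with itself would be 50%, since three unique elements are divided by a length of six. One possible alternative, which is not one of the two methods in this article, is to count the overlap with multiplicity using `collections.Counter`. The helper name `multiset_similarity` and its handling of empty input are assumptions made for this sketch.

```python
from collections import Counter

def multiset_similarity(list1, list2):
    """Hypothetical helper: overlap counted with multiplicity, as a percentage."""
    if not list1 and not list2:
        return 0.0  # assumption: two empty lists score 0 rather than raising
    # Counter & Counter keeps the minimum count of each shared element
    overlap = sum((Counter(list1) & Counter(list2)).values())
    avg_length = (len(list1) + len(list2)) / 2
    return overlap / avg_length * 100

list_a = [1, 2, 2, 3, 3, 3]
list_b = [2, 3, 4, 5]

print(f"{multiset_similarity(list_a, list_b):.2f}%")  # 40.00%
print(f"{multiset_similarity(list_a, list_a):.2f}%")  # 100.00%
```

With this variant a list always scores 100% against itself, and the result does not depend on argument order.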

Comparison

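| Method             | Handles Duplicates   | Base Calculation    | Best For                    |
|--------------------|----------------------|---------------------|-----------------------------|
| Set Intersection   | Removes duplicates   | Average list length | Unique element similarity   |
| Element Comparison | Preserves duplicates | First list length   | Occurrence-based similarity |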

Conclusion

Use set intersection when you want to measure similarity based on unique shared elements. Use element-by-element comparison when duplicate occurrences in the first list should count, keeping in mind that its result depends on the order of the arguments.

---
Updated on: 2026-03-27T15:36:29+05:30
