Python - Percentage similarity of lists


Measuring the similarity of two lists in Python is a frequent operation performed in a variety of applications. Determining the degree of similarity across lists is essential for making wise judgments and gaining insightful knowledge, whether you are working with data analysis, text processing, recommendation systems, or even social network analysis. In this article, we'll examine two distinct methods for estimating the % similarity as we delve into the subject of list similarity.

Analyzing overlap or shared elements between lists is necessary to determine their degree of similarity. This metric offers a numerical assessment of the degree of similarity between two lists. It enables us to meaningfully measure the degree of similarity and quantify the degree of overlap. The set data structure is employed in the first approach, which is particularly useful when duplicates are not desired, to get the intersection of two lists. The second method, which compares list items one at a time, is appropriate when computing the number of shared elements is more necessary than getting rid of duplicates.

Approaches

For finding the percentage similarity of lists using Python, we can follow the two methods −

  • Intersection-based similarity calculation.

  • Element-based similarity calculation.

Let us investigate both approaches −

Intersection Based Similarity Calculation

The intersection-based technique focuses on identifying the shared components between the lists when assessing how similar two lists are to one another. It makes use of the set data structure that Python by default includes, which offers a quick way to get rid of duplicates and carry out set operations like finding the intersection. We may calculate the percentage similarity by calculating the size of the intersection and considering the average length of the lists. When duplicates are not desirable and we want to concentrate on the distinct shared elements between the lists, this strategy is quite helpful.

Algorithm

The algorithm for finding the percentage similarity of lists using Python, is given below −

  • Step 1 − Create a function that takes two lists as a parameter.

  • Step 2 − Compute the intersection between two lists.

  • Step 3 − Compute the similarity percentage among the two lists.

  • Step 4 − Return the result.

  • Step 5 − Create the first and second lists.

  • Step 6 − Call the above function and pass the above list as a parameter.

  • Step 7 − Display the result.

Example

# Create a function that takes two lists as a parameter
def similarity_compute(list_first, list_second):
   # Compute the intersection from the first and second list
   intersected_items = set(list_first).intersection(list_second)
   # Compute the similarity percentage among the two list 
   lengthOfItersectedItems = len(intersected_items)
   similarity_percentage = (lengthOfItersectedItems / ((len(list_first) + len(list_second)) / 2)) * 100
   # Return the result
   return similarity_percentage

# Create the first list as an example
firstList = [11, 22, 33, 44, 55]
# Create the second list as an example
secondList = [44, 55, 66, 77, 88]
# Call the above function
similarity_percentage = similarity_compute(firstList, secondList)
# Display the result
print("Similarity Percentage: {:.2f}%".format(similarity_percentage))

Output

Similarity Percentage: 40.00%

Element Based Similarity Calculation

The element-based approach determines the similarity between two lists by comparing elements at each place, as opposed to the intersection-based approach. It entails going over one list iteratively and determining whether each element is present in the other list. We may calculate the % similarity by collecting the shared entries and dividing that total by the length of one of the lists. When counting the occurrences of common components is more important than removing duplicates, this method is appropriate. Despite their individuality, it enables us to judge the similarity between lists based on the presence of shared items.

Algorithm

The algorithm for finding the percentage similarity of lists using Python is given below −

  • Step 1 − Create the function that takes two lists as a parameter.

  • Step 2 − Take a variable count that holds the common value count among the list.

  • Step 3 − Traverse the for loop and for each list compute the common element.

  • Step 4 − Increase the count value for each common element in the list.

  • Step 5 − Compute the similarity percentage by dividing the common values with all the list element lengths.

  • Step 6 − Call the above function and pass the two lists as a parameter.

  • Step 7 − Display the results.

Example

#Create a function that takes two lists as a parameter
def similarity_compute(first_list, second_list):
   # take count variable two count intersection numbers among the list
   count = 0
   # Traverse the first list
   for element in first_list:
      # for if the second list has the element in the first list 
      if element in second_list:
         # increment the value
         count += 1
   #Compute the similarity percentage  
   similarity_percentage = (count / len(first_list)) * 100
   return similarity_percentage

# Take an example of two lists
first_list = [11, 22, 33, 44, 55]
# The second list
second_list = [44, 55, 66, 77, 88]
# Call the above function
similarity_percentage = similarity_compute(first_list, second_list)
# Display the result
print("Similarity Percentage: {:.2f}%".format(similarity_percentage))

Output

Similarity Percentage: 40.00%

Conclusion

In this article, we investigated two approaches for computing the lists' % similarity in Python. While the second approach included element-wise comparisons, the first approach made use of the set intersection operation. For the provided example lists, both methods produced the same similarity percentage. When selecting a method, it is crucial to consider the features of the input lists and the needs of your application.

Updated on: 18-Oct-2023

200 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements