- Data Structure
- Networking
- RDBMS
- Operating System
- Java
- MS Excel
- iOS
- HTML
- CSS
- Android
- Python
- C Programming
- C++
- C#
- MongoDB
- MySQL
- Javascript
- PHP
- Physics
- Chemistry
- Biology
- Mathematics
- English
- Economics
- Psychology
- Social Studies
- Fashion Studies
- Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Python - Percentage similarity of lists
Measuring the similarity of two lists in Python is a frequent operation performed in a variety of applications. Determining the degree of similarity across lists is essential for making wise judgments and gaining insightful knowledge, whether you are working with data analysis, text processing, recommendation systems, or even social network analysis. In this article, we'll examine two distinct methods for estimating the % similarity as we delve into the subject of list similarity.
Analyzing overlap or shared elements between lists is necessary to determine their degree of similarity. This metric offers a numerical assessment of the degree of similarity between two lists. It enables us to meaningfully measure the degree of similarity and quantify the degree of overlap. The set data structure is employed in the first approach, which is particularly useful when duplicates are not desired, to get the intersection of two lists. The second method, which compares list items one at a time, is appropriate when computing the number of shared elements is more necessary than getting rid of duplicates.
Approaches
For finding the percentage similarity of lists using Python, we can follow the two methods −
Intersection-based similarity calculation.
Element-based similarity calculation.
Let us investigate both approaches −
Intersection Based Similarity Calculation
The intersection-based technique focuses on identifying the shared components between the lists when assessing how similar two lists are to one another. It makes use of the set data structure that Python by default includes, which offers a quick way to get rid of duplicates and carry out set operations like finding the intersection. We may calculate the percentage similarity by calculating the size of the intersection and considering the average length of the lists. When duplicates are not desirable and we want to concentrate on the distinct shared elements between the lists, this strategy is quite helpful.
Algorithm
The algorithm for finding the percentage similarity of lists using Python, is given below −
Step 1 − Create a function that takes two lists as a parameter.
Step 2 − Compute the intersection between two lists.
Step 3 − Compute the similarity percentage among the two lists.
Step 4 − Return the result.
Step 5 − Create the first and second lists.
Step 6 − Call the above function and pass the above list as a parameter.
Step 7 − Display the result.
Example
# Create a function that takes two lists as a parameter def similarity_compute(list_first, list_second): # Compute the intersection from the first and second list intersected_items = set(list_first).intersection(list_second) # Compute the similarity percentage among the two list lengthOfItersectedItems = len(intersected_items) similarity_percentage = (lengthOfItersectedItems / ((len(list_first) + len(list_second)) / 2)) * 100 # Return the result return similarity_percentage # Create the first list as an example firstList = [11, 22, 33, 44, 55] # Create the second list as an example secondList = [44, 55, 66, 77, 88] # Call the above function similarity_percentage = similarity_compute(firstList, secondList) # Display the result print("Similarity Percentage: {:.2f}%".format(similarity_percentage))
Output
Similarity Percentage: 40.00%
Element Based Similarity Calculation
The element-based approach determines the similarity between two lists by comparing elements at each place, as opposed to the intersection-based approach. It entails going over one list iteratively and determining whether each element is present in the other list. We may calculate the % similarity by collecting the shared entries and dividing that total by the length of one of the lists. When counting the occurrences of common components is more important than removing duplicates, this method is appropriate. Despite their individuality, it enables us to judge the similarity between lists based on the presence of shared items.
Algorithm
The algorithm for finding the percentage similarity of lists using Python is given below −
Step 1 − Create the function that takes two lists as a parameter.
Step 2 − Take a variable count that holds the common value count among the list.
Step 3 − Traverse the for loop and for each list compute the common element.
Step 4 − Increase the count value for each common element in the list.
Step 5 − Compute the similarity percentage by dividing the common values with all the list element lengths.
Step 6 − Call the above function and pass the two lists as a parameter.
Step 7 − Display the results.
Example
#Create a function that takes two lists as a parameter def similarity_compute(first_list, second_list): # take count variable two count intersection numbers among the list count = 0 # Traverse the first list for element in first_list: # for if the second list has the element in the first list if element in second_list: # increment the value count += 1 #Compute the similarity percentage similarity_percentage = (count / len(first_list)) * 100 return similarity_percentage # Take an example of two lists first_list = [11, 22, 33, 44, 55] # The second list second_list = [44, 55, 66, 77, 88] # Call the above function similarity_percentage = similarity_compute(first_list, second_list) # Display the result print("Similarity Percentage: {:.2f}%".format(similarity_percentage))
Output
Similarity Percentage: 40.00%
Conclusion
In this article, we investigated two approaches for computing the lists' % similarity in Python. While the second approach included element-wise comparisons, the first approach made use of the set intersection operation. For the provided example lists, both methods produced the same similarity percentage. When selecting a method, it is crucial to consider the features of the input lists and the needs of your application.