Program to sort out phrases based on their appearances in Python

Suppose we are given two lists: phrases, which contains selected phrases, and sentences, which contains several sentences. We need to find which phrases appear in the sentences and sort the phrases by the number of sentences each one appears in. The phrase that appears in the most sentences comes first, and ties keep the original order of phrases.

For example, if the input is phrases = ['strong', 'durable', 'efficient'], sentences = ['the product is durable and efficient', 'strong and durable', 'it is efficient', 'like it because it is efficient'], then the output will be ['efficient', 'durable', 'strong'].

The phrase 'efficient' appears in 3 sentences (indices 0, 2, and 3), making it the most frequent. The phrase 'durable' appears in 2 sentences (indices 0 and 1), and 'strong' appears in 1 sentence (index 1).
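The sentence indices quoted above can be checked with a short sketch. The helper name sentence_indices is hypothetical and is used only for this illustration; it assumes a phrase is a single word and that sentences are split on whitespace.

```python
phrases = ['strong', 'durable', 'efficient']
sentences = [
    'the product is durable and efficient',
    'strong and durable',
    'it is efficient',
    'like it because it is efficient',
]

def sentence_indices(phrase, sentences):
    """Return the indices of the sentences whose words include `phrase`.

    Hypothetical helper for illustration; matches whole words only.
    """
    return [i for i, s in enumerate(sentences) if phrase in s.split()]

for phrase in phrases:
    print(phrase, sentence_indices(phrase, sentences))
# strong [1]
# durable [0, 1]
# efficient [0, 2, 3]
```

The length of each index list is the count used for ranking.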

Algorithm

To solve this problem, we follow these steps:

  • Create a counter dictionary to track phrase frequencies
  • Initialize all phrases with count 0
  • For each sentence, split it into words and convert to a set to avoid counting duplicates
  • Increment the counter for each phrase found in the sentence
  • Sort phrases by frequency (descending) and by original order for ties
  • Return the sorted phrase list

Example

Let us see the following implementation to get a better understanding:

def solve(phrases, sentences):
    # Start every phrase at 0 so phrases that never appear are still ranked
    cnt = {}
    for phrase in phrases:
        cnt[phrase] = 0

    for sentence in sentences:
        # A set ensures a phrase is counted at most once per sentence
        unique_words = set(sentence.split())
        for word in unique_words:
            if word in cnt:
                cnt[word] += 1

    # Sort by count (descending), then by original order for ties
    return sorted(cnt, key=lambda phrase: (-cnt[phrase], phrases.index(phrase)))

# Test the function
phrases = ['strong', 'durable', 'efficient']
sentences = [
    'the product is durable and efficient', 
    'strong and durable', 
    'it is efficient', 
    'like it because it is efficient'
]

result = solve(phrases, sentences)
print(result)

The output of the above code is:

['efficient', 'durable', 'strong']

How It Works

The algorithm has four parts:

  • Frequency counting: Each phrase gets a counter initialized to 0, so phrases that never appear still show up in the result
  • Word extraction: Each sentence is split into individual words on whitespace
  • Duplicate handling: Converting the words to a set ensures each phrase is counted only once per sentence
  • Sorting logic: Primary sort by count (descending), secondary sort by original phrase order
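The duplicate handling and the tie-breaking rule are easier to see on a small input built for that purpose. The following is a minimal sketch; the function name count_sentences and the sample data are made up for this illustration and are not part of the original program.

```python
def count_sentences(phrases, sentences):
    """Count, for each phrase, how many sentences contain it as a word."""
    counts = {p: 0 for p in phrases}
    for sentence in sentences:
        # set() collapses repeated words, so 'fast fast fast' counts once
        for word in set(sentence.split()):
            if word in counts:
                counts[word] += 1
    return counts

# Illustrative data: 'fast' repeats inside one sentence,
# while 'cheap' and 'quiet' tie with two sentences each.
phrases = ['fast', 'cheap', 'quiet']
sentences = ['fast fast fast', 'cheap and quiet', 'quiet but not cheap']

counts = count_sentences(phrases, sentences)
print(counts)  # {'fast': 1, 'cheap': 2, 'quiet': 2}

# Ties are broken by position in the original phrases list
ranked = sorted(phrases, key=lambda p: (-counts[p], phrases.index(p)))
print(ranked)  # ['cheap', 'quiet', 'fast']
```

Without the set, 'fast' would be counted three times and would wrongly rank first.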

Alternative Approach Using Counter

We can simplify the counting using Python's Counter class:

from collections import Counter

def solve_with_counter(phrases, sentences):
    phrase_counts = Counter()
    
    # Initialize all phrases with 0 count
    for phrase in phrases:
        phrase_counts[phrase] = 0
    
    # Count occurrences
    for sentence in sentences:
        words = set(sentence.split())
        for phrase in phrases:
            if phrase in words:
                phrase_counts[phrase] += 1
    
    # Sort by count (descending), then by original order
    sorted_phrases = sorted(phrases, key=lambda x: (-phrase_counts[x], phrases.index(x)))
    
    return sorted_phrases

# Test the function
phrases = ['strong', 'durable', 'efficient']
sentences = [
    'the product is durable and efficient', 
    'strong and durable', 
    'it is efficient', 
    'like it because it is efficient'
]

result = solve_with_counter(phrases, sentences)
print(result)

The output of the above code is:

['efficient', 'durable', 'strong']
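A possible variation, shown here only as a sketch: Python's sorted() is stable, so sorting the original phrases list by negative count alone already keeps tied phrases in their original order, and the phrases.index() lookup in the sort key can be dropped. The function name solve_stable is hypothetical, and the sketch assumes the same single-word matching as the versions above.

```python
from collections import Counter

def solve_stable(phrases, sentences):
    """Rank phrases by sentence count, relying on sort stability for ties."""
    wanted = set(phrases)
    counts = Counter()
    for sentence in sentences:
        # Intersect the sentence's unique words with the phrases we track;
        # each matching phrase is counted once per sentence
        counts.update(set(sentence.split()) & wanted)
    # A Counter returns 0 for missing keys, so no explicit initialization
    # is needed; stable sorting preserves original order among equal counts
    return sorted(phrases, key=lambda p: -counts[p])

phrases = ['strong', 'durable', 'efficient']
sentences = [
    'the product is durable and efficient',
    'strong and durable',
    'it is efficient',
    'like it because it is efficient',
]
print(solve_stable(phrases, sentences))  # ['efficient', 'durable', 'strong']
```

This avoids a linear phrases.index() search for every key computation, which may matter when the phrase list is long.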

Conclusion

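This algorithm sorts phrases by the number of sentences they appear in. The key insight is using sets to avoid counting duplicate occurrences within the same sentence, so each sentence contributes at most 1 to a phrase's count. Note that both versions split sentences on whitespace and compare whole words, so they match single-word phrases only and the comparison is case-sensitive.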

Updated on: 2026-03-26T14:35:25+05:30
