Finding Prefix Frequency in a String List Using Python


In this article, we will learn how to find the frequency of a prefix in a list of strings using Python. There are several ways to solve this problem, and we will look at five of them.

The prefix frequency is the number of strings in the list that start with a given prefix. Counting it can reveal patterns in a collection of words and show how those words are distributed.

Method 1: Using Simple for Loop

Example

def find_prefix_freq(strings, prefix):
   count = 0
   for string in strings:
      if string.startswith(prefix):
         count += 1
   return count

strings = ['apple', 'aptitude', 'approve', 'aplaude', 'application', 'applause', 'apologize']
prefix = 'app'
print("Frequency of prefix "+ prefix + " is: "+ str(find_prefix_freq(strings, prefix)))

Output

Frequency of prefix app is: 4

Explanation

Here the function takes two parameters: strings, the list of strings, and prefix. Inside the function, the count variable keeps track of how many strings start with the prefix. The for loop traverses each string, and the startswith() method checks whether the string begins with the given prefix. If it does, count is incremented by 1.
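
The same loop can be extended to count strings that start with any one of several prefixes, because startswith() also accepts a tuple of prefixes. The following is a minimal sketch; the function name count_any_prefix is hypothetical and is not part of the example above.

```python
def count_any_prefix(strings, prefixes):
    """Count the strings that start with at least one of the given prefixes."""
    # str.startswith() accepts a tuple, so a single call checks every prefix.
    prefixes = tuple(prefixes)
    count = 0
    for string in strings:
        if string.startswith(prefixes):
            count += 1
    return count

strings = ['apple', 'aptitude', 'approve', 'aplaude', 'application', 'applause', 'apologize']
print(count_any_prefix(strings, ['app', 'apt']))  # 5
```

Note that an empty prefix ('') matches every string, so it may be worth validating the input if that case is not wanted.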

Method 2: Using List Comprehension

A list comprehension lets us build a new list that contains only the strings that start with the given prefix. The length of that filtered list is the prefix frequency.

Example

def find_prefix_freq(strings, prefix):
   filtered_strings = [string for string in strings if string.startswith(prefix)]
   count = len(filtered_strings)
   return count
    
strings = ['apple', 'aptitude', 'approve', 'aplaude', 'application', 'applause', 'apologize']
prefix = 'app'
print("Frequency of prefix "+ prefix + " is: "+ str(find_prefix_freq(strings, prefix)))

Output

Frequency of prefix app is: 4

Explanation

Here the function takes two parameters: strings, the list of strings, and prefix. Inside the function we use a list comprehension to create a new list, filtered_strings. The list comprehension iterates over each string in the strings list and checks whether it starts with the given prefix using the startswith() method. Only the strings that satisfy this condition are added to filtered_strings. Finally, the len() function returns the number of strings in that list, which is the prefix frequency.
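
If the filtered list itself is not needed, the intermediate list can be avoided by passing a generator expression to sum(). This is a sketch of that variant; the name count_prefix_lazy is hypothetical.

```python
def count_prefix_lazy(strings, prefix):
    """Count matching strings without building an intermediate list."""
    # The generator yields 1 for every match; sum() adds them up one at a time.
    return sum(1 for string in strings if string.startswith(prefix))

strings = ['apple', 'aptitude', 'approve', 'aplaude', 'application', 'applause', 'apologize']
print(count_prefix_lazy(strings, 'app'))  # 4
```

For short lists the difference is negligible; for very long lists this version uses less memory because no second list is created.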

Method 3: Using the Counter Class

In this method we will use the Counter class from the collections module. It provides a concise way to count the occurrences of elements in a collection.

Example

from collections import Counter

def find_prefix_freq(strings, prefix):
   pref = [string[:len(prefix)] for string in strings if string.startswith(prefix)]
   prefix_freq = Counter(pref)
   count = prefix_freq[prefix]
   return count

strings = ['apple', 'aptitude', 'approve', 'aplaude', 'application', 'applause', 'apologize']
prefix = 'app'
print("Frequency of prefix "+ prefix + " is: "+ str(find_prefix_freq(strings, prefix)))

Output

Frequency of prefix app is: 4

Explanation

Here we import the Counter class from the collections module. Counter counts how often each element occurs in a list or any other iterable. As in Method 2, we use a list comprehension to create a new list, pref. The list comprehension iterates over each string in the list, checks whether it starts with the given prefix using the startswith() method, and extracts the leading portion with the slice [:len(prefix)]. In this way, the prefix of every string that satisfies the condition is added to the pref list.

After that we pass the pref list to the Counter class to create the prefix_freq object. Because every element of pref is equal to the prefix, prefix_freq[prefix] gives the number of matching strings, which we assign to the count variable. A Counter returns 0 for a missing key, so the function also works when no string matches.
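
Counter becomes more useful when the frequencies of all prefixes of a given length are needed at once, rather than the count of a single prefix. The sketch below assumes a fixed prefix length; the function name prefix_frequencies and the length parameter are hypothetical.

```python
from collections import Counter

def prefix_frequencies(strings, length):
    """Count every prefix of the given length that occurs in the list."""
    # Strings shorter than the requested length are skipped.
    return Counter(string[:length] for string in strings if len(string) >= length)

strings = ['apple', 'aptitude', 'approve', 'aplaude', 'application', 'applause', 'apologize']
freqs = prefix_frequencies(strings, 3)
print(freqs['app'])          # 4
print(freqs.most_common(1))  # [('app', 4)]
```

The list is traversed only once, and any prefix can then be looked up in the resulting Counter without scanning the strings again.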

Method 4: Using Pandas Dataframe

A pandas DataFrame is useful when the list of strings is large or is part of a more complex, tabular data set. Here we convert the string list into a DataFrame and then use its built-in grouping functions to count the strings that share the given prefix.

Example

import pandas as pd

def find_prefix_freq(strings, prefix):
   df = pd.DataFrame(strings, columns=['String'])
   df['Prefix'] = df['String'].apply(lambda x: x[:len(prefix)])
   prefix_freq = df.groupby('Prefix').size().to_dict()
   count = prefix_freq.get(prefix, 0)
   return count

strings = ['apple', 'aptitude', 'approve', 'aplaude', 'application', 'applause', 'apologize']
prefix = 'app'
print("Frequency of prefix "+ prefix + " is: "+ str(find_prefix_freq(strings, prefix)))

Output

Frequency of prefix app is: 4

Explanation

In this program we import the pandas library. The function takes two parameters: strings, the list of strings, and prefix. Inside the function we create a DataFrame object, df, using the pd.DataFrame() constructor. The string list is passed as the data and is stored in a column named String. A new column, Prefix, is then added to df using the .apply() method: the lambda function applies the slice [:len(prefix)] to each string and extracts its leading portion.

Using the groupby() method we group the rows by the value in the Prefix column, and size() returns the number of rows in each group. to_dict() converts that result into a dictionary that maps each prefix to its count, and get(prefix, 0) looks up the count for the given prefix, returning 0 if no string starts with it.
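
When only the count for a single prefix is needed, pandas also offers the vectorized Series.str.startswith() method, which avoids the extra column and the grouping step. The following is a sketch of that alternative, not the method used above; the name count_prefix_vectorized is hypothetical.

```python
import pandas as pd

def count_prefix_vectorized(strings, prefix):
    """Count matching strings using the vectorized string accessor."""
    series = pd.Series(strings, dtype="object")
    # str.startswith() returns a boolean Series; summing it counts the True values.
    return int(series.str.startswith(prefix).sum())

strings = ['apple', 'aptitude', 'approve', 'aplaude', 'application', 'applause', 'apologize']
print(count_prefix_vectorized(strings, 'app'))  # 4
```

The int() call converts the NumPy integer returned by sum() into a plain Python int.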

Method 5: Using Regular Expressions

Regular expressions are a powerful tool for matching patterns in strings with a complex structure. Here we use the re module to find the strings that match the given prefix and count the total number of matches.

Example

import re

def find_prefix_freq(strings, prefix):
   # re.escape() makes characters such as . or * in the prefix match literally
   pattern = '^' + re.escape(prefix)
   count = sum(1 for string in strings if re.match(pattern, string))
   return count

strings = ['apple', 'aptitude', 'approve', 'aplaude', 'application', 'applause', 'apologize']
prefix = 'app'
print("Frequency of prefix "+ prefix + " is: "+ str(find_prefix_freq(strings, prefix)))

Output

Frequency of prefix app is: 4

Explanation

In the above program we import the re module, which provides regular expression matching. Inside the function we first build the pattern: the ^ symbol denotes the start of the string and is followed by the prefix. We then use a generator expression to iterate over each string in the list, and for each string we call re.match() to check whether it matches the pattern. re.match() only matches at the beginning of a string, so the ^ anchor makes the intent explicit rather than changing the result. Each match contributes 1 to the total, and sum() adds these up to give the count.
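
A prefix that contains regex metacharacters such as . or * is interpreted as a pattern rather than as literal text unless it is escaped with re.escape(). The sketch below illustrates the difference; the function name count_literal_prefix and the sample list names are hypothetical.

```python
import re

def count_literal_prefix(strings, prefix):
    """Count strings that start with the prefix, treated as literal text."""
    # re.escape() neutralizes metacharacters; re.match() anchors at the start.
    pattern = re.compile(re.escape(prefix))
    return sum(1 for string in strings if pattern.match(string))

names = ['v1.0-beta', 'v1x0-beta', 'v2.0']

# Escaped: the dot must be a literal dot, so only 'v1.0-beta' matches.
print(count_literal_prefix(names, 'v1.'))  # 1

# Unescaped: the dot matches any character, so 'v1x0-beta' is counted too.
unescaped = sum(1 for name in names if re.match('v1.', name))
print(unescaped)  # 2
```

When the prefix is always plain text, startswith() gives the same result as the escaped version with less overhead; regular expressions are mainly worthwhile when the prefix really is a pattern.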

These are some of the methods that can be used to find the prefix frequency in a list of strings. Each has its own advantages: the for loop and the list comprehension are the simplest, Counter is convenient for counting elements, pandas fits larger or tabular data sets, and regular expressions handle more complex patterns. Choose the method that best suits your data, your preference, and the performance you need.

Updated on: 13-Oct-2023
