Python program to find the frequency of the elements which are common in a list of strings


This article shows how to find the frequency of the elements that are common in a list of strings using Python. Sometimes the list to be analyzed is stored in an Excel file; the openpyxl module can be used to read it. Three examples are given. Example 1 finds the frequency of the characters that are common to all the strings in the list. Examples 2 and 3 count how often each word occurs across the strings, so the words that the strings share show up with the highest counts. In all three examples, the list of strings is fetched from a column of an Excel file.

Preprocessing Steps

Step 1 − Log in with a Google account. Go to Google Colab, open a new Colab notebook, and write the Python code in it.

Step 2 − First upload the Excel file "oldrecord5.xlsx" to Google Colab.

Step 3 − Import “openpyxl”.

Step 4 − Use the openpyxl.load_workbook function to load the Excel file.

Step 5 − Store the active sheet in a variable called myxlsxsheet.

Step 6 − Convert the sheet to a DataFrame using Pandas and select the column that holds the title strings.

Step 7 − Convert the selected column to a list. Name this list “title_list”.
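The filtering in Steps 6 and 7 can also be sketched without Pandas, which makes the logic easy to try without the Excel file. This is a minimal sketch, not the code used in the examples: the helper name titles_containing and the sample rows are hypothetical, and the rows only imitate the tuples that myxlsxsheet.values yields (empty cells arrive as None).

```python
def titles_containing(rows, keyword, column=3):
    """Collect the string cells in `column` that contain `keyword`."""
    titles = []
    for row in rows:
        # Skip short rows and non-string cells (empty Excel cells arrive as None).
        cell = row[column] if len(row) > column else None
        if isinstance(cell, str) and keyword in cell:
            titles.append(cell)
    return titles


# Hypothetical rows standing in for myxlsxsheet.values
sample_rows = [
    ("Id", "Date", "Owner", "Title"),
    (1, "2023-01-05", "A", "Types of Environment - Class Discussion"),
    (2, "2023-01-06", "B", None),
    (3, "2023-01-07", "C", "Quiz 1"),
]

title_list = titles_containing(sample_rows, "Discussion")
print(title_list)  # ['Types of Environment - Class Discussion']
```

With a real workbook, passing myxlsxsheet.values as rows should give the same kind of list, assuming the titles are in the fourth column as in the examples.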

The Excel file content used for these examples

                                      

Fig: Showing the Excel file oldrecord5.xlsx used in the examples

Upload the Excel file to Colab

                                    

Fig: Uploading oldrecord5.xlsx to Google Colab

Example 1: Get the frequency of the characters that are found in a list of strings using the reduce function

This approach uses the reduce function to combine the character counts of all the strings in the list.

Step 1 − Use the list “title_list” from the preprocessing steps above.

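Step 2 − Build a Counter of the characters in each string, then use reduce with a lambda to intersect the Counters with the & operator. The intersection keeps only the characters that occur in every string, each with the smallest count found in any one string.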

Step 3 − Show the results in the form of a dictionary.

Write the following code in a code cell of the Google Colab notebook.

import openpyxl
import pandas as pd
from functools import reduce
from collections import Counter

# Load the Excel file from its path
myxlsx = openpyxl.load_workbook("oldrecord5.xlsx")
myxlsxsheet = myxlsx.active

# Convert the sheet rows to a DataFrame
df = pd.DataFrame(myxlsxsheet.values)

# Select the rows whose title column (index 3) contains "Discussion";
# na=False treats empty cells as non-matching
df1 = df[df.iloc[:, 3].str.contains('Discussion', na=False)]

# Select only the title column
df2 = df1.iloc[:, 3]

title_list = df2.values.tolist()
print(title_list)

# Intersect the character counts of all strings; & keeps the minimum count
itemFreq = reduce(lambda m, n: m & n, (Counter(elem) for elem in title_list[1:]), Counter(title_list[0]))

print("Common Characters and their occurrence : ", dict(itemFreq))

Viewing The Result

Press the play button on the code cells to see the results.

['Types of Environment - Class Discussion', 'Management Structure and Nature  - Class Discussion', 'Macro- Demography, Natural, Legal & Political  - Class Discussion']
Common Characters and their occurrence :  {'e': 2, 's': 5, ' ': 5, 'o': 1, 'n': 1, 'i': 2, 'r': 1, 'm': 1, 't': 1, '-': 1, 'C': 1, 'l': 1, 'a': 1, 'D': 1, 'c': 1, 'u': 1}

Fig 1: Showing the results using Google Colab.
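To see why the result above contains only shared characters, it may help to run the same Counter intersection on three short words. The sketch below is illustrative only: the function name common_char_counts and the sample words are assumptions, and operator.and_ is used in place of the lambda, which is equivalent to m & n.

```python
from collections import Counter
from functools import reduce
from operator import and_


def common_char_counts(strings):
    """Count the characters shared by every string, keeping the smallest count."""
    if not strings:
        # reduce() without an initial value fails on an empty sequence
        return Counter()
    return reduce(and_, (Counter(s) for s in strings))


common = common_char_counts(["banana", "bandana", "cabana"])
print(dict(common))  # {'b': 1, 'a': 3, 'n': 1}
```

The letter 'n' appears twice in "banana" and "bandana" but only once in "cabana", so the intersection reports 1. The letters 'd' and 'c' are dropped because they do not occur in every word.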

Example 2: Get the frequency of the words that are found in a list of strings by combining and sorting lists

This approach uses the following steps:

Step 1 − Use the list “title_list” from the preprocessing steps above.

Step 2 − Use split on each list item to separate it into words, and then combine these words into a single list.

Step 3 − Sort this combined list and find the frequency using Counter. Show the results in the form of a dictionary.

Write the following code in a code cell of the Google Colab notebook.

from collections import Counter
import openpyxl
import pandas as pd

# Load the Excel file from its path
myxlsx = openpyxl.load_workbook("oldrecord5.xlsx")
myxlsxsheet = myxlsx.active

# Convert the sheet rows to a DataFrame
df = pd.DataFrame(myxlsxsheet.values)

# Select the rows whose title column (index 3) contains "Discussion";
# na=False treats empty cells as non-matching
df1 = df[df.iloc[:, 3].str.contains('Discussion', na=False)]

# Select only the title column
df2 = df1.iloc[:, 3]

title_list = df2.values.tolist()
print(title_list)

# Split every title into words and combine the words into one list
combinedlist = [word for title in title_list for word in title.split()]

# Print output
print("Concatenated List: ", combinedlist)

# Print the words in sorted order, one per line
for elem in sorted(combinedlist):
    print(elem)

# Count how often each word occurs
frequencyofelements = Counter(combinedlist)
print("frequency of elements: ", frequencyofelements)

The Output of Example 2

To see the results in colab, press the play button on the code cells.

['Types of Environment - Class Discussion', 'Management Structure and Nature  - Class Discussion', 'Macro- Demography, Natural, Legal & Political  - Class Discussion']
Concatenated List:  ['Types', 'of', 'Environment', '-', 'Class', 'Discussion', 'Management', 'Structure', 'and', 'Nature', '-', 'Class', 'Discussion', 'Macro-', 'Demography,', 'Natural,', 'Legal', '&', 'Political', '-', 'Class', 'Discussion']
&
-
-
-
Class
Class
Class
Demography,
Discussion
Discussion
Discussion
Environment
Legal
Macro-
Management
Natural,
Nature
Political
Structure
Types
and
of
frequency of elements:  Counter({'-': 3, 'Class': 3, 'Discussion': 3, 'Types': 1, 'of': 1, 'Environment': 1, 'Management': 1, 'Structure': 1, 'and': 1, 'Nature': 1, 'Macro-': 1, 'Demography,': 1, 'Natural,': 1, 'Legal': 1, '&': 1, 'Political': 1})

Fig 2: Showing the results using Google Colab.
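The same word count can be wrapped in a small function that accepts any number of titles. The sketch below is one possible way to do this, assuming the three titles printed above as input; the function name word_frequencies is hypothetical, and itertools.chain.from_iterable is used to flatten the split results.

```python
from collections import Counter
from itertools import chain


def word_frequencies(titles):
    """Split every title on whitespace and count the resulting words."""
    words = chain.from_iterable(title.split() for title in titles)
    return Counter(words)


titles = [
    "Types of Environment - Class Discussion",
    "Management Structure and Nature  - Class Discussion",
    "Macro- Demography, Natural, Legal & Political  - Class Discussion",
]

freq = word_frequencies(titles)
print(freq.most_common(3))  # [('-', 3), ('Class', 3), ('Discussion', 3)]
```

Counter.most_common(n) returns the n most frequent entries, which is convenient when only the repeated words are of interest.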

Example 3: Get the frequency of the words that are found in a list of strings using Pandas and its functions

This approach uses the following steps:

Step 1 − Use the list “title_list” from the preprocessing steps above.

Step 2 − Use split on each list item to separate it into words, and then combine these words into a single list.

Step 3 − Create a Pandas Series from the combined list and call its value_counts() method to calculate the frequency of each word. Show the output.

Write the following code in a code cell of the Google Colab notebook.

import openpyxl
import pandas as pd

# Load the Excel file from its path
myxlsx = openpyxl.load_workbook("oldrecord5.xlsx")
myxlsxsheet = myxlsx.active

# Convert the sheet rows to a DataFrame
df = pd.DataFrame(myxlsxsheet.values)

# Select the rows whose title column (index 3) contains "Discussion";
# na=False treats empty cells as non-matching
df1 = df[df.iloc[:, 3].str.contains('Discussion', na=False)]

# Select only the title column
df2 = df1.iloc[:, 3]

title_list = df2.values.tolist()
print(title_list)

# Split every title into words and combine the words into one list
combinedlist = [word for title in title_list for word in title.split()]

# Print output
print("Concatenated List: ", combinedlist)

# value_counts() returns the words sorted by descending frequency
frequencyofelements = pd.Series(combinedlist).value_counts()
print("frequency of elements: ")
print(frequencyofelements)

Viewing The Result

Press the play button on the code cells to see the results.

['Types of Environment - Class Discussion', 'Management Structure and Nature  - Class Discussion', 'Macro- Demography, Natural, Legal & Political  - Class Discussion']
Concatenated List:  ['Types', 'of', 'Environment', '-', 'Class', 'Discussion', 'Management', 'Structure', 'and', 'Nature', '-', 'Class', 'Discussion', 'Macro-', 'Demography,', 'Natural,', 'Legal', '&', 'Political', '-', 'Class', 'Discussion']
frequency of elements: 
-              3
Class          3
Discussion     3
Types          1
of             1
Environment    1
Management     1
Structure      1
and            1
Nature         1
Macro-         1
Demography,    1
Natural,       1
Legal          1
&              1
Political      1
dtype: int64
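The counts above treat tokens such as "Natural," and "-" as words, because split() only breaks on whitespace. If punctuation should not count, the tokens can be cleaned before counting. The following is a minimal sketch of that idea using only the standard library; the function name clean_word_frequencies is hypothetical, and stripping string.punctuation from both ends of each token is an assumption about what counts as a word, not something the examples above do.

```python
import string
from collections import Counter


def clean_word_frequencies(titles):
    """Count words after stripping punctuation from both ends of each token."""
    counts = Counter()
    for title in titles:
        for token in title.split():
            word = token.strip(string.punctuation)
            if word:  # tokens such as "-" or "&" become empty and are skipped
                counts[word] += 1
    return counts


titles = [
    "Types of Environment - Class Discussion",
    "Management Structure and Nature  - Class Discussion",
    "Macro- Demography, Natural, Legal & Political  - Class Discussion",
]

clean_freq = clean_word_frequencies(titles)
print(clean_freq["Natural"], clean_freq["Class"], "-" in clean_freq)  # 1 3 False
```

With this cleaning, "Macro-" is counted as "Macro" and "Demography," as "Demography", while the standalone "-" and "&" tokens are left out.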

This article used three examples to show how to find the frequency of the elements that are found in a list of strings. The first example treats the individual characters of the strings as the elements. Examples 2 and 3 first split the strings into words and then use the words as the elements whose frequency is counted.
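Examples 2 and 3 count every word, including those that occur in only one string. To keep just the words that occur in every string, the total counts can be combined with a count of how many strings contain each word. The sketch below is one way to do this under that assumption; the function name common_word_totals is hypothetical and is not part of the examples above.

```python
from collections import Counter


def common_word_totals(titles):
    """Return total counts for the words that occur in every title."""
    # Total occurrences of each word across all titles
    totals = Counter(word for title in titles for word in title.split())
    # Number of titles that contain each word at least once
    doc_freq = Counter(word for title in titles for word in set(title.split()))
    return {word: totals[word] for word in totals if doc_freq[word] == len(titles)}


titles = [
    "Types of Environment - Class Discussion",
    "Management Structure and Nature  - Class Discussion",
    "Macro- Demography, Natural, Legal & Political  - Class Discussion",
]

common_words = common_word_totals(titles)
print(common_words)  # {'-': 3, 'Class': 3, 'Discussion': 3}
```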

Updated on: 10-Jul-2023
