Write a program in Python to filter City column elements by removing the unique prefix in a given dataframe
When working with pandas DataFrames, you might need to keep only the cities whose first letter is shared with at least one other city. This tutorial shows how to drop cities with a unique prefix (a first letter that no other city name starts with) and retain the cities whose first letter appears in more than one city name.
Understanding the Problem
Given a DataFrame with city names, we want to filter out cities that have unique starting letters. For example, if only one city starts with 'C', we exclude it. If multiple cities start with 'K', we keep all of them.
Step-by-Step Solution
Step 1: Create the DataFrame
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})
print("Original DataFrame:")
print(df)

Original DataFrame:
   Id       City
0   1    Chennai
1   2      Delhi
2   3    Kolkata
3   4  Hyderabad
4   5       Pune
5   6     Mumbai
6   7    Haryana
7   8  Bengaluru
8   9   Kakinada
9  10     Kochin
Step 2: Extract First Characters
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

# Extract first characters of all cities
first_chars = []
for city in df['City']:
    first_chars.append(city[0])
print("First characters:", first_chars)

First characters: ['C', 'D', 'K', 'H', 'P', 'M', 'H', 'B', 'K', 'K']
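As an aside, the same extraction can be done without an explicit loop. The sketch below assumes every value in the City column is a non-empty string and uses the pandas string accessor, where `.str[0]` takes the first character of each value.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai',
             'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

# Series.str[0] indexes into every string at once and returns a new Series;
# tolist() converts it to a plain list like the one built by the loop.
first_chars = df['City'].str[0].tolist()
print("First characters:", first_chars)
```

The result is the same list of first letters, in the same row order as the loop version.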
Step 3: Find Repeated First Characters
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

# Extract first characters
first_chars = []
for city in df['City']:
    first_chars.append(city[0])

# Find characters that appear more than once
repeated_chars = []
for char in first_chars:
    if first_chars.count(char) > 1:
        if char not in repeated_chars:
            repeated_chars.append(char)
print("Characters appearing more than once:", repeated_chars)

Characters appearing more than once: ['K', 'H']
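Calling `first_chars.count(char)` inside the loop rescans the whole list for every element. One possible alternative, shown here only as a sketch, is to let pandas tally the letters once with `value_counts()` and then keep the letters whose count exceeds one. The variable name `counts` is introduced for illustration.

```python
import pandas as pd

cities = pd.Series(['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai',
                    'Haryana', 'Bengaluru', 'Kakinada', 'Kochin'], name='City')

# Tally how many city names start with each letter (one pass over the data)
counts = cities.str[0].value_counts()

# Keep only the letters shared by more than one city
repeated_chars = counts[counts > 1].index.tolist()
print("Characters appearing more than once:", repeated_chars)
```

The order of the letters in the list follows the counts rather than first appearance, so compare it as a set if the order matters to your code.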
Step 4: Filter Cities and Display Results
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

# Extract first characters
first_chars = []
for city in df['City']:
    first_chars.append(city[0])

# Find repeated characters
repeated_chars = []
for char in first_chars:
    if first_chars.count(char) > 1:
        if char not in repeated_chars:
            repeated_chars.append(char)

# Filter cities with repeated first characters
filtered_cities = []
for city in df['City']:
    if city[0] in repeated_chars:
        filtered_cities.append(city)

# Display final result
result = df[df['City'].isin(filtered_cities)]
print("Filtered DataFrame (cities without unique prefixes):")
print(result)

Filtered DataFrame (cities without unique prefixes):
   Id       City
2   3    Kolkata
3   4  Hyderabad
6   7    Haryana
8   9   Kakinada
9  10     Kochin
Complete Solution
import pandas as pd

def filter_cities_by_prefix(df):
    # Extract first characters of all cities
    first_chars = [city[0] for city in df['City']]

    # Find characters that appear more than once
    repeated_chars = []
    for char in first_chars:
        if first_chars.count(char) > 1 and char not in repeated_chars:
            repeated_chars.append(char)

    # Filter cities with repeated first characters
    filtered_cities = [city for city in df['City'] if city[0] in repeated_chars]

    # Return filtered DataFrame
    return df[df['City'].isin(filtered_cities)]

# Create sample DataFrame
df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})
result = filter_cities_by_prefix(df)
print(result)

   Id       City
2   3    Kolkata
3   4  Hyderabad
6   7    Haryana
8   9   Kakinada
9  10     Kochin
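For comparison, here is a more compact variant that stays entirely within pandas. It is a sketch rather than a replacement for the function above: the name `filter_shared_prefix` and its `column` parameter are hypothetical, and it assumes every city name is a non-empty string. It relies on `Series.duplicated(keep=False)`, which flags every occurrence of a value that appears more than once.

```python
import pandas as pd

def filter_shared_prefix(df, column='City'):
    """Keep rows whose first letter is shared with at least one other row."""
    # First letter of every value in the column
    first = df[column].str[0]
    # keep=False marks all members of a duplicated group, not just the repeats
    return df[first.duplicated(keep=False)]

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai',
             'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

result = filter_shared_prefix(df)
print(result)
```

Because the boolean mask is built directly from the first letters, this version avoids the intermediate lists and keeps the original row index, just as the loop-based function does.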
How It Works
The algorithm works by:
- Extracting the first character of each city name
- Counting occurrences of each first character
- Identifying characters that appear more than once
- Filtering cities whose first character is in the repeated characters list
- Using isin() to filter the original DataFrame
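The steps above index each name with `city[0]`, which assumes every value is a non-empty string with consistent capitalization. If your data might contain missing values, empty strings, stray leading spaces, or mixed case, a more defensive sketch could look like the following. The function name `filter_shared_prefix_safe`, the sample data, and the choice to treat 'c' and 'C' as the same prefix are all assumptions made for illustration.

```python
import pandas as pd

def filter_shared_prefix_safe(df, column='City'):
    """Keep rows whose normalized first letter is shared by another row."""
    # Normalize before indexing: trim whitespace and ignore case.
    # For empty or missing values, .str[0] yields a missing value
    # instead of raising an error.
    first = df[column].str.strip().str.upper().str[0]
    # Missing prefixes must be excluded explicitly, because duplicated()
    # treats missing values as equal to each other.
    mask = first.notna() & first.duplicated(keep=False)
    return df[mask]

# Hypothetical messy data, not taken from the tutorial's dataset
messy = pd.DataFrame({
    'City': ['Chennai', 'chandigarh', '', None, 'Delhi', ' Kolkata', 'Kochin']
})

result = filter_shared_prefix_safe(messy)
print(result)
```

Whether case and whitespace should be normalized depends on your data; drop the `strip()` or `upper()` call if 'c' and 'C' should count as different prefixes.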
Conclusion
This approach filters the City column by removing cities with unique prefixes. The solution uses basic Python loops and the pandas isin() method to identify and retain only the cities whose starting letters appear more than once in the dataset.
