Write a Python program to filter the City column of a DataFrame by removing cities with a unique prefix

When working with pandas DataFrames, you might need to keep only the cities that share a starting letter with at least one other city. This tutorial shows how to drop cities with a unique prefix (here, the first letter) and keep only those whose first letter appears in more than one city name.

Understanding the Problem

Given a DataFrame with city names, we want to filter out cities that have unique starting letters. For example, if only one city starts with 'C', we exclude it. If multiple cities start with 'K', we keep all of them.

Step-by-Step Solution

Step 1: Create the DataFrame

import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
   Id       City
0   1    Chennai
1   2      Delhi
2   3    Kolkata
3   4  Hyderabad
4   5       Pune
5   6     Mumbai
6   7    Haryana
7   8  Bengaluru
8   9   Kakinada
9  10     Kochin

Step 2: Extract First Characters

import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

# Extract first characters of all cities
first_chars = []
for city in df['City']:
    first_chars.append(city[0])

print("First characters:", first_chars)

Output:

First characters: ['C', 'D', 'K', 'H', 'P', 'M', 'H', 'B', 'K', 'K']
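
The same extraction can be written without an explicit loop by using the pandas string accessor. This is a minimal sketch of that alternative, not a replacement for the step above; the variable name first_chars_series is chosen here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

# .str[0] takes the first character of every string in the column at once
first_chars_series = df['City'].str[0]

# Convert to a plain list to compare with the loop-based result
print("First characters:", first_chars_series.tolist())
```

For this sample data it prints the same list of letters as the loop.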

Step 3: Find Repeated First Characters

import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

# Extract first characters
first_chars = []
for city in df['City']:
    first_chars.append(city[0])

# Find characters that appear more than once
repeated_chars = []
for char in first_chars:
    if first_chars.count(char) > 1:
        if char not in repeated_chars:
            repeated_chars.append(char)

print("Characters appearing more than once:", repeated_chars)

Output:

Characters appearing more than once: ['K', 'H']
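
Calling count() inside the loop rescans the whole list for every element. As a hedged alternative, the sketch below tallies each letter once with collections.Counter from the standard library. The list of first characters is copied from Step 2, and the name counts is introduced here only for illustration.

```python
from collections import Counter

# First characters as produced in Step 2
first_chars = ['C', 'D', 'K', 'H', 'P', 'M', 'H', 'B', 'K', 'K']

# Tally every letter in a single pass
counts = Counter(first_chars)

# Keep only the letters seen more than once
repeated_chars = [char for char, n in counts.items() if n > 1]

print("Characters appearing more than once:", repeated_chars)
```

For this list, counts records 3 for 'K' and 2 for 'H', so repeated_chars is again ['K', 'H'].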

Step 4: Filter Cities and Display Results

import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

# Extract first characters
first_chars = []
for city in df['City']:
    first_chars.append(city[0])

# Find repeated characters
repeated_chars = []
for char in first_chars:
    if first_chars.count(char) > 1:
        if char not in repeated_chars:
            repeated_chars.append(char)

# Filter cities with repeated first characters
filtered_cities = []
for city in df['City']:
    if city[0] in repeated_chars:
        filtered_cities.append(city)

# Display final result
result = df[df['City'].isin(filtered_cities)]
print("Filtered DataFrame (cities without unique prefixes):")
print(result)

Output:

Filtered DataFrame (cities without unique prefixes):
   Id       City
2   3    Kolkata
3   4  Hyderabad
6   7    Haryana
8   9   Kakinada
9  10     Kochin
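
The three loops can also be collapsed into a single boolean mask. The sketch below relies on Series.duplicated(keep=False), which marks every occurrence of a repeated value as True. It is offered as one possible shortcut under that assumption, and the name mask is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

# True for every city whose first letter occurs more than once in the column
mask = df['City'].str[0].duplicated(keep=False)

# Boolean indexing keeps only the rows where the mask is True
result = df[mask]
print(result)
```

Because the mask is built from the first letters rather than from the city names, it skips the intermediate filtered_cities list.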

Complete Solution

import pandas as pd

def filter_cities_by_prefix(df):
    # Extract first characters of all cities
    first_chars = [city[0] for city in df['City']]
    
    # Find characters that appear more than once
    repeated_chars = []
    for char in first_chars:
        if first_chars.count(char) > 1 and char not in repeated_chars:
            repeated_chars.append(char)
    
    # Filter cities with repeated first characters
    filtered_cities = [city for city in df['City'] if city[0] in repeated_chars]
    
    # Return filtered DataFrame
    return df[df['City'].isin(filtered_cities)]

# Create sample DataFrame
df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

result = filter_cities_by_prefix(df)
print(result)

Output:

   Id       City
2   3    Kolkata
3   4  Hyderabad
6   7    Haryana
8   9   Kakinada
9  10     Kochin
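
The function above assumes every entry in the City column is a non-empty string and compares letters case-sensitively. The following sketch shows one way those assumptions might be relaxed. The function name filter_shared_prefix and its column and case_sensitive parameters are hypothetical and do not come from the original solution.

```python
import pandas as pd

def filter_shared_prefix(df, column='City', case_sensitive=True):
    # Hypothetical helper: keep rows whose first letter is shared by another row
    first = df[column].str[0]          # missing values stay missing
    if not case_sensitive:
        first = first.str.upper()      # treat 'k' and 'K' as the same prefix
    # Map each letter to how often it occurs; missing letters map to NaN
    counts = first.map(first.value_counts())
    # NaN > 1 is False, so rows without a usable first letter are dropped
    return df[counts > 1]

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'City': ['Chennai', 'Delhi', 'Kolkata', 'Hyderabad', 'Pune', 'Mumbai', 'Haryana', 'Bengaluru', 'Kakinada', 'Kochin']
})

result = filter_shared_prefix(df)
print(result)
```

On the sample DataFrame this keeps the same five rows as filter_cities_by_prefix. Passing case_sensitive=False would additionally group names such as 'kolkata' and 'Kochin' under one prefix.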

How It Works

The algorithm works by:

  • Extracting the first character of each city name

  • Counting occurrences of each first character

  • Identifying characters that appear more than once

  • Filtering cities whose first character is in the repeated characters list

  • Using isin() to filter the original DataFrame

Conclusion

This approach removes cities whose first letter is unique in the column. It uses basic Python loops together with the pandas isin() method to identify and retain only the cities whose starting letter appears more than once in the dataset. Note that the comparison is case-sensitive, so 'k' and 'K' count as different prefixes.

---
Updated on: 2026-03-25T16:33:51+05:30
