Python – Create a Subset of columns using filter()

To create a subset of columns in Pandas, we can use the filter() method, which selects columns by their labels. Its like parameter keeps every column whose name contains a given substring. When the exact column names are known, the indexing operator can be used instead.

Creating a DataFrame

First, let's create a sample DataFrame with product information:

import pandas as pd

dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})

print("DataFrame...")
print(dataFrame)
DataFrame...
      Product  Opening_Stock  Closing_Stock
0     SmartTV            300            200
1  ChromeCast            700            500
2     Speaker           1200           1000
3    Earphone           1500            900

Method 1: Using Indexing Operator

Select specific columns by passing a list of column names:

import pandas as pd

dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})

# Single column subset
subset_single = dataFrame[['Product']]
print("Single column subset:")
print(subset_single)
Single column subset:
      Product
0     SmartTV
1  ChromeCast
2     Speaker
3    Earphone

Method 2: Selecting Multiple Columns with the Indexing Operator

Create a subset with multiple specific columns:

import pandas as pd

dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})

# Multiple columns subset
subset_multiple = dataFrame[['Opening_Stock', 'Closing_Stock']]
print("Multiple columns subset:")
print(subset_multiple)
Multiple columns subset:
   Opening_Stock  Closing_Stock
0            300            200
1            700            500
2           1200           1000
3           1500            900
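
One practical difference between indexing and filter() is how each treats a column name that does not exist. The sketch below assumes a hypothetical column name, Discount, that is not part of the sample DataFrame. Indexing raises a KeyError for it, while filter(items=...) keeps only the names it actually finds.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})

# "Discount" is a hypothetical name used only for illustration;
# it does not exist in the DataFrame
wanted = ["Product", "Discount"]

# Indexing with a missing column name raises KeyError
try:
    dataFrame[wanted]
    indexing_failed = False
except KeyError:
    indexing_failed = True

# filter(items=) silently skips names that are not present
subset_items = dataFrame.filter(items=wanted)

print("Indexing raised KeyError:", indexing_failed)
print("Columns kept by filter(items=):", list(subset_items.columns))
```

This makes filter(items=...) convenient when a list of expected columns may only partly match the data, and indexing the safer choice when a missing column should be treated as an error.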

Method 3: Using filter() with Pattern Matching

Use filter(like=...) to select every column whose name contains a given substring:

import pandas as pd

dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})

# Filter columns containing 'Stock'
subset_pattern = dataFrame.filter(like='Stock')
print("Columns with 'Stock' pattern:")
print(subset_pattern)

# Filter columns containing 'Open'
subset_open = dataFrame.filter(like='Open')
print("\nColumns with 'Open' pattern:")
print(subset_open)
Columns with 'Stock' pattern:
   Opening_Stock  Closing_Stock
0            300            200
1            700            500
2           1200           1000
3           1500            900

Columns with 'Open' pattern:
   Opening_Stock
0            300
1            700
2           1200
3           1500
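
It may help to see that like= is a plain, case-sensitive substring test on the column labels rather than a wildcard pattern. The following is a minimal sketch using the same sample data. The substrings 'stock' and 'ing' are chosen here only for illustration.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})

# Lowercase 'stock' matches no label, so the result keeps all rows but no columns
subset_lower = dataFrame.filter(like='stock')
print(subset_lower.shape)

# 'ing' occurs inside both Opening_Stock and Closing_Stock
subset_ing = dataFrame.filter(like='ing')
print(list(subset_ing.columns))
```

Because no error is raised when nothing matches, checking the shape or the column list of the result is a simple way to catch a misspelled or wrongly cased substring.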

Comparison

Method            Syntax                         Best For
Indexing          df[['col1', 'col2']]           Specific known columns
filter(like=)     df.filter(like='pattern')      Pattern-based selection
filter(regex=)    df.filter(regex='^pattern')    Complex pattern matching
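
The regex option listed in the comparison is not demonstrated above, so here is a short sketch on the same sample data. The three patterns are illustrative choices: one anchored at the start of the label, one anchored at the end, and one using the inline (?i) flag for case-insensitive matching.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})

# Labels that start with 'Opening'
subset_start = dataFrame.filter(regex='^Opening')
print(list(subset_start.columns))

# Labels that end with 'Stock'
subset_end = dataFrame.filter(regex='Stock$')
print(list(subset_end.columns))

# Case-insensitive match using the inline (?i) flag
subset_ci = dataFrame.filter(regex='(?i)stock')
print(list(subset_ci.columns))
```

Anchors such as ^ and $ matter because the pattern is otherwise allowed to match anywhere in the label, much like like= does.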

Conclusion

Use indexing for selecting specific columns by name. Use filter(like=) for pattern-based column selection when dealing with similarly named columns. The filter() method is particularly useful for datasets with many columns following naming conventions.

Updated on: 2026-03-26T01:53:46+05:30
