Python – Create a Subset of columns using filter()
To create a subset of columns in Pandas, we can use the filter() method. Its like parameter selects every column whose name contains a given substring, which is convenient when columns follow a naming pattern. When the exact column names are known, the indexing operator can be used instead.
Creating a DataFrame
First, let's create a sample DataFrame with product information:
import pandas as pd
dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})
print("DataFrame...")
print(dataFrame)
DataFrame...
      Product  Opening_Stock  Closing_Stock
0     SmartTV            300            200
1  ChromeCast            700            500
2     Speaker           1200           1000
3    Earphone           1500            900
Method 1: Using Indexing Operator
Select specific columns by passing a list of column names:
import pandas as pd
dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})
# Single column subset
subset_single = dataFrame[['Product']]
print("Single column subset:")
print(subset_single)
Single column subset:
      Product
0     SmartTV
1  ChromeCast
2     Speaker
3    Earphone
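The double brackets in the example above matter: a list of names yields a DataFrame, while a single string yields a Series. A minimal sketch of the difference, reusing the same sample data (the variable names as_series and as_frame are only illustrative):

```python
import pandas as pd

dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})

# Single brackets with a string return a one-dimensional Series
as_series = dataFrame['Product']

# Double brackets (a list of names) return a two-dimensional DataFrame
as_frame = dataFrame[['Product']]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (4, 1)
```

If the result is going to be passed on as a table (for example, printed with a header row or joined with other columns), the list form is usually the one to reach for.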
Method 2: Multiple Columns Selection
Create a subset with multiple specific columns:
import pandas as pd
dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})
# Multiple columns subset
subset_multiple = dataFrame[['Opening_Stock', 'Closing_Stock']]
print("Multiple columns subset:")
print(subset_multiple)
Multiple columns subset:
   Opening_Stock  Closing_Stock
0            300            200
1            700            500
2           1200           1000
3           1500            900
Method 3: Using filter() with Pattern Matching
Use filter(like=) to select columns with similar naming patterns:
import pandas as pd
dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})
# Filter columns containing 'Stock'
subset_pattern = dataFrame.filter(like='Stock')
print("Columns with 'Stock' pattern:")
print(subset_pattern)
# Filter columns containing 'Open'
subset_open = dataFrame.filter(like='Open')
print("\nColumns with 'Open' pattern:")
print(subset_open)
Columns with 'Stock' pattern:
   Opening_Stock  Closing_Stock
0            300            200
1            700            500
2           1200           1000
3           1500            900

Columns with 'Open' pattern:
   Opening_Stock
0            300
1            700
2           1200
3           1500
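It may help to see what like= is doing under the hood: it keeps each column whose label contains the given substring. The sketch below compares it with a hand-written substring test on the same sample data (the names by_filter, matching and by_list are only illustrative), and also shows that the match is case-sensitive:

```python
import pandas as pd

dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})

# filter(like='Stock') keeps every column whose label contains 'Stock'
by_filter = dataFrame.filter(like='Stock')

# Hand-written equivalent: a substring test on each column label
matching = [col for col in dataFrame.columns if 'Stock' in col]
by_list = dataFrame[matching]

print(by_filter.equals(by_list))  # True

# The match is case-sensitive, so a lowercase pattern selects nothing here
print(list(dataFrame.filter(like='stock').columns))  # []
```

Note that a pattern with no matches does not raise an error; it returns a DataFrame with no columns, so it can be worth checking the result before using it.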
Comparison
| Method | Syntax | Best For |
|---|---|---|
| Indexing | df[['col1', 'col2']] | Specific known columns |
| filter(like=) | df.filter(like='pattern') | Pattern-based selection |
| filter(regex=) | df.filter(regex='^pattern') | Complex pattern matching |
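The filter(regex=) row in the table has no example in this article, so here is a minimal sketch on the same sample data. The patterns and variable names are chosen only for illustration; the regex is searched for within each column label, so anchors such as ^ and $ are needed to pin a match to the start or end of the name:

```python
import pandas as pd

dataFrame = pd.DataFrame({
    "Product": ["SmartTV", "ChromeCast", "Speaker", "Earphone"],
    "Opening_Stock": [300, 700, 1200, 1500],
    "Closing_Stock": [200, 500, 1000, 900]
})

# Columns whose name starts with 'Opening'
subset_start = dataFrame.filter(regex='^Opening')
print(list(subset_start.columns))  # ['Opening_Stock']

# Columns whose name ends with 'Stock'
subset_end = dataFrame.filter(regex='Stock$')
print(list(subset_end.columns))    # ['Opening_Stock', 'Closing_Stock']

# Columns whose name starts with either 'Product' or 'Closing'
subset_alt = dataFrame.filter(regex='^(Product|Closing)')
print(list(subset_alt.columns))    # ['Product', 'Closing_Stock']
```

The selected columns keep the order they have in the original DataFrame, regardless of the order of alternatives in the pattern.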
Conclusion
Use indexing for selecting specific columns by name. Use filter(like=) for pattern-based column selection when dealing with similarly named columns. The filter() method is particularly useful for datasets with many columns following naming conventions.
