Python - How to select a subset of a Pandas DataFrame
A Pandas DataFrame is a two-dimensional, labeled data structure made up of rows and columns. You can select a single column, multiple columns, or rows by position, label, or condition using square brackets, iloc[], and loc[].
Creating Sample Data
Let's create a sample DataFrame to demonstrate subset selection:
import pandas as pd
# Create sample data
data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)
print("Original DataFrame:")
print(dataFrame)
Original DataFrame:
       Car  Reg_Price  Units
0      BMW       2500    100
1    Lexus       3500     80
2     Audi       2500    120
3   Jaguar       2000     70
4  Mustang       2500    110
Selecting a Single Column
Use square brackets with the column name to select one column:
import pandas as pd
data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)
# Select single column
car_column = dataFrame['Car']
print("Single column (Car):")
print(car_column)
Single column (Car):
0        BMW
1      Lexus
2       Audi
3     Jaguar
4    Mustang
Name: Car, dtype: object
Selecting Multiple Columns
Pass a list of column names to select multiple columns:
import pandas as pd
data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)
# Select multiple columns
subset = dataFrame[['Car', 'Units']]
print("Multiple columns (Car and Units):")
print(subset)
Multiple columns (Car and Units):
       Car  Units
0      BMW    100
1    Lexus     80
2     Audi    120
3   Jaguar     70
4  Mustang    110
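The two bracket forms return different types, which matters when later code expects one or the other. The following is a minimal sketch using the same sample data; the variable names as_series and as_frame are chosen here for illustration only.

```python
import pandas as pd

data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)

# Single brackets with one column name return a one-dimensional Series
as_series = dataFrame['Car']

# Double brackets (a one-element list) return a DataFrame with one column
as_frame = dataFrame[['Car']]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.shape)           # (5,)
print(as_frame.shape)            # (5, 1)
```

If you need a single column but want to keep the two-dimensional shape, the list form is usually the simpler choice.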
Selecting Rows by Index
Use iloc[] for position-based selection or loc[] for label-based selection. An iloc[] slice excludes its end position, while a loc[] slice includes its end label:
import pandas as pd
data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)
# Select first 3 rows
first_three = dataFrame.iloc[0:3]
print("First 3 rows:")
print(first_three)
# Select specific rows and columns
subset = dataFrame.loc[1:3, ['Car', 'Reg_Price']]
print("\nRows 1-3, specific columns:")
print(subset)
First 3 rows:
     Car  Reg_Price  Units
0    BMW       2500    100
1  Lexus       3500     80
2   Audi       2500    120

Rows 1-3, specific columns:
      Car  Reg_Price
1   Lexus       3500
2    Audi       2500
3  Jaguar       2000
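With the default integer index, positions and labels happen to coincide, so the difference between iloc[] and loc[] is easy to miss. The sketch below assumes a hypothetical string index ('a' to 'e'), which is not part of the original example, to make the two behaviours visible side by side.

```python
import pandas as pd

data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
# Hypothetical string labels, used only so that labels differ from positions
labeled = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e'])

# iloc: positions 1 and 2 (end position 3 is excluded)
by_position = labeled.iloc[1:3]

# loc: labels 'b' through 'd' (end label 'd' is included)
by_label = labeled.loc['b':'d']

print(by_position['Car'].tolist())  # ['Lexus', 'Audi']
print(by_label['Car'].tolist())     # ['Lexus', 'Audi', 'Jaguar']

# Rows and columns can be sliced together in either style
pos_block = labeled.iloc[0:3, 1:3]
label_block = labeled.loc['a':'c', 'Car':'Units']

print(pos_block.columns.tolist())    # ['Reg_Price', 'Units']
print(label_block.columns.tolist())  # ['Car', 'Reg_Price', 'Units']
```

The same inclusive rule applies to column labels: 'Car':'Units' includes the Units column, whereas the positional slice 1:3 stops before position 3.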
Selection Methods Comparison
| Method | Use Case | Example |
|---|---|---|
| df['col'] | Single column | df['Car'] |
| df[['col1', 'col2']] | Multiple columns | df[['Car', 'Units']] |
| df.iloc[rows, cols] | Position-based | df.iloc[0:3, 1:3] |
| df.loc[rows, cols] | Label-based | df.loc[0:2, 'Car':'Units'] |
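Rows can also be selected by condition using a boolean mask. The sketch below shows one common way to do this with the same sample data; the thresholds and variable names are illustrative assumptions, not values from the original examples.

```python
import pandas as pd

data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)

# A comparison on a column yields a boolean Series (the mask)
mask = dataFrame['Units'] > 100

# Passing the mask in square brackets keeps only rows where it is True
high_volume = dataFrame[mask]
print(high_volume['Car'].tolist())  # ['Audi', 'Mustang']

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
combined = dataFrame[(dataFrame['Reg_Price'] == 2500) & (dataFrame['Units'] < 120)]
print(combined['Car'].tolist())  # ['BMW', 'Mustang']

# loc[] accepts a mask for the rows together with a list of columns
cheap = dataFrame.loc[dataFrame['Reg_Price'] < 2500, ['Car', 'Reg_Price']]
print(cheap['Car'].tolist())  # ['Jaguar']
```

The parentheses around each condition are required because & and | bind more tightly than the comparison operators in Python.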
Conclusion
Use square brackets [] for simple column selection. Use iloc[] for position-based indexing and loc[] for label-based selection. These methods provide flexible ways to extract specific data subsets from your DataFrame.
