Python Pandas - Select a subset of rows and columns combined
To select a subset of rows and columns together in Pandas, use the loc indexer. With loc you can filter rows with a boolean condition and pick specific columns by label in a single expression.
Creating Sample Data
Let's create a DataFrame with car sales data to demonstrate the concept:
import pandas as pd
# Create sample data
data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)
print("Original DataFrame:")
print(dataFrame)
Original DataFrame:
       Car  Reg_Price  Units
0      BMW       2500    100
1    Lexus       3500     80
2     Audi       2500    120
3   Jaguar       2000     70
4  Mustang       2500    110
Selecting Rows with Conditions
First, let's filter the rows where Units is greater than 100:
import pandas as pd
data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)
# Select cars with Units more than 100
filtered_rows = dataFrame[dataFrame["Units"] > 100]
print("Cars with Units more than 100:")
print(filtered_rows)
Cars with Units more than 100:
       Car  Reg_Price  Units
2     Audi       2500    120
4  Mustang       2500    110
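To see why this filter works, it can help to look at the condition on its own. The sketch below rebuilds the same sample data and stores the comparison in a variable called mask (an illustrative name, not part of the original example). The comparison produces a boolean Series, and only the rows marked True are kept.

```python
import pandas as pd

data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)

# The comparison returns a boolean Series aligned with the row index
# ("mask" is an illustrative variable name)
mask = dataFrame["Units"] > 100
print(mask.tolist())

# Passing the mask in square brackets keeps only the rows marked True
filtered_rows = dataFrame[mask]
print(filtered_rows.index.tolist())
```

Note that BMW, with exactly 100 units, is excluded because the comparison is strict.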
Selecting Specific Columns
You can select specific columns by passing a list of column names:
import pandas as pd
data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)
# Display only two columns
selected_columns = dataFrame[['Reg_Price', 'Units']]
print("Displaying only Reg_Price and Units columns:")
print(selected_columns)
Displaying only Reg_Price and Units columns:
   Reg_Price  Units
0       2500    100
1       3500     80
2       2500    120
3       2000     70
4       2500    110
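The type of the result depends on whether you pass a single column name or a list of names. The following minimal sketch, using the same sample data, illustrates the difference; the variable names units_series and units_frame are chosen here only for illustration.

```python
import pandas as pd

data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)

# A single column name returns a one-dimensional Series
units_series = dataFrame['Units']
print(type(units_series).__name__)

# A list of names returns a DataFrame, even when the list has one name
units_frame = dataFrame[['Units']]
print(type(units_frame).__name__)
```

The same distinction applies to the column argument of loc: a single label gives a Series, a list of labels gives a DataFrame.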
Combining Row and Column Selection with loc
The loc indexer lets you filter rows and select columns in a single expression:
import pandas as pd
data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)
# Select cars where Units > 100 and show only Car column
subset = dataFrame.loc[dataFrame["Units"] > 100, "Car"]
print("Car names with Units > 100:")
print(subset)
# Select multiple columns for filtered rows
subset_multiple = dataFrame.loc[dataFrame["Units"] > 100, ["Car", "Reg_Price"]]
print("\nCar and Price for Units > 100:")
print(subset_multiple)
Car names with Units > 100:
2       Audi
4    Mustang
Name: Car, dtype: object

Car and Price for Units > 100:
       Car  Reg_Price
2     Audi       2500
4  Mustang       2500
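For reading data, the loc form gives the same result as applying the row filter and the column list one after the other. The sketch below compares the two approaches on the sample data; two_step and one_step are illustrative names.

```python
import pandas as pd

data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)

# Two steps: filter the rows first, then pick the columns
two_step = dataFrame[dataFrame["Units"] > 100][["Car", "Reg_Price"]]

# One step: rows and columns together with loc
one_step = dataFrame.loc[dataFrame["Units"] > 100, ["Car", "Reg_Price"]]

# Both approaches return the same rows, columns, and values
print(two_step.equals(one_step))
```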
Syntax Summary
The loc indexer follows this pattern:
# Single column selection
dataFrame.loc[condition, 'column_name']
# Multiple columns selection
dataFrame.loc[condition, ['col1', 'col2']]
# All columns for filtered rows
dataFrame.loc[condition, :]
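As a runnable illustration of these three forms, here is a short sketch on the car sales data. The names condition, single, multiple, and all_cols are placeholders chosen for this example.

```python
import pandas as pd

data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100, 80, 120, 70, 110]
}
dataFrame = pd.DataFrame(data)

# Boolean condition reused in each selection below
condition = dataFrame["Units"] > 100

# Single column: the result is a Series
single = dataFrame.loc[condition, 'Car']

# Multiple columns: the result is a DataFrame
multiple = dataFrame.loc[condition, ['Car', 'Units']]

# All columns for the filtered rows
all_cols = dataFrame.loc[condition, :]

print(single.shape, multiple.shape, all_cols.shape)
```

The shapes show two matching rows in every case, with one, two, and three columns respectively.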
Conclusion
Use loc to combine row filtering and column selection in one operation. It gives you precise control over which rows and columns to extract from your DataFrame.
