Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Python - Filtering data with Pandas .query() method
The Pandas .query() method provides a powerful way to filter DataFrame rows using a string-based query expression. It offers a more readable alternative to boolean indexing for complex filtering conditions.
Basic Syntax
The .query() method accepts a query string and optional parameters ?
import pandas as pd
# Create sample dataset
data = {
'age': [25, 30, 35, 40, 45, 50, 55, 60, 65],
'salary': [30000, 45000, 55000, 65000, 70000, 80000, 85000, 90000, 95000],
'department': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR', 'IT']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Original DataFrame: age salary department 0 25 30000 IT 1 30 45000 HR 2 35 55000 IT 3 40 65000 Finance 4 45 70000 HR 5 50 80000 IT 6 55 85000 Finance 7 60 90000 HR 8 65 95000 IT
Single Condition Query
Filter rows based on a single condition using comparison operators ?
import pandas as pd
data = {
'age': [25, 30, 35, 40, 45, 50, 55, 60, 65],
'salary': [30000, 45000, 55000, 65000, 70000, 80000, 85000, 90000, 95000],
'department': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR', 'IT']
}
df = pd.DataFrame(data)
# Filter employees with salary greater than 70000
result = df.query('salary > 70000')
print(result)
age salary department 5 50 80000 IT 6 55 85000 Finance 7 60 90000 HR 8 65 95000 IT
Multiple Conditions Query
Combine multiple conditions using and, or, and not operators ?
import pandas as pd
data = {
'age': [25, 30, 35, 40, 45, 50, 55, 60, 65],
'salary': [30000, 45000, 55000, 65000, 70000, 80000, 85000, 90000, 95000],
'department': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR', 'IT']
}
df = pd.DataFrame(data)
# Filter IT department employees with age greater than 40
result = df.query('department == "IT" and age > 40')
print(result)
age salary department 5 50 80000 IT 8 65 95000 IT
Using Variables in Queries
Reference external variables using the @ symbol ?
import pandas as pd
data = {
'age': [25, 30, 35, 40, 45, 50, 55, 60, 65],
'salary': [30000, 45000, 55000, 65000, 70000, 80000, 85000, 90000, 95000],
'department': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR', 'IT']
}
df = pd.DataFrame(data)
# Define threshold variables
min_age = 45
target_dept = 'HR'
# Use variables in query
result = df.query('age >= @min_age and department == @target_dept')
print(result)
age salary department 4 45 70000 HR 7 60 90000 HR
Inplace Parameter
Use inplace=True to modify the original DataFrame ?
import pandas as pd
data = {
'age': [25, 30, 35, 40, 45],
'salary': [30000, 45000, 55000, 65000, 70000]
}
df = pd.DataFrame(data)
print("Before query:")
print(df)
# Filter and modify original DataFrame
df.query('age > 30', inplace=True)
print("\nAfter query with inplace=True:")
print(df)
Before query: age salary 0 25 30000 1 30 45000 2 35 55000 3 40 65000 4 45 70000 After query with inplace=True: age salary 2 35 55000 3 40 65000 4 45 70000
Comparison with Boolean Indexing
| Method | Syntax | Readability | Performance |
|---|---|---|---|
| Boolean Indexing | df[(df['age'] > 30) & (df['dept'] == 'IT')] |
Complex for multiple conditions | Slightly faster |
| Query Method | df.query('age > 30 and dept == "IT"') |
More readable and intuitive | Slightly slower |
Conclusion
The .query() method provides an intuitive way to filter DataFrames using string expressions. It's particularly useful for complex conditions and when working with external variables, making data filtering more readable and maintainable.
