Python - Filtering data with Pandas .query() method

The Pandas .query() method provides a powerful way to filter DataFrame rows using a string-based query expression. It offers a more readable alternative to boolean indexing for complex filtering conditions.

Basic Syntax

The .query() method accepts a query string and optional parameters ?

import pandas as pd

# Create sample dataset
data = {
    'age': [25, 30, 35, 40, 45, 50, 55, 60, 65],
    'salary': [30000, 45000, 55000, 65000, 70000, 80000, 85000, 90000, 95000],
    'department': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR', 'IT']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Original DataFrame:
   age  salary department
0   25   30000         IT
1   30   45000         HR
2   35   55000         IT
3   40   65000    Finance
4   45   70000         HR
5   50   80000         IT
6   55   85000    Finance
7   60   90000         HR
8   65   95000         IT

Single Condition Query

Filter rows based on a single condition using comparison operators ?

import pandas as pd

data = {
    'age': [25, 30, 35, 40, 45, 50, 55, 60, 65],
    'salary': [30000, 45000, 55000, 65000, 70000, 80000, 85000, 90000, 95000],
    'department': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR', 'IT']
}
df = pd.DataFrame(data)

# Filter employees with salary greater than 70000
result = df.query('salary > 70000')
print(result)
   age  salary department
5   50   80000         IT
6   55   85000    Finance
7   60   90000         HR
8   65   95000         IT

Multiple Conditions Query

Combine multiple conditions using and, or, and not operators ?

import pandas as pd

data = {
    'age': [25, 30, 35, 40, 45, 50, 55, 60, 65],
    'salary': [30000, 45000, 55000, 65000, 70000, 80000, 85000, 90000, 95000],
    'department': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR', 'IT']
}
df = pd.DataFrame(data)

# Filter IT department employees with age greater than 40
result = df.query('department == "IT" and age > 40')
print(result)
   age  salary department
5   50   80000         IT
8   65   95000         IT

Using Variables in Queries

Reference external variables using the @ symbol ?

import pandas as pd

data = {
    'age': [25, 30, 35, 40, 45, 50, 55, 60, 65],
    'salary': [30000, 45000, 55000, 65000, 70000, 80000, 85000, 90000, 95000],
    'department': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR', 'IT']
}
df = pd.DataFrame(data)

# Define threshold variables
min_age = 45
target_dept = 'HR'

# Use variables in query
result = df.query('age >= @min_age and department == @target_dept')
print(result)
   age  salary department
4   45   70000         HR
7   60   90000         HR

Inplace Parameter

Use inplace=True to modify the original DataFrame ?

import pandas as pd

data = {
    'age': [25, 30, 35, 40, 45],
    'salary': [30000, 45000, 55000, 65000, 70000]
}
df = pd.DataFrame(data)

print("Before query:")
print(df)

# Filter and modify original DataFrame
df.query('age > 30', inplace=True)

print("\nAfter query with inplace=True:")
print(df)
Before query:
   age  salary
0   25   30000
1   30   45000
2   35   55000
3   40   65000
4   45   70000

After query with inplace=True:
   age  salary
2   35   55000
3   40   65000
4   45   70000

Comparison with Boolean Indexing

Method Syntax Readability Performance
Boolean Indexing df[(df['age'] > 30) & (df['dept'] == 'IT')] Complex for multiple conditions Slightly faster
Query Method df.query('age > 30 and dept == "IT"') More readable and intuitive Slightly slower

Conclusion

The .query() method provides an intuitive way to filter DataFrames using string expressions. It's particularly useful for complex conditions and when working with external variables, making data filtering more readable and maintainable.

Updated on: 2026-03-15T18:20:40+05:30

3K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements