Creating a Pandas dataframe column based on a given condition in Python
Pandas DataFrames allow you to add new columns whose values depend on conditions applied to existing data. This is useful for categorizing, labeling, or transforming rows according to specific criteria.
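As a minimal sketch of the underlying idea: comparing a column with a value returns a boolean Series that lines up row by row with the DataFrame, and that Series can be stored directly as a new column. The `scores` DataFrame and the `Passed` column below are hypothetical illustrations, not part of the exam example that follows.

```python
import pandas as pd

# Hypothetical data used only for this illustration
scores = pd.DataFrame({'Student': ['A', 'B', 'C'], 'Score': [72, 45, 90]})

# The comparison produces a boolean Series aligned with the rows of `scores`
passed_mask = scores['Score'] >= 50

# Assigning the boolean Series creates a new True/False column
scores['Passed'] = passed_mask
print(scores)
```

The methods below build on this: instead of storing True/False, they use the condition to choose between other values.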
Creating the Base DataFrame
Let's start with a simple exam schedule DataFrame:
import pandas as pd
# Lists for Exam subjects and Days
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']
# Dictionary for Exam Schedule
exam_data = {'Exam Day': days,
             'Exam Subject': subjects}
# Dictionary to DataFrame
exam_df = pd.DataFrame(exam_data)
print(exam_df)
  Exam Day Exam Subject
0      Mon    Chemistry
1      Tue      Physics
2      Wed        Maths
3      Thu      English
4      Fri      Biology
Method 1: Using List Comprehension
A list comprehension can apply a conditional expression to each value of an existing column and assign the resulting list as a new column:
import pandas as pd
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']
exam_data = {'Exam Day': days, 'Exam Subject': subjects}
exam_df = pd.DataFrame(exam_data)
# Add Time column based on condition
exam_df['Time'] = ['2 PM' if day in ('Mon', 'Thu') else '10 AM'
                   for day in exam_df['Exam Day']]
print(exam_df)
  Exam Day Exam Subject   Time
0      Mon    Chemistry   2 PM
1      Tue      Physics  10 AM
2      Wed        Maths  10 AM
3      Thu      English   2 PM
4      Fri      Biology  10 AM
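The same pattern extends to conditions that involve more than one column: iterate over the columns together with zip() so each condition sees one row's values at a time. The sketch below assumes a hypothetical rule (not part of the original schedule) that science exams use the lab except on Friday, and labels each row in a new `Room` column.

```python
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']
exam_df = pd.DataFrame({'Exam Day': days, 'Exam Subject': subjects})

# Hypothetical rule for illustration: science exams use the lab,
# except on Friday; everything else goes to the main hall
sciences = ('Chemistry', 'Physics', 'Biology')

# zip() walks both columns in step, pairing each day with its subject
exam_df['Room'] = ['Lab' if subject in sciences and day != 'Fri' else 'Hall'
                   for day, subject in zip(exam_df['Exam Day'], exam_df['Exam Subject'])]
print(exam_df)
```

Because the loop runs in plain Python, this stays easy to read for arbitrary logic, but it does not benefit from vectorization on large DataFrames.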
Method 2: Using np.where()
NumPy's where() function evaluates a boolean condition over the whole column at once (vectorized) and picks one of two values for each row:
import pandas as pd
import numpy as np
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']
exam_data = {'Exam Day': days, 'Exam Subject': subjects}
exam_df = pd.DataFrame(exam_data)
# Using np.where for condition
exam_df['Time'] = np.where(exam_df['Exam Day'].isin(['Mon', 'Thu']),
                           '2 PM', '10 AM')
print(exam_df)
  Exam Day Exam Subject   Time
0      Mon    Chemistry   2 PM
1      Tue      Physics  10 AM
2      Wed        Maths  10 AM
3      Thu      English   2 PM
4      Fri      Biology  10 AM
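np.where() chooses between exactly two values, but calls can be nested when a third outcome is needed: the inner call supplies the "otherwise" branch of the outer one. The Friday "6 PM" slot below is a hypothetical addition made only to illustrate nesting.

```python
import numpy as np
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']
exam_df = pd.DataFrame({'Exam Day': days, 'Exam Subject': subjects})

# Outer condition: Monday/Thursday -> '2 PM'
# Otherwise fall through to the inner np.where:
#   Friday -> '6 PM' (hypothetical slot), every other day -> '10 AM'
exam_df['Time'] = np.where(
    exam_df['Exam Day'].isin(['Mon', 'Thu']),
    '2 PM',
    np.where(exam_df['Exam Day'] == 'Fri', '6 PM', '10 AM'),
)
print(exam_df)
```

Beyond two or three levels, nested calls become hard to read, which is where the loc approach of Method 3 tends to scale more gracefully.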
Method 3: Using loc for Multiple Conditions
For more complex rules, initialize the column with a default value and then use loc to overwrite it on the rows that match each condition:
import pandas as pd
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']
exam_data = {'Exam Day': days, 'Exam Subject': subjects}
exam_df = pd.DataFrame(exam_data)
# Initialize new column
exam_df['Difficulty'] = 'Medium'
# Set conditions using loc
exam_df.loc[exam_df['Exam Subject'].isin(['Chemistry', 'Physics']), 'Difficulty'] = 'Hard'
exam_df.loc[exam_df['Exam Subject'] == 'English', 'Difficulty'] = 'Easy'
print(exam_df)
  Exam Day Exam Subject Difficulty
0      Mon    Chemistry       Hard
1      Tue      Physics       Hard
2      Wed        Maths     Medium
3      Thu      English       Easy
4      Fri      Biology     Medium
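The example above applies one condition per loc call. To require several criteria at once, combine boolean masks with & (and), | (or) and ~ (not). Each comparison must be wrapped in parentheses, because Python's & binds more tightly than ==. The `Slot` column and its rules below are hypothetical, chosen only to exercise both & and ~.

```python
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']
exam_df = pd.DataFrame({'Exam Day': days, 'Exam Subject': subjects})

# Default value for every row
exam_df['Slot'] = 'Regular'

# Reusable boolean masks (one True/False per row)
is_science = exam_df['Exam Subject'].isin(['Chemistry', 'Physics', 'Biology'])
early_week = exam_df['Exam Day'].isin(['Mon', 'Tue'])

# Hypothetical rule 1: science exams early in the week get an extended slot
exam_df.loc[is_science & early_week, 'Slot'] = 'Extended'

# Hypothetical rule 2: a non-science exam on Thursday gets a short slot.
# ~ negates the mask; the == comparison needs its own parentheses.
exam_df.loc[~is_science & (exam_df['Exam Day'] == 'Thu'), 'Slot'] = 'Short'
print(exam_df)
```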
Comparison
| Method | Best For | Readability | Performance |
|---|---|---|---|
| List Comprehension | Simple row-by-row logic | Good | Slower on large data (Python-level loop) |
| np.where() | Binary (if/else) conditions | Very Good | Fast (vectorized) |
| loc | Complex/Multiple conditions | Excellent | Fast (vectorized, one pass per condition) |
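For the multi-condition case, NumPy also provides np.select(), which is not one of the methods compared above. It takes a list of boolean conditions and a matching list of values, uses the first condition that is true for each row, and falls back to `default` otherwise. A sketch reproducing the Difficulty column from Method 3 with a single call:

```python
import numpy as np
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']
exam_df = pd.DataFrame({'Exam Day': days, 'Exam Subject': subjects})

# Conditions are checked in order; the first True one wins for each row
conditions = [
    exam_df['Exam Subject'].isin(['Chemistry', 'Physics']),
    exam_df['Exam Subject'] == 'English',
]
choices = ['Hard', 'Easy']

# Rows matching no condition receive the default value
exam_df['Difficulty'] = np.select(conditions, choices, default='Medium')
print(exam_df)
```

Unlike successive loc assignments, where a later assignment overwrites an earlier one, np.select gives priority to the earliest matching condition, so the order of the lists matters when conditions overlap.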
Conclusion
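Use np.where() for simple binary (if/else) conditions, loc when several conditions each assign their own value, and a list comprehension when the logic is easiest to express row by row in plain Python. On large DataFrames, prefer the vectorized options (np.where() and loc) over the list comprehension.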
