Creating a Pandas DataFrame column based on a given condition in Python

Pandas DataFrames allow you to add new columns based on conditions applied to existing data. This is useful for categorizing, labeling, or transforming data based on specific criteria.

Creating the Base DataFrame

Let's start with a simple exam schedule DataFrame:

import pandas as pd

# Lists for Exam subjects and Days
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']

# Dictionary for Exam Schedule
exam_data = {'Exam Day': days,
             'Exam Subject': subjects}

# Dictionary to DataFrame
exam_df = pd.DataFrame(exam_data)
print(exam_df)

Output:

  Exam Day Exam Subject
0      Mon    Chemistry
1      Tue      Physics
2      Wed        Maths
3      Thu      English
4      Fri      Biology

Method 1: Using List Comprehension

Add a new column using conditional logic with a list comprehension:

import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']

exam_data = {'Exam Day': days, 'Exam Subject': subjects}
exam_df = pd.DataFrame(exam_data)

# Add Time column based on condition
exam_df['Time'] = ['2 PM' if day in ('Mon', 'Thu') else '10 AM' 
                   for day in exam_df['Exam Day']]
print(exam_df)

Output:

  Exam Day Exam Subject   Time
0      Mon    Chemistry   2 PM
1      Tue      Physics  10 AM
2      Wed        Maths  10 AM
3      Thu      English   2 PM
4      Fri      Biology  10 AM
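
When the rule has more than two outcomes, the inline conditional becomes hard to read. One option, sketched here, is to move the rule into a small helper function and call it from the list comprehension. The function name exam_time and the extra '4 PM' slot on Friday are hypothetical, added only for illustration.

```python
import pandas as pd

exam_df = pd.DataFrame({
    'Exam Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    'Exam Subject': ['Chemistry', 'Physics', 'Maths', 'English', 'Biology'],
})

def exam_time(day):
    """Return the exam slot for a day (hypothetical three-way rule)."""
    if day in ('Mon', 'Thu'):
        return '2 PM'
    if day == 'Fri':
        return '4 PM'  # assumed extra slot, not part of the original schedule
    return '10 AM'

# The comprehension stays short because the branching lives in the helper
exam_df['Time'] = [exam_time(day) for day in exam_df['Exam Day']]
print(exam_df)
```

Keeping the branching in a named function also makes the rule easy to test on its own.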

Method 2: Using np.where()

NumPy's where() function evaluates the condition on the whole column at once and picks one of two values for each row, which keeps simple two-way conditions concise:

import pandas as pd
import numpy as np

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']

exam_data = {'Exam Day': days, 'Exam Subject': subjects}
exam_df = pd.DataFrame(exam_data)

# Using np.where for condition
exam_df['Time'] = np.where(exam_df['Exam Day'].isin(['Mon', 'Thu']), 
                          '2 PM', '10 AM')
print(exam_df)

Output:

  Exam Day Exam Subject   Time
0      Mon    Chemistry   2 PM
1      Tue      Physics  10 AM
2      Wed        Maths  10 AM
3      Thu      English   2 PM
4      Fri      Biology  10 AM
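
np.where() handles one condition with two outcomes. For three or more outcomes, a related NumPy function, np.select(), takes a list of conditions and a matching list of values. This is a different function from the one used above, shown here as a possible extension; the '4 PM' Friday slot is a hypothetical value added for illustration.

```python
import numpy as np
import pandas as pd

exam_df = pd.DataFrame({
    'Exam Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    'Exam Subject': ['Chemistry', 'Physics', 'Maths', 'English', 'Biology'],
})

# Conditions are checked in order; the first one that is True wins
conditions = [
    exam_df['Exam Day'].isin(['Mon', 'Thu']),
    exam_df['Exam Day'] == 'Fri',
]
choices = ['2 PM', '4 PM']  # '4 PM' is an assumed slot, not in the original data

# Rows matching no condition get the default value
exam_df['Time'] = np.select(conditions, choices, default='10 AM')
print(exam_df)
```

Passing a string default keeps every possible value the same type as the choices.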

Method 3: Using loc for Multiple Conditions

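For more complex conditions, initialize the column with a default value, then use loc to overwrite the rows that match each condition. Where conditions overlap, the later assignment takes precedence: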

import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
subjects = ['Chemistry', 'Physics', 'Maths', 'English', 'Biology']

exam_data = {'Exam Day': days, 'Exam Subject': subjects}
exam_df = pd.DataFrame(exam_data)

# Initialize new column
exam_df['Difficulty'] = 'Medium'

# Set conditions using loc
exam_df.loc[exam_df['Exam Subject'].isin(['Chemistry', 'Physics']), 'Difficulty'] = 'Hard'
exam_df.loc[exam_df['Exam Subject'] == 'English', 'Difficulty'] = 'Easy'

print(exam_df)

Output:

  Exam Day Exam Subject Difficulty
0      Mon    Chemistry       Hard
1      Tue      Physics       Hard
2      Wed        Maths     Medium
3      Thu      English       Easy
4      Fri      Biology     Medium
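
The example above tests one column per assignment. A condition can also combine several columns with & (and) and | (or), as in this minimal sketch. The 'Session' column and its labels are hypothetical, chosen only to show the syntax.

```python
import pandas as pd

exam_df = pd.DataFrame({
    'Exam Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    'Exam Subject': ['Chemistry', 'Physics', 'Maths', 'English', 'Biology'],
})

# Default value for rows that match no rule
exam_df['Session'] = 'Regular'

# Named boolean masks keep the combined condition readable
is_science = exam_df['Exam Subject'].isin(['Chemistry', 'Physics', 'Biology'])
is_early_week = exam_df['Exam Day'].isin(['Mon', 'Tue'])

# & requires both masks to be True for a row
exam_df.loc[is_science & is_early_week, 'Session'] = 'Lab + Written'

# | requires at least one to be True; wrap each comparison in parentheses
exam_df.loc[(exam_df['Exam Day'] == 'Fri') | (exam_df['Exam Subject'] == 'Maths'),
            'Session'] = 'Written only'

print(exam_df)
```

Use & and | rather than the Python keywords and/or, which do not work element-wise on a Series.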

Comparison

Method              Best For                     Readability  Performance
List Comprehension  Simple conditions            Good         Fast
np.where()          Binary conditions            Very Good    Very Fast
loc                 Complex/Multiple conditions  Excellent    Good
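
The ratings in this comparison are qualitative, so before choosing a method on style or speed it can help to confirm that the approaches give the same result on your data. The sketch below builds the same Time values three ways and compares them; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

exam_df = pd.DataFrame({
    'Exam Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    'Exam Subject': ['Chemistry', 'Physics', 'Maths', 'English', 'Biology'],
})

is_afternoon = exam_df['Exam Day'].isin(['Mon', 'Thu'])

# 1. List comprehension
by_listcomp = ['2 PM' if day in ('Mon', 'Thu') else '10 AM'
               for day in exam_df['Exam Day']]

# 2. np.where
by_where = np.where(is_afternoon, '2 PM', '10 AM').tolist()

# 3. loc, applied to a copy so the original frame is left unchanged
loc_df = exam_df.copy()
loc_df['Time'] = '10 AM'
loc_df.loc[is_afternoon, 'Time'] = '2 PM'
by_loc = loc_df['Time'].tolist()

# All three lists should hold identical values
all_equal = by_listcomp == by_where == by_loc
print(all_equal)
```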

Conclusion

Use np.where() for simple binary conditions, list comprehension for basic logic, and loc for complex multi-condition scenarios. Choose based on readability and performance needs.

Updated on: 2026-03-15T18:21:58+05:30
