Join two text columns into a single column in Pandas
Combining text columns is a common data manipulation task in Pandas. When working with datasets containing multiple text fields like first name and last name, or address components, you'll often need to merge them into a single column for analysis or presentation.
Basic Syntax
The simplest way to join two text columns is with the + operator:
# Basic concatenation
df['new_column'] = df['column1'] + df['column2']
# With separator
df['new_column'] = df['column1'] + ' ' + df['column2']
Method 1: Using the + Operator
The + operator provides direct string concatenation. You can add separators such as spaces or other characters between columns:
import pandas as pd
# Create sample data
data = {'Name': ['John', 'Jane', 'Alice'],
        'Surname': ['Doe', 'Smith', 'Johnson']}
df = pd.DataFrame(data)
# Join columns with space separator
df['Full Name'] = df['Name'] + ' ' + df['Surname']
print(df)
    Name  Surname      Full Name
0   John      Doe       John Doe
1   Jane    Smith     Jane Smith
2  Alice  Johnson  Alice Johnson
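The + operator only concatenates when both sides are strings. As a minimal sketch, assuming a hypothetical numeric Age column (not part of the example above), the column can be converted with astype(str) before joining:

```python
import pandas as pd

# Hypothetical data: 'Age' is numeric, so it must be converted before joining
df = pd.DataFrame({'Name': ['John', 'Jane'], 'Age': [30, 25]})

# astype(str) turns each number into its string form
df['Label'] = df['Name'] + ' (' + df['Age'].astype(str) + ')'

labels = df['Label'].tolist()
print(labels)  # ['John (30)', 'Jane (25)']
```

Without the conversion, adding a string column to an integer column raises an error instead of producing text.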
Method 2: Using str.cat() Method
The str.cat() method is designed specifically for string concatenation. It offers more control than the + operator: the separator is passed as sep, and missing values can be replaced through na_rep:
import pandas as pd
# Create sample data
data = {'Name': ['John', 'Jane', 'Alice'],
        'Surname': ['Doe', 'Smith', 'Johnson']}
df = pd.DataFrame(data)
# Join using str.cat() with separator
df['Full Name'] = df['Name'].str.cat(df['Surname'], sep=' ')
print(df)
    Name  Surname      Full Name
0   John      Doe       John Doe
1   Jane    Smith     Jane Smith
2  Alice  Johnson  Alice Johnson
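Because the separator is just an argument, the same call covers other layouts. The following sketch, which assumes a hypothetical "Surname, Name" format and an illustrative Sorted Name column, swaps the column order and uses a comma separator:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane'],
                   'Surname': ['Doe', 'Smith']})

# Put the surname first and separate the parts with a comma and a space
df['Sorted Name'] = df['Surname'].str.cat(df['Name'], sep=', ')

sorted_names = df['Sorted Name'].tolist()
print(sorted_names)  # ['Doe, John', 'Smith, Jane']
```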
Handling Missing Values
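By default, both approaches propagate missing values: if either value is missing, the result is missing. The difference is that str.cat() accepts an na_rep argument that substitutes a replacement string for missing values: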
import pandas as pd
import numpy as np
# Data with missing values
data = {'First': ['John', 'Jane', None, 'Bob'],
        'Last': ['Doe', None, 'Johnson', 'Brown']}
df = pd.DataFrame(data)
# Using + operator (creates NaN)
df['Full_Plus'] = df['First'] + ' ' + df['Last']
# Using str.cat() with na_rep='' (missing values become empty strings)
df['Full_Cat'] = df['First'].str.cat(df['Last'], sep=' ', na_rep='')
print(df)
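   First     Last  Full_Plus   Full_Cat
0   John      Doe   John Doe   John Doe
1   Jane     None        NaN      Jane
2   None  Johnson        NaN    Johnson
3    Bob    Brown  Bob Brown  Bob Brown

Note that na_rep='' replaces the missing value but still inserts the separator, so row 1 of Full_Cat is 'Jane ' with a trailing space and row 2 is ' Johnson' with a leading space.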
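To make the default behavior explicit, the sketch below compares str.cat() with and without na_rep on the same data, then strips the stray separator. The variable names (default_cat, with_na_rep, cleaned, plus_filled) are illustrative only; the fillna('') variant shows one way to get the same result with the + operator.

```python
import pandas as pd

df = pd.DataFrame({'First': ['John', 'Jane', None, 'Bob'],
                   'Last': ['Doe', None, 'Johnson', 'Brown']})

# Default str.cat(): a missing value on either side yields a missing result
default_cat = df['First'].str.cat(df['Last'], sep=' ')

# na_rep='' substitutes an empty string, but the separator is still added
with_na_rep = df['First'].str.cat(df['Last'], sep=' ', na_rep='')

# Stripping whitespace removes the leftover leading or trailing separator
cleaned = with_na_rep.str.strip()

# Same idea with +: fill missing values first, then strip
plus_filled = (df['First'].fillna('') + ' ' + df['Last'].fillna('')).str.strip()

print(default_cat.isna().tolist())  # [False, True, True, False]
print(cleaned.tolist())             # ['John Doe', 'Jane', 'Johnson', 'Bob Brown']
```

Stripping works here because the separator is a space; a different separator such as ', ' would need its own cleanup step.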
Joining Multiple Columns
You can concatenate more than two columns using either method:
import pandas as pd
# Create sample data with multiple columns
data = {'Title': ['Mr.', 'Ms.', 'Dr.'],
        'First': ['John', 'Jane', 'Alice'],
        'Last': ['Doe', 'Smith', 'Johnson']}
df = pd.DataFrame(data)
# Method 1: Using + operator
df['Full_Name_Plus'] = df['Title'] + ' ' + df['First'] + ' ' + df['Last']
# Method 2: Using str.cat() with list
df['Full_Name_Cat'] = df['Title'].str.cat([df['First'], df['Last']], sep=' ')
print(df)
   Title  First     Last     Full_Name_Plus      Full_Name_Cat
0    Mr.   John      Doe       Mr. John Doe       Mr. John Doe
1    Ms.   Jane    Smith     Ms. Jane Smith     Ms. Jane Smith
2    Dr.  Alice  Johnson  Dr. Alice Johnson  Dr. Alice Johnson
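When the number of columns grows, writing out each + becomes tedious. One alternative, shown here as a sketch rather than as one of the two methods above, is to apply Python's str.join across each row. The cols list and the Full_Name_Join column name are illustrative, and every selected value is assumed to be a non-missing string.

```python
import pandas as pd

df = pd.DataFrame({'Title': ['Mr.', 'Ms.', 'Dr.'],
                   'First': ['John', 'Jane', 'Alice'],
                   'Last': ['Doe', 'Smith', 'Johnson']})

cols = ['Title', 'First', 'Last']  # columns to join, in order

# ' '.join receives each row's values and glues them with a space
df['Full_Name_Join'] = df[cols].apply(' '.join, axis=1)

full_names = df['Full_Name_Join'].tolist()
print(full_names)  # ['Mr. John Doe', 'Ms. Jane Smith', 'Dr. Alice Johnson']
```

Row-wise apply runs Python code for every row, so it is typically slower than the vectorized + operator or str.cat() on large frames.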
Comparison
| Method | Syntax | NaN Handling | Best For |
|---|---|---|---|
| + operator | Simple | Result is NaN if any value is missing | Clean data, simple concatenation |
| str.cat() | More options (sep, na_rep) | NaN by default; na_rep substitutes a replacement string | Missing values, complex joining |
Conclusion
Use the + operator for simple text concatenation when the data is clean. Choose str.cat() when you need explicit control over the separator and over how missing values are replaced.
