Reshape Data: Concatenate - Problem
DataFrame Concatenation Challenge
You're working as a data analyst and need to combine student records from two separate DataFrames into one comprehensive dataset. Both DataFrames share the same structure, with the columns student_id, name, and age.
Goal: Concatenate the two DataFrames vertically (stack one on top of the other) to create a single unified DataFrame containing all student records.
Input: Two DataFrames with identical column structures
Output: One DataFrame containing all rows from both input DataFrames
This is a fundamental data manipulation operation used frequently in data preprocessing and ETL (Extract, Transform, Load) processes.
Input & Output
basic_concatenation.py - Python
Input:
df1 = pd.DataFrame({'student_id': [1, 2], 'name': ['Alice', 'Bob'], 'age': [20, 21]})
df2 = pd.DataFrame({'student_id': [3, 4], 'name': ['Charlie', 'Diana'], 'age': [22, 19]})
Output:
   student_id     name  age
0           1    Alice   20
1           2      Bob   21
2           3  Charlie   22
3           4    Diana   19
Note: The two DataFrames are stacked vertically, and the result has a new sequential index from 0 to 3.
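The example above can be reproduced with a short sketch. It assumes pandas is imported as pd; the function name concat_students is hypothetical and is not part of the problem statement.

```python
import pandas as pd


def concat_students(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Stack df2 below df1 and renumber the rows from 0 (hypothetical helper)."""
    # ignore_index=True discards the original row labels and assigns 0..n-1.
    return pd.concat([df1, df2], ignore_index=True)


df1 = pd.DataFrame({'student_id': [1, 2], 'name': ['Alice', 'Bob'], 'age': [20, 21]})
df2 = pd.DataFrame({'student_id': [3, 4], 'name': ['Charlie', 'Diana'], 'age': [22, 19]})

result = concat_students(df1, df2)
print(result)
```

Passing both frames in a list is what makes the call vertical stacking: pd.concat works along axis 0 (rows) by default.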
single_row_dataframes.py - Python
Input:
df1 = pd.DataFrame({'student_id': [100], 'name': ['Eve'], 'age': [25]})
df2 = pd.DataFrame({'student_id': [200], 'name': ['Frank'], 'age': [23]})
Output:
   student_id   name  age
0         100    Eve   25
1         200  Frank   23
Note: Even with single-row DataFrames, concatenation works the same way, creating a two-row result.
empty_dataframe_edge_case.py - Python
Input:
df1 = pd.DataFrame({'student_id': [1, 2], 'name': ['Alice', 'Bob'], 'age': [20, 21]})
df2 = pd.DataFrame(columns=['student_id', 'name', 'age'])
Output:
   student_id   name   age
0           1  Alice  20.0
1           2    Bob  21.0
Note: When one DataFrame is empty, the result is essentially a copy of the non-empty DataFrame, although column data types may change (here age is shown as 20.0 rather than 20).
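How an empty frame influences the result's data types can depend on the pandas version. If the original dtypes need to be preserved, one possible workaround is to leave empty frames out before concatenating. The sketch below illustrates this; the helper name concat_skipping_empty is hypothetical, and this is not necessarily the solution the problem expects.

```python
import pandas as pd


def concat_skipping_empty(frames):
    """Concatenate frames, leaving out empty ones so they cannot affect dtypes."""
    non_empty = [df for df in frames if not df.empty]
    if not non_empty:
        # Nothing to stack: return an empty copy of the first frame to keep its columns.
        return frames[0].iloc[0:0].copy()
    return pd.concat(non_empty, ignore_index=True)


df1 = pd.DataFrame({'student_id': [1, 2], 'name': ['Alice', 'Bob'], 'age': [20, 21]})
df2 = pd.DataFrame(columns=['student_id', 'name', 'age'])

result = concat_skipping_empty([df1, df2])
print(result.dtypes)  # age keeps the same dtype it has in df1
```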
Constraints
- Both DataFrames have identical column structure
- DataFrames can have 0 to 10^6 rows each
- Column names must match exactly
- Data types should be compatible for proper concatenation
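The column constraint can be checked before concatenating. The sketch below is one way to do it, using a hypothetical helper named check_same_columns. It assumes "match exactly" means the same names in the same order, which is stricter than pd.concat itself requires, since concat aligns columns by name.

```python
import pandas as pd


def check_same_columns(df1: pd.DataFrame, df2: pd.DataFrame) -> None:
    """Raise ValueError unless both frames have the same column names in the same order."""
    cols1, cols2 = list(df1.columns), list(df2.columns)
    if cols1 != cols2:
        raise ValueError(f"Column mismatch: {cols1} vs {cols2}")


df1 = pd.DataFrame({'student_id': [1, 2], 'name': ['Alice', 'Bob'], 'age': [20, 21]})
df2 = pd.DataFrame({'student_id': [3, 4], 'name': ['Charlie', 'Diana'], 'age': [22, 19]})

check_same_columns(df1, df2)  # passes silently because the columns match
result = pd.concat([df1, df2], ignore_index=True)
```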
Visualization
Understanding the Visualization
1. Identify Source DataFrames: Two separate DataFrames with identical column structure
2. Apply concat() Function: pd.concat() copies the rows of both inputs into a single new DataFrame
3. Index Management: A new sequential index (0, 1, 2, 3, ...) is created when ignore_index=True
4. Return Unified DataFrame: Single DataFrame containing all rows from both inputs
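The index-management step is easiest to see by comparing the two settings side by side. This is a small sketch using the data from the first example; the variable names kept and reset are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'student_id': [1, 2], 'name': ['Alice', 'Bob'], 'age': [20, 21]})
df2 = pd.DataFrame({'student_id': [3, 4], 'name': ['Charlie', 'Diana'], 'age': [22, 19]})

# Default behaviour: each frame keeps its own row labels, so labels repeat.
kept = pd.concat([df1, df2])

# ignore_index=True: the original labels are dropped and rows are renumbered.
reset = pd.concat([df1, df2], ignore_index=True)

print(kept.index.tolist())   # [0, 1, 0, 1]
print(reset.index.tolist())  # [0, 1, 2, 3]
```

Duplicate labels are legal in pandas, but they make label-based lookups such as kept.loc[0] return more than one row, which is why the expected output here uses the renumbered index.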
Key Takeaway
Key Insight: The pandas concat() function combines DataFrames in O(n) time, where n is the total number of rows, because each input is copied into the result once. This makes it the preferred approach for combining DataFrames in production environments.
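To illustrate the linear-time point, the sketch below collects several frames in a list and concatenates them in one call, rather than growing a DataFrame inside a loop (which would recopy earlier rows on every iteration). The helper name combine_all and the generated sample data are hypothetical.

```python
import pandas as pd


def combine_all(frames):
    """Concatenate a list of frames in a single call (hypothetical helper)."""
    # One concat over the whole list copies each input into the result once.
    return pd.concat(frames, ignore_index=True)


# Made-up sample data: five single-row frames with the same three columns.
chunks = [
    pd.DataFrame({'student_id': [i], 'name': [f'student_{i}'], 'age': [18 + i]})
    for i in range(5)
]

combined = combine_all(chunks)
print(len(combined))
```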