Drop Missing Data - Problem
Data Cleaning Challenge: Remove Incomplete Records
You are working as a data analyst for a university and have received a students DataFrame with incomplete information. Some student records are missing their name values, which makes them unusable for analysis.
Your task is to clean the dataset by removing all rows where the name column contains missing values (NaN, None, or null).
DataFrame Schema:
| Column Name | Type | Description |
|---|---|---|
| student_id | int | Unique identifier for each student |
| name | object | Student's full name (may contain missing values) |
| age | int | Student's age in years |
Goal: Return a clean DataFrame containing only the rows with complete name information.
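One way to meet this goal, as a minimal sketch: pandas' `DataFrame.dropna` accepts a `subset` argument that restricts the missing-value check to the listed columns. The function name `drop_missing_data` is an assumption made for illustration, since the problem statement does not fix a signature, and the index reset is inferred from the expected outputs in the examples.

```python
import numpy as np
import pandas as pd


def drop_missing_data(students: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows whose 'name' value is present.

    The function name is hypothetical; the problem does not specify one.
    """
    # subset=["name"] limits the check to the name column, so a missing
    # value in any other column would not cause a row to be dropped.
    cleaned = students.dropna(subset=["name"])
    # The expected outputs show a fresh 0..n-1 index, so renumber the rows
    # and discard the old index labels.
    return cleaned.reset_index(drop=True)


students = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5],
    "name": ["Alice", np.nan, "Bob", None, "Charlie"],
    "age": [20, 21, 19, 22, 23],
})
result = drop_missing_data(students)
print(result)
```

Without `reset_index(drop=True)`, the surviving rows would keep their original labels (0, 2, 4 for this input) rather than being renumbered.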
Input & Output
example_1.py - Basic Missing Values
Input:
import pandas as pd
students = pd.DataFrame({
    'student_id': [1, 2, 3],
    'name': ['Alice', None, 'Bob'],
    'age': [20, 21, 19]
})
Output:
   student_id   name  age
0           1  Alice   20
1           3    Bob   19
Note:
Row with student_id=2 is removed because the name column contains None (missing value). The remaining rows with valid names are kept and the index is reset.
example_2.py - Multiple Missing Types
Input:
import numpy as np
import pandas as pd
students = pd.DataFrame({
    'student_id': [1, 2, 3, 4, 5],
    'name': ['Alice', np.nan, 'Bob', None, 'Charlie'],
    'age': [20, 21, 19, 22, 23]
})
Output:
   student_id     name  age
0           1    Alice   20
1           3      Bob   19
2           5  Charlie   23
Note:
Rows with student_id=2 (np.nan) and student_id=4 (None) are removed as both represent missing values. Only rows with actual string names are retained.
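To see why both rows are dropped, the following sketch checks how pandas classifies each entry. `Series.isna` flags `None` and `np.nan` alike; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

names = pd.Series(["Alice", np.nan, "Bob", None, "Charlie"], name="name")

# isna() flags None and np.nan alike; real strings are not flagged.
missing_mask = names.isna()
print(missing_mask.tolist())

# Keeping the entries where the mask is False matches what dropna() does here.
kept = names[~missing_mask]
print(kept.tolist())
```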
example_3.py - Edge Case: All Valid
Input:
import pandas as pd
students = pd.DataFrame({
    'student_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 21, 22]
})
Output:
   student_id     name  age
0           1    Alice   20
1           2      Bob   21
2           3  Charlie   22
Note:
No rows are removed because all students have valid names. The DataFrame remains unchanged except for potentially resetting the index.
Time & Space Complexity
Time Complexity
O(n)
Single pass through the data using optimized vectorized operations
Linear Growth
Space Complexity
O(n)
Creates a new DataFrame holding the retained rows; in the worst case (no names missing) it is as large as the input
Linear Space
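The space cost comes from the result being a separate object. As a small sketch using the data from example 1, `dropna` with its default arguments returns a new DataFrame and leaves the input as it was:

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Alice", None, "Bob"],
    "age": [20, 21, 19],
})

cleaned = students.dropna(subset=["name"])

# The input still has all three rows; only the returned frame is filtered.
print(len(students), len(cleaned))
# dropna returned a different object rather than modifying students.
print(cleaned is students)
```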
Constraints
- 1 ≤ number of rows ≤ 10^6
- student_id values are unique integers
- age values are positive integers between 1 and 150
- Missing values in name column can be None, NaN, or null
- Valid names are non-null strings (empty strings are considered valid in pandas)
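The last constraint can be checked with a short sketch. The data below is made up for illustration: one row has an empty-string name and one has `None`, and only the `None` row is removed.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Alice", "", None],
    "age": [20, 21, 19],
})

# dropna() only removes true missing values, so the empty-string row stays.
cleaned = students.dropna(subset=["name"]).reset_index(drop=True)
print(cleaned["student_id"].tolist())
```

If empty names should also count as missing, they would need to be converted or filtered explicitly first; that goes beyond what this problem asks for.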