Drop Missing Data - Problem
Data Cleaning Challenge: Remove Incomplete Records

You are working as a data analyst for a university and have received a students DataFrame with incomplete information. Some student records are missing their name values, which makes them unusable for analysis.

Your task is to clean the dataset by removing all rows where the name column contains missing values (NaN, None, or null).

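DataFrame Schema:
| Column Name | Type   | Description                                      |
|-------------|--------|--------------------------------------------------|
| student_id  | int    | Unique identifier for each student               |
| name        | object | Student's full name (may contain missing values) |
| age         | int    | Student's age in years                           |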

Goal: Return a clean DataFrame containing only the rows with complete name information.
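One possible way to meet this goal is sketched below. The function name `drop_missing_data` is a hypothetical choice, not something the problem statement prescribes, and the index reset is an assumption based on the expected outputs shown in the examples.

```python
import pandas as pd


def drop_missing_data(students: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows whose 'name' value is present.

    Hypothetical helper name; the problem statement does not fix one.
    """
    # dropna with subset restricts the missing-value check to the 'name' column,
    # so missing values in other columns (if any) would not remove a row.
    cleaned = students.dropna(subset=["name"])
    # Assumption: renumber the index from 0, matching the example outputs.
    return cleaned.reset_index(drop=True)
```

Restricting the check with `subset=["name"]` keeps the behavior aligned with the task, which only concerns the name column.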

Input & Output

example_1.py: Basic Missing Values
$ Input:
students = pd.DataFrame({
    'student_id': [1, 2, 3],
    'name': ['Alice', None, 'Bob'],
    'age': [20, 21, 19]
})
› Output:
   student_id   name  age
0           1  Alice   20
1           3    Bob   19
💡 Note: The row with student_id=2 is removed because its name is None (a missing value). The remaining rows with valid names are kept and the index is reset.
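The index reset mentioned in the note is a separate step from dropping rows. The sketch below, which reuses the data from this example, illustrates that `dropna` alone keeps the original row labels and that `reset_index(drop=True)` renumbers them.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Alice", None, "Bob"],
    "age": [20, 21, 19],
})

# Dropping the row keeps the labels of the surviving rows from the input.
kept = students.dropna(subset=["name"])
original_labels = kept.index.tolist()   # [0, 2]

# drop=True discards the old labels instead of adding them as a new column.
renumbered = kept.reset_index(drop=True)
new_labels = renumbered.index.tolist()  # [0, 1]
```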
example_2.py: Multiple Missing Types
$ Input:
students = pd.DataFrame({
    'student_id': [1, 2, 3, 4, 5],
    'name': ['Alice', np.nan, 'Bob', None, 'Charlie'],
    'age': [20, 21, 19, 22, 23]
})
› Output:
   student_id     name  age
0           1    Alice   20
1           3      Bob   19
2           5  Charlie   23
💡 Note: Rows with student_id=2 (np.nan) and student_id=4 (None) are removed because both represent missing values. Only rows with actual string names are retained.
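To see why both kinds of missing value are handled the same way, the short sketch below checks the name values from this example with `Series.isna`, which flags `None` and `np.nan` alike.

```python
import numpy as np
import pandas as pd

names = pd.Series(["Alice", np.nan, "Bob", None, "Charlie"])

# isna() marks both np.nan and None as missing.
missing_mask = names.isna()
missing_positions = missing_mask[missing_mask].index.tolist()  # [1, 3]
```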
example_3.py: Edge Case: All Valid
$ Input:
students = pd.DataFrame({
    'student_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 21, 22]
})
› Output:
   student_id     name  age
0           1    Alice   20
1           2      Bob   21
2           3  Charlie   22
💡 Note: No rows are removed because all students have valid names. The DataFrame is returned unchanged; resetting the index has no visible effect because it is already 0, 1, 2.

Time & Space Complexity

Time Complexity
O(n)

A single vectorized pass over the name column identifies the missing values, and the retained rows are then copied once. Here n is the number of rows, so the running time grows linearly.

Space Complexity
O(n)

A new DataFrame is created for the retained rows. In the worst case (no missing names) it is as large as the input, so the extra space grows linearly.
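The linear cost is easier to see when the filter is written as an explicit boolean mask. The following is a minimal sketch of that equivalent formulation; the function name `drop_missing_with_mask` is hypothetical.

```python
import pandas as pd


def drop_missing_with_mask(students: pd.DataFrame) -> pd.DataFrame:
    # One pass over the 'name' column produces n booleans (True = name present).
    keep = students["name"].notna()
    # Boolean indexing copies the kept rows into a new DataFrame;
    # the input DataFrame itself is left unmodified.
    return students[keep].reset_index(drop=True)
```

The mask costs one boolean per row and the result holds at most n rows, which is where the O(n) bounds come from.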

Constraints

  • 1 ≤ number of rows ≤ 10^6
  • student_id values are unique integers
  • age values are positive integers between 1 and 150
  • Missing values in name column can be None, NaN, or null
  • Valid names are non-null strings (pandas does not treat an empty string as missing, so empty strings count as valid)
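The last constraint is worth checking directly. The sketch below uses made-up sample data (not taken from the examples above) to illustrate that a row with an empty-string name survives `dropna`, while a row with `None` does not.

```python
import pandas as pd

# Illustrative data: student 2 has an empty name, student 3 has a missing name.
students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Alice", "", None],
    "age": [20, 21, 19],
})

cleaned = students.dropna(subset=["name"]).reset_index(drop=True)
kept_ids = cleaned["student_id"].tolist()   # the empty-string row is kept
```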