Drop Missing Data - Problem

Database Easy

You are given a DataFrame called students with the following schema:

Column Name	Type
student_id	int
name	object
age	int

Some rows in the DataFrame have missing values in the name column (represented as null or NaN).

Write a solution to remove all rows that contain missing values in the name column and return the cleaned DataFrame.

Input & Output

Example 1 — Basic Missing Data

$ Input: students = [{"student_id": 1, "name": "Alice", "age": 20}, {"student_id": 2, "name": null, "age": 21}]

› Output: [{"student_id": 1, "name": "Alice", "age": 20}]

💡 Note: The row with student_id=2 has a missing name (null), so it is removed. Only Alice's record remains.

Example 2 — Multiple Missing Names

$ Input: students = [{"student_id": 1, "name": "Bob", "age": 22}, {"student_id": 2, "name": null, "age": 23}, {"student_id": 3, "name": "Charlie", "age": 24}]

› Output: [{"student_id": 1, "name": "Bob", "age": 22}, {"student_id": 3, "name": "Charlie", "age": 24}]

💡 Note: The student with ID 2 has a missing name, so that row is removed. Bob and Charlie remain.

Example 3 — No Missing Data

$ Input: students = [{"student_id": 1, "name": "David", "age": 25}, {"student_id": 2, "name": "Eve", "age": 26}]

› Output: [{"student_id": 1, "name": "David", "age": 25}, {"student_id": 2, "name": "Eve", "age": 26}]

💡 Note: All students have valid names, so no rows are removed.

Constraints

1 ≤ students.length ≤ 1000
Each row contains student_id (int), name (string or null), age (int)
Missing values in name column are represented as null
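As a small illustration of how these inputs might look in pandas, the sketch below builds the DataFrame from Example 2 and flags the missing names. It assumes the JSON null is loaded as Python None; the variable names records and missing_mask are illustrative only.

```python
import pandas as pd

# Records from Example 2; the JSON null is written as Python None here (assumption).
records = [
    {"student_id": 1, "name": "Bob", "age": 22},
    {"student_id": 2, "name": None, "age": 23},
    {"student_id": 3, "name": "Charlie", "age": 24},
]
students = pd.DataFrame(records)

# isna() treats both None and NaN as missing.
missing_mask = students["name"].isna()
print(missing_mask.tolist())  # [False, True, False]
```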

Asked in

Meta 15, Netflix 12

The key insight is to use pandas' built-in dropna() method with the subset parameter, which removes only the rows whose name value is missing: students.dropna(subset=['name']). Time: O(n), Space: O(n).

Common Approaches

✓ Manual Row-by-Row Filtering

⏱️ Time: O(n) Space: O(n)

Loop through the DataFrame row by row, check if the name column has a missing value, and keep only rows where name is not null. This approach manually implements the filtering logic.

Pandas dropna() Method

⏱️ Time: O(n) Space: O(n)

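Use pandas' built-in dropna() method, restricted to the name column, to remove rows with missing values. This is the standard approach for handling missing data in pandas DataFrames. It has the same O(n) complexity as the manual loop but runs as a vectorized operation instead of a Python-level loop.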
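A minimal sketch of the dropna() approach, run on Example 1. The function name drop_missing_data is a placeholder chosen for this illustration, not something the problem statement prescribes.

```python
import pandas as pd

def drop_missing_data(students: pd.DataFrame) -> pd.DataFrame:
    # Drop rows whose name is missing; the other columns are not inspected.
    return students.dropna(subset=["name"])

# Example 1, with null written as None (assumption).
students = pd.DataFrame(
    {"student_id": [1, 2], "name": ["Alice", None], "age": [20, 21]}
)
cleaned = drop_missing_data(students)
print(cleaned.to_dict("records"))
```

Passing subset=["name"] matters: without it, dropna() would also drop rows that have a missing value in any other column.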

Manual Row-by-Row Filtering — Algorithm Steps

Iterate through each row in the DataFrame
Check if the name value is null/NaN
Keep only rows where name is not missing
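The steps above could be sketched in pandas roughly as follows. This is one possible reading of the manual approach, using iterrows() and pd.notna(); the function name drop_missing_manual is hypothetical.

```python
import pandas as pd

def drop_missing_manual(students: pd.DataFrame) -> pd.DataFrame:
    kept_rows = []
    # Step 1: iterate through each row.
    for _, row in students.iterrows():
        # Step 2: check whether the name value is null/NaN.
        if pd.notna(row["name"]):
            # Step 3: keep only rows where name is present.
            kept_rows.append(row.to_dict())
    # Rebuild a DataFrame with the original column order.
    return pd.DataFrame(kept_rows, columns=students.columns)

# Example 2, with null written as None (assumption).
students = pd.DataFrame(
    {"student_id": [1, 2, 3], "name": ["Bob", None, "Charlie"], "age": [22, 23, 24]}
)
result = drop_missing_manual(students)
print(result["name"].tolist())  # ['Bob', 'Charlie']
```

Note that this sketch rebuilds the DataFrame, so the surviving rows get a fresh 0-based index, whereas dropna() keeps the original index labels.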

Step-by-Step Walkthrough

Check Each Row: Iterate through the DataFrame rows.

Filter Missing: Keep rows where name is not null.

Clean Result: Return the filtered DataFrame.

Code -

solution.c — C

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

typedef struct {
    int student_id;
    char name[100];
    int age;
    bool hasName;
} Student;

int parseStudents(const char* input, Student* students) {
    int count = 0;
    const char* ptr = input;
    
    while (*ptr) {
        // Find start of object
        while (*ptr && *ptr != '{') ptr++;
        if (!*ptr) break;
        ptr++; // Skip '{'
        
        Student* student = &students[count];
        student->hasName = true;
        
        // Parse the object
        while (*ptr && *ptr != '}') {
            // Skip whitespace, quotes, and commas between tokens
            while (*ptr && (*ptr == ' ' || *ptr == '"' || *ptr == ',')) ptr++;
            if (!*ptr || *ptr == '}') break; // Stop at end of object or end of input
            
            if (strncmp(ptr, "student_id", 10) == 0) {
                ptr += 10;
                while (*ptr && *ptr != ':') ptr++;
                ptr++; // Skip ':'
                while (*ptr && *ptr == ' ') ptr++;
                student->student_id = strtol(ptr, (char**)&ptr, 10);
            }
            else if (strncmp(ptr, "name", 4) == 0) {
                ptr += 4;
                while (*ptr && *ptr != ':') ptr++;
                ptr++; // Skip ':'
                while (*ptr && *ptr == ' ') ptr++;
                
                if (strncmp(ptr, "null", 4) == 0) {
                    student->hasName = false;
                    strcpy(student->name, "");
                    ptr += 4;
                } else {
                    if (*ptr == '"') ptr++; // Skip opening quote
                    int i = 0;
                    while (*ptr && *ptr != '"' && *ptr != ',' && *ptr != '}' && i < 99) {
                        student->name[i++] = *ptr++;
                    }
                    student->name[i] = '\0';
                    if (*ptr == '"') ptr++; // Skip closing quote
                }
            }
            else if (strncmp(ptr, "age", 3) == 0) {
                ptr += 3;
                while (*ptr && *ptr != ':') ptr++;
                ptr++; // Skip ':'
                while (*ptr && *ptr == ' ') ptr++;
                student->age = strtol(ptr, (char**)&ptr, 10);
            }
            else {
                ptr++;
            }
        }
        
        if (*ptr == '}') ptr++;
        count++;
    }
    
    return count;
}

// Copy every student that has a name into filtered; returns the number kept.
int solution(Student* students, int count, Student* filtered) {
    int filteredCount = 0;
    for (int i = 0; i < count; i++) {
        if (students[i].hasName) {
            filtered[filteredCount++] = students[i];
        }
    }
    return filteredCount;
}

void printStudents(Student* students, int count) {
    printf("[");
    for (int i = 0; i < count; i++) {
        if (i > 0) printf(",");
        printf("{\"student_id\":%d,\"name\":\"%s\",\"age\":%d}", 
               students[i].student_id, students[i].name, students[i].age);
    }
    printf("]\n");
}

int main() {
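    // Sized for the maximum of 1000 rows; a 10000-byte buffer would truncate large inputs
    static char input[200000];
    static Student students[1000];
    static Student filtered[1000];
    
    if (!fgets(input, sizeof(input), stdin)) return 1; // No input to process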
    
    int count = parseStudents(input, students);
    int filteredCount = solution(students, count, filtered);
    
    printStudents(filtered, filteredCount);
    
    return 0;
}

Time & Space Complexity

Time Complexity

O(n)

Each row is checked once, where n is the number of rows.

✓ Linear Growth

Space Complexity

O(n)

A new DataFrame is created to store the filtered results.

⚡ Linear Space
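
To illustrate where the O(n) space goes, the sketch below checks that dropna() with its default settings returns a separate DataFrame and leaves the input untouched. The data is taken from Example 3 with one name changed to None for demonstration.

```python
import pandas as pd

# Example 3 data, with Eve's name replaced by None for demonstration (assumption).
students = pd.DataFrame(
    {"student_id": [1, 2], "name": ["David", None], "age": [25, 26]}
)
cleaned = students.dropna(subset=["name"])

# The original still has both rows; the result is a separate object.
print(len(students), len(cleaned))  # 2 1
print(cleaned is students)          # False
```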
