Find Duplicate Subtrees - Problem

Imagine you're a botanist studying the genetic patterns in a family tree of plants. You've discovered that some branches of the tree have evolved into identical genetic structures - essentially becoming duplicates of each other!

Given the root of a binary tree, your task is to find all duplicate subtrees and return a list containing one representative root node from each group of duplicates. Two subtrees are considered duplicates if they have exactly the same structure with identical node values at corresponding positions.

For example, if you find 3 identical subtrees in the tree, you only need to return the root of one of them. The goal is to identify all the different "genetic patterns" that appear multiple times in your tree.

Input: Root of a binary tree
Output: List of root nodes representing each type of duplicate subtree

Input & Output

example_1.py — Basic Tree with Duplicates

$ Input: [1,2,3,4,null,2,4,null,null,4]

› Output: [2,4]

💡 Note: Tree has subtrees rooted at nodes with values 2 and 4 that appear multiple times. The subtree '4(null)(null)' appears 3 times, and subtree '2(4(null)(null))(null)' appears 2 times.

example_2.py — Simple Duplicate

$ Input: [2,1,1]

› Output: [1]

💡 Note: The subtree consisting of just node 1 appears twice (as left and right children of root). We return one representative.

example_3.py — Duplicates at Two Levels

$ Input: [2,2,2,3,null,3,null]

› Output: [2,3]

💡 Note: The two children of the root are roots of identical subtrees with structure '2(3(null)(null))(null)', so node 2 is returned as a duplicate. The leaf '3(null)(null)' inside each of them also appears twice, so node 3 is returned as well.
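The notes above write subtrees in the form 'val(left)(right)', with 'null' for an empty child. A minimal sketch of a serializer that produces exactly that notation is shown here; `TreeNode` and `serialize_into` are illustrative names, and the caller is assumed to supply a buffer that is large enough and already holds a NUL-terminated string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node type for this sketch. */
typedef struct TreeNode {
    int val;
    struct TreeNode* left;
    struct TreeNode* right;
} TreeNode;

/* Appends "val(left)(right)" for node to out, or "null" for an empty subtree.
   Assumption: out is NUL-terminated and has room for the whole result. */
void serialize_into(const TreeNode* node, char* out) {
    if (!node) {
        strcat(out, "null");
        return;
    }
    sprintf(out + strlen(out), "%d(", node->val);
    serialize_into(node->left, out);
    strcat(out, ")(");
    serialize_into(node->right, out);
    strcat(out, ")");
}
```

Two subtrees are duplicates exactly when this function produces the same string for both, because the parentheses record where each child starts and ends.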

Constraints

The number of nodes in the tree is in the range [1, 10⁴]
-200 ≤ Node.val ≤ 200

Asked in

Google 67, Amazon 45, Microsoft 38, Meta 29

The optimal approach uses post-order DFS traversal with subtree serialization. Each subtree is converted to a unique string representation, and a hash map tracks how many times each string has occurred. When a serialization appears for the second time, we've found a duplicate subtree and record its root once. Only a single pass is needed: every node is visited exactly once, so the traversal itself is O(n).

Common Approaches

Brute Force (Compare All Subtrees)

⏱️ Time: O(n³) Space: O(n²)

Generate all possible subtrees and compare each pair to find duplicates. For each node, extract its subtree and compare it with all other subtrees using a recursive comparison function.
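The brute-force idea can be sketched as follows. This is only an illustration under stated assumptions: `TreeNode`, `same_tree`, `collect`, and `find_duplicates_brute` are hypothetical names, and `max_nodes` must be at least the number of nodes in the tree. Unlike a version that copies each subtree out, this sketch compares nodes in place, so it needs only O(n) extra space while keeping the O(n³) worst-case time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type for this sketch. */
typedef struct TreeNode {
    int val;
    struct TreeNode* left;
    struct TreeNode* right;
} TreeNode;

/* True when a and b have the same shape and the same values throughout. */
bool same_tree(const TreeNode* a, const TreeNode* b) {
    if (!a || !b) return a == b;   /* equal only if both are empty */
    return a->val == b->val
        && same_tree(a->left, b->left)
        && same_tree(a->right, b->right);
}

/* Stores every node in nodes[] in pre-order and returns the updated count. */
size_t collect(TreeNode* root, TreeNode** nodes, size_t count) {
    if (!root) return count;
    nodes[count++] = root;
    count = collect(root->left, nodes, count);
    return collect(root->right, nodes, count);
}

/* Writes one representative per duplicate group into out[] and returns the
   number of groups. Assumption: the tree has at most max_nodes nodes. */
size_t find_duplicates_brute(TreeNode* root, TreeNode** out, size_t max_nodes) {
    TreeNode** nodes = malloc(max_nodes * sizeof *nodes);
    bool* grouped = calloc(max_nodes, sizeof *grouped);
    size_t n = collect(root, nodes, 0);
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (grouped[i]) continue;          /* already matched an earlier node */
        bool has_twin = false;
        for (size_t j = i + 1; j < n; j++) {
            if (!grouped[j] && same_tree(nodes[i], nodes[j])) {
                grouped[j] = true;         /* j belongs to i's group */
                has_twin = true;
            }
        }
        if (has_twin) out[found++] = nodes[i];
    }
    free(nodes);
    free(grouped);
    return found;
}
```

For the tree [2,1,1] this reports a single group whose representative is the first leaf with value 1.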

One-Pass Serialization (Optimal)

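⏱️ Time: O(n) node visits Space: O(n) map entries

Use post-order DFS to serialize each subtree into a unique string such as '2(4(null)(null))(null)'. As soon as a subtree's serialization is built, look it up in a hash map of occurrence counts; when the count reaches exactly 2, record that node as the representative of its group. This detects every duplicate in a single pass through the tree. The string keys themselves can add up to O(n²) characters for a chain-shaped tree, so the linear bounds count node visits and map entries rather than characters copied.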
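One way to avoid long string keys is to give every distinct subtree a small integer ID and key the hash map on the triple (value, left ID, right ID) instead. This is a variant technique, not the string serialization described above, and the sketch below rests on assumptions: `Slot`, `Finder`, `visit`, and `find_duplicates_by_id` are hypothetical names, the hash function is an arbitrary choice, and `node_count` must be at least the real number of nodes so the open-addressing table never fills up.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type for this sketch. */
typedef struct TreeNode {
    int val;
    struct TreeNode* left;
    struct TreeNode* right;
} TreeNode;

/* One table slot. The triple (val, left_id, right_id) identifies a subtree
   shape; id 0 is reserved for the empty subtree and marks an unused slot. */
typedef struct {
    int val, left_id, right_id;
    int id;
    int count;
} Slot;

typedef struct {
    Slot* slots;
    size_t capacity;   /* power of two, at least twice the node count */
    int next_id;
    TreeNode** out;    /* representatives found so far */
    size_t found;
} Finder;

/* Post-order visit that returns the ID of the subtree rooted at node. */
static int visit(Finder* f, TreeNode* node) {
    if (!node) return 0;
    int l = visit(f, node->left);
    int r = visit(f, node->right);
    size_t h = ((size_t)node->val * 1000003u + (size_t)l) * 1000003u + (size_t)r;
    size_t i = h & (f->capacity - 1);
    while (f->slots[i].id != 0 &&
           !(f->slots[i].val == node->val &&
             f->slots[i].left_id == l && f->slots[i].right_id == r)) {
        i = (i + 1) & (f->capacity - 1);   /* linear probing */
    }
    Slot* s = &f->slots[i];
    if (s->id == 0) {                      /* first time this shape is seen */
        s->val = node->val;
        s->left_id = l;
        s->right_id = r;
        s->id = f->next_id++;
    }
    if (++s->count == 2) f->out[f->found++] = node;   /* report each group once */
    return s->id;
}

/* Writes one representative per duplicate group into out[] and returns the
   number of groups. Assumption: the tree has at most node_count nodes. */
size_t find_duplicates_by_id(TreeNode* root, size_t node_count, TreeNode** out) {
    size_t cap = 16;
    while (cap < 2 * node_count) cap *= 2;
    Finder f = { calloc(cap, sizeof(Slot)), cap, 1, out, 0 };
    visit(&f, root);
    free(f.slots);
    return f.found;
}
```

Each key has constant size, so the expected cost per node is constant when the hash spreads keys well. Representatives are reported in the order their second occurrence is reached, so for [1,2,3,4,null,2,4,null,null,4] the leaf 4 is reported before the subtree rooted at 2.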

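Algorithm Steps

1. Traverse the tree in post-order so that both children are serialized before their parent.
2. Serialize each subtree as val(left)(right), writing null for an empty child.
3. Look the string up in a hash map and increment its count.
4. When the count reaches exactly 2, add the current node to the result; later occurrences are ignored so each group is reported once.
5. Return the serialization to the parent call.

Code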

solution.c — C

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct TreeNode {
    int val;
    struct TreeNode* left;
    struct TreeNode* right;
} TreeNode;

/* Hash map entry: serialized subtree -> number of occurrences so far. */
typedef struct Entry {
    char* key;
    int count;
    struct Entry* next;
} Entry;

#define BUCKETS 16384
#define MAX_NODES 10000   /* upper bound taken from the constraints */

static Entry* table[BUCKETS];
static int result[MAX_NODES];   /* root values of duplicate subtrees */
static int resultSize = 0;

static unsigned long hashString(const char* s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Record one more occurrence of key and return its updated count. */
static int countKey(const char* key) {
    unsigned long b = hashString(key) % BUCKETS;
    for (Entry* e = table[b]; e; e = e->next) {
        if (strcmp(e->key, key) == 0) return ++e->count;
    }
    Entry* e = malloc(sizeof(Entry));
    e->key = malloc(strlen(key) + 1);
    strcpy(e->key, key);
    e->count = 1;
    e->next = table[b];
    table[b] = e;
    return 1;
}

/* Post-order serialization: "val(left)(right)", with "null" for an empty
   subtree. The caller owns the returned string. */
static char* serialize(TreeNode* node) {
    if (!node) {
        char* s = malloc(5);
        strcpy(s, "null");
        return s;
    }
    char* left = serialize(node->left);
    char* right = serialize(node->right);
    size_t len = strlen(left) + strlen(right) + 20;
    char* s = malloc(len);
    snprintf(s, len, "%d(%s)(%s)", node->val, left, right);
    free(left);
    free(right);
    /* Report a pattern only when it is seen for the second time. */
    if (countKey(s) == 2) result[resultSize++] = node->val;
    return s;
}

static TreeNode* newNode(int val) {
    TreeNode* n = calloc(1, sizeof(TreeNode));
    n->val = val;
    return n;
}

/* Build a tree from a level-order line such as "[1,2,3,4,null,2,4]". */
static TreeNode* parseTree(char* line) {
    const char* delims = "[], \r\n";
    char* tok = strtok(line, delims);
    if (!tok || strcmp(tok, "null") == 0) return NULL;
    TreeNode** queue = malloc(MAX_NODES * sizeof(TreeNode*));
    int head = 0, tail = 0;
    TreeNode* root = newNode(atoi(tok));
    queue[tail++] = root;
    int isLeft = 1;   /* whether the next token is a left child */
    while ((tok = strtok(NULL, delims)) != NULL && head < tail) {
        TreeNode* child = strcmp(tok, "null") == 0 ? NULL : newNode(atoi(tok));
        if (child) queue[tail++] = child;
        if (isLeft) queue[head]->left = child;
        else queue[head++]->right = child;
        isLeft = !isLeft;
    }
    free(queue);
    return root;
}

static int compareInts(const void* a, const void* b) {
    return *(const int*)a - *(const int*)b;
}

int main(void) {
    static char line[200000];
    if (!fgets(line, sizeof(line), stdin)) return 1;
    TreeNode* root = parseTree(line);
    free(serialize(root));
    /* Sort the root values so the printed order is deterministic. */
    qsort(result, resultSize, sizeof(int), compareInts);
    printf("[");
    for (int i = 0; i < resultSize; i++) {
        printf(i ? ",%d" : "%d", result[i]);
    }
    printf("]\n");
    /* The tree and the map are left for the OS to reclaim at exit. */
    return 0;
}

Time & Space Complexity

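Time Complexity

⏱️ O(n)

✓ Linear Growth: each node is visited once. With string keys, building and hashing the serializations can add up to O(n²) characters in the worst case (a chain-shaped tree).

Space Complexity

⚡ Linear Space: the hash map holds at most n entries, though string keys can total O(n²) characters in the worst case.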
