Delete Duplicate Folders in System - Problem

Due to a bug, there are many duplicate folders in a file system. You are given a 2D array paths, where paths[i] is an array representing an absolute path to the ith folder in the file system.

For example, ["one", "two", "three"] represents the path "/one/two/three".
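As a rough illustration of this representation, the sketch below inserts each path into a nested dictionary so that shared prefixes collapse into one folder tree. The helper name `build_tree` is hypothetical and not part of the problem statement.

```python
def build_tree(paths):
    """Insert every path into a nested-dict tree; each key is a folder name."""
    root = {}
    for path in paths:
        node = root
        for name in path:
            # Reuse the folder if it already exists, otherwise create it.
            node = node.setdefault(name, {})
    return root


tree = build_tree([["one"], ["one", "two"], ["one", "two", "three"]])
print(tree)  # {'one': {'two': {'three': {}}}}
```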

Two folders (not necessarily on the same level) are identical if they contain the same non-empty set of identical subfolders and underlying subfolder structure. The folders do not need to be at the root level to be identical. If two or more folders are identical, then mark the folders as well as all their subfolders.

Once all the identical folders and their subfolders have been marked, the file system will delete all of them. The file system only runs the deletion once, so any folders that become identical after the initial deletion are not deleted.

Return the 2D array ans containing the paths of the remaining folders after deleting all the marked folders. The paths may be returned in any order.

Input & Output

Example 1 — Basic Duplicate Detection
Input: paths = [["a"], ["c"], ["d"], ["a", "b"], ["c", "b"], ["d", "a"]]
Output: [["d"], ["d", "a"]]
💡 Note: Folders /a and /c both contain subfolder "b", making them identical structures. Both /a and /c (and their subfolders) are deleted, leaving only /d and /d/a.

Example 2 — No Duplicates
Input: paths = [["a"], ["c"], ["a", "b"], ["c", "d"]]
Output: [["a"], ["c"], ["a", "b"], ["c", "d"]]
💡 Note: Folder /a contains subfolder "b" while /c contains subfolder "d". Since their structures are different, no folders are deleted.

Example 3 — Multiple Level Duplicates
Input: paths = [["a"], ["b"], ["a", "x"], ["a", "x", "y"], ["b", "x"], ["b", "x", "y"]]
Output: []
💡 Note: Folders /a and /b have identical structure (both contain the x/y subfolder tree). All folders are deleted as duplicates.

Constraints

  • 1 ≤ paths.length ≤ 2 × 10^4
  • 1 ≤ paths[i].length ≤ 500
  • 1 ≤ paths[i][j].length ≤ 10
  • paths[i][j] consists of lowercase English letters

Visualization

Figure: Delete Duplicate Folders in System | Hash-Based Folder Signatures Approach (TutorialsPoint)

Input
  • Folder structure: root contains a, c, and d; a contains b; c contains b; d contains a
  • paths = [["a"], ["c"], ["d"], ["a","b"], ["c","b"], ["d","a"]] (6 folder paths total)
  • The subtrees a/b and c/b are identical, so /a and /c are duplicates (delete); /d is unique (keep)

Algorithm Steps
  1. Build Trie: create the folder tree from paths.
  2. Generate Signatures: encode each folder's subfolder structure. Here sig(a) = "(b())" = sig(c), while sig(d) = "(a())" is unique. Same signature = identical structure.
  3. Find Duplicates: group folders by signature. "(b())" appears 2x -- MARK; "(a())" appears 1x -- KEEP. A count greater than 1 means duplicate.
  4. Collect Remaining: skip marked folders and their children.

Final Result
  • Remaining structure: root contains d; d contains a
  • Deleted (duplicates): ["a"], ["a","b"], ["c"], ["c","b"] (4 paths removed as identical subtrees)
  • Output: [["d"], ["d","a"]] (2 unique paths remain)

Key Insight: Hash-Based Folder Signatures
Each folder's structure is encoded as a string signature of the form "(child1(grandchildren...)child2(...))". If two folders that each have at least one subfolder share the same signature, they have identical subtree structures and must be deleted. This turns a tree comparison problem into a hash map lookup: once the signatures are built, grouping n folders takes O(n) lookups.
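The four steps above could be sketched in Python roughly as follows. This is a minimal illustration rather than a reference solution: the names `Folder` and `delete_duplicate_folders` are hypothetical, children are sorted by name so that equal structures produce equal signatures, and folders without subfolders are never marked because the problem requires a non-empty set of subfolders. The traversal is recursive, so very deep inputs may call for a higher recursion limit or an iterative rewrite.

```python
from collections import Counter


class Folder:
    """One node of the folder trie (hypothetical helper type)."""

    def __init__(self):
        self.children = {}   # folder name -> Folder
        self.signature = ""  # filled in by sign()


def delete_duplicate_folders(paths):
    # Step 1: build the trie from the input paths.
    root = Folder()
    for path in paths:
        node = root
        for name in path:
            node = node.children.setdefault(name, Folder())

    # Step 2: compute a signature per folder, e.g. "(b())" for a folder
    # whose only child is a leaf named "b". Children are visited in sorted
    # order so the signature does not depend on insertion order.
    counts = Counter()

    def sign(node):
        parts = []
        for name in sorted(node.children):
            parts.append(name + sign(node.children[name]))
        node.signature = "(" + "".join(parts) + ")"
        if node.children:  # leaves have an empty set of subfolders
            counts[node.signature] += 1
        return node.signature

    for child in root.children.values():
        sign(child)

    # Steps 3 and 4: skip every non-leaf folder whose signature occurs
    # more than once (and its whole subtree), and collect the rest.
    remaining = []

    def collect(node, prefix):
        for name, child in node.children.items():
            if child.children and counts[child.signature] > 1:
                continue
            prefix.append(name)
            remaining.append(list(prefix))
            collect(child, prefix)
            prefix.pop()

    collect(root, [])
    return remaining


example = [["a"], ["c"], ["d"], ["a", "b"], ["c", "b"], ["d", "a"]]
print(delete_duplicate_folders(example))  # [['d'], ['d', 'a']]
```

Because each signature embeds the signatures of all descendants, the total signature length can grow faster than the number of folders on deep trees; replacing the strings with small integer IDs per distinct signature is one common way to bound that cost.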