Imagine you're a system administrator managing a large file system with thousands of files scattered across multiple directories. Over time, users have created duplicate files with identical content but stored in different locations, wasting precious storage space.
Your mission: Given a list of directory information strings, identify all groups of duplicate files that have the same content.
Each directory info string follows this format: "root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"
This means:
- Directory path: root/d1/d2/.../dm
- Files with their content in parentheses: f1.txt(f1_content), etc.
- If m = 0, the directory is just the root (see the parsing sketch below)
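To make the format concrete, here is a minimal parsing sketch in Python. The helper name parse_info is hypothetical; it assumes tokens are separated by single spaces and that file content contains no spaces or parentheses, per the format above.

```python
def parse_info(info: str) -> list[tuple[str, str, str]]:
    """Split one directory info string into (directory, filename, content) tuples."""
    parts = info.split(" ")
    directory = parts[0]  # the first token is the directory path
    files = []
    for token in parts[1:]:
        # "f1.txt(f1_content)" -> name before '(' and content inside the parentheses
        open_paren = token.index("(")
        name = token[:open_paren]
        content = token[open_paren + 1:-1]  # drop '(' and the trailing ')'
        files.append((directory, name, content))
    return files
```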
Goal: Return all groups of duplicate files, where each group contains at least 2 files with identical content. Each file should be represented by its full path: "directory_path/file_name.txt"
Example: If two files have content "hello world", group them together regardless of their location in the file system.
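A common way to do this is to bucket full file paths by content in a hash map and keep only buckets holding two or more paths. A minimal sketch, reusing the hypothetical parse_info helper from above (find_duplicate is likewise an illustrative name):

```python
from collections import defaultdict

def find_duplicate(paths: list[str]) -> list[list[str]]:
    groups: defaultdict[str, list[str]] = defaultdict(list)  # content -> full paths
    for info in paths:
        for directory, name, content in parse_info(info):
            groups[content].append(f"{directory}/{name}")
    # Only contents shared by at least two files form a duplicate group.
    return [group for group in groups.values() if len(group) > 1]
```

For example, find_duplicate(["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root 4.txt(efgh)"]) returns [["root/a/1.txt", "root/c/3.txt"], ["root/a/2.txt", "root/4.txt"]].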
Time & Space Complexity
- Time: O(n · m), where n is the total number of files and m is the average content length; each file is parsed and its content hashed exactly once.
- Space: O(n · m) for the hash map keyed by content and holding every full file path, plus the result groups.
Constraints
- 1 ≤ paths.length ≤ 2 × 10⁴
- 1 ≤ paths[i].length ≤ 3000
- 1 ≤ sum of all file contents' lengths ≤ 5 × 10⁵
- paths[i] has the format "dir file1.txt(content1) file2.txt(content2) ... fileN.txt(contentN)"
- The answer can be returned in any order