LeetCode 609: Find Duplicate File in System
Given a list of directory info strings, each containing a directory path followed by the files in it and their contents, return all groups of file paths whose files have identical content.
Input: Each string represents a directory path followed by multiple files and their contents in the format 'filename(content)'.
Example: paths = ["root/x 1.txt(abc) 2.txt(def)", "root/y 3.txt(abc)", "root/y/z 4.txt(def)", "root 5.txt(ghj)"]
Constraints:
• 1 <= paths.length <= 2 * 10^4
• 1 <= paths[i].length <= 3000
• 1 <= sum(paths[i].length) <= 5 * 10^5
• paths[i] consist of English letters, digits, '/', '.', '(', ')', and ' '.
• No two files in the same directory have the same name.
Output: Return a list of groups of file paths that have identical content.
Example: [["root/x/1.txt", "root/y/3.txt"], ["root/x/2.txt", "root/y/z/4.txt"]]
Constraints:
• Each returned group must contain two or more file paths with identical content.
• The groups (and the paths within them) may be returned in any order.
Goal: Identify and group files that share identical content.
Steps:
• Parse each directory info to extract file paths and content.
• Use a map to store file contents as keys and their corresponding file paths as values.
• Group file paths that have the same content into a list.
Assumptions:
• The directory structure is valid.
• No two files in the same directory share the same name.
• The number of directory strings is at most 2 * 10^4.
• File contents are given as strings enclosed in parentheses.
• Input: paths = ["root/x 1.txt(abc) 2.txt(def)", "root/y 3.txt(abc)", "root/y/z 4.txt(def)", "root 5.txt(ghj)"]
• Explanation: Here, file '1.txt' in 'root/x' and '3.txt' in 'root/y' both have content 'abc', so they are grouped together. Similarly, '2.txt' in 'root/x' and '4.txt' in 'root/y/z' have identical content 'def', so they form another group.
Approach: The approach involves mapping file contents to their file paths and grouping files with identical content.
Observations:
• Files with identical content should be grouped together regardless of their location.
• A hash map can be used to store and retrieve file content efficiently.
Steps:
• 1. Initialize an empty map to store content and file paths.
• 2. Iterate over the given paths and parse directory and file info.
• 3. For each file, extract its content and update the map.
• 4. Filter and return groups with more than one file.
Empty Inputs:
• An empty list of paths.
Large Inputs:
• Handling up to 2 * 10^4 directory strings, with a total input length of up to 5 * 10^5 characters.
Special Values:
• Files with empty content or unique content.
Constraints:
• The total input length can reach 5 * 10^5 characters, so each directory string should be parsed in a single linear pass.
vector<vector<string>> findDuplicate(vector<string>& paths) {
    unordered_map<string, vector<string>> mp;   // content -> file paths
    vector<vector<string>> result;
    for(auto path: paths) {
        stringstream ss(path);
        string root;
        string s;
        getline(ss, root, ' ');                 // the directory comes first
        while(getline(ss, s, ' ')) {            // then each "name(content)" token
            string fileName = root + '/' + s.substr(0, s.find('('));
            string fileContent = s.substr(s.find('(') + 1, s.find(')') - s.find('(')-1);
            mp[fileContent].push_back(fileName);
        }
    }
    for (auto file : mp) {
        if(file.second.size() > 1)              // keep only duplicated contents
            result.push_back(file.second);
    }
    return result;
}
1 : Function Declaration
vector<vector<string>> findDuplicate(vector<string>& paths) {
Declares the function `findDuplicate` which takes a reference to a vector of file paths as input and returns a vector of vectors containing duplicate file paths.
2 : Variable Initialization (mp)
unordered_map<string, vector<string>> mp;
Initializes an unordered map `mp` where the key is the content of a file and the value is a vector of file paths containing that content.
3 : Variable Initialization (result)
vector<vector<string>> result;
Initializes an empty vector `result` which will store groups of duplicate file paths.
4 : Loop Through Paths
for(auto path: paths) {
Iterates over each file path in the input `paths` vector.
5 : String Stream Setup
stringstream ss(path);
Initializes a stringstream `ss` with the current file path `path` to extract the root and file details.
6 : Root String Initialization
string root;
Declares a string variable `root` to store the root directory of the file.
7 : File String Initialization
string s;
Declares a string variable `s` to store individual file names and their content.
8 : Extract Root Directory
getline(ss, root, ' ');
Extracts the root directory (before the first space) from the stringstream and assigns it to `root`.
9 : Loop Through Files in Directory
while(getline(ss, s, ' ')) {
Iterates over the rest of the file details in the stringstream. Each file is separated by a space.
10 : File Name Extraction
string fileName = root + '/' + s.substr(0, s.find('('));
Extracts the file name (before the opening parenthesis) and constructs the full file path using the root directory.
11 : File Content Extraction
string fileContent = s.substr(s.find('(') + 1, s.find(')') - s.find('(')-1);
Extracts the content of the file from within the parentheses.
12 : Store in Map
mp[fileContent].push_back(fileName);
Stores the file path in the map `mp` under the corresponding file content as the key.
13 : Loop Through Map
for (auto file : mp) {
Iterates over each entry in the map `mp`, where the key is the file content and the value is a vector of file paths.
14 : Check for Duplicates
if(file.second.size() > 1)
Checks if there are multiple files with the same content (i.e., a duplicate).
15 : Store Duplicates
result.push_back(file.second);
Adds the group of duplicate file paths to the `result` vector.
16 : Return Result
return result;
Returns the vector `result` containing groups of duplicate file paths.
Best Case: O(S)
Average Case: O(S)
Worst Case: O(S)
Description: Every character of every input string is scanned a constant number of times during parsing, so the running time is linear in S, the total length of all input strings (at most 5 * 10^5).
Best Case: O(S)
Average Case: O(S)
Worst Case: O(S)
Description: The space complexity is also O(S), since the map stores every constructed file path together with its content key.