Leetcode 820: Short Encoding of Words
Given an array of words, you are to return the length of the shortest reference string that can encode these words. The reference string consists of the words concatenated with a ‘#’ character at the end, and indices are used to encode the words by identifying their position in the reference string. Each word is encoded by the substring in the reference string that starts at its position and ends at the next ‘#’.
Problem
Approach
Steps
Complexity
Input: The input consists of an array of words, where each word is a string of lowercase letters. Each word in the array has a length between 1 and 7 characters.
Example: Input: words = ["apple", "ple", "le"]
Constraints:
• 1 <= words.length <= 2000
• 1 <= words[i].length <= 7
• words[i] consists of only lowercase letters
Output: Return the length of the shortest reference string that can encode the given words.
Example: Output: 9
Constraints:
• The output should be an integer representing the length of the shortest reference string.
Goal: The goal is to find the shortest encoding of the words using a reference string. To do this, we can eliminate words that are suffixes of other words, and concatenate only the distinct prefixes of the words to form the reference string.
Steps:
• Create a set of words to store unique words.
• For each word, check all possible suffixes and remove them from the set if they are also present.
• Finally, concatenate the remaining words, adding 1 for each '#' delimiter, and return the length of the concatenated string.
Goal: Ensure that the solution handles the constraints efficiently, especially when the number of words and the total length of words increases.
Steps:
• The number of words can be up to 2000.
• Each word has a length of at most 7 characters.
Assumptions:
• All words are lowercase and have lengths between 1 and 7 characters.
• Words in the array are unique.
• Input: Input: words = ["apple", "ple", "le"]
• Explanation: The word 'ple' is a suffix of 'apple', and 'le' is a suffix of 'ple'. So, we can encode the words with 'apple#', which results in a total length of 9.
• Input: Input: words = ["orange", "or", "ange"]
• Explanation: Here, 'or' is a prefix of 'orange', and 'ange' is a suffix of 'orange'. The shortest encoding would be 'orange#', resulting in a total length of 7.
Approach: The key to solving this problem efficiently is to first remove suffixes that are redundant. If one word is a suffix of another, the shorter word does not need to be included in the final encoding.
Observations:
• Suffixes that are part of longer words don't need to be included separately.
• The reference string must end with a '#' character, so we need to account for that when calculating the total length.
• We should identify all words that are not suffixes of other words, as these will contribute to the final reference string.
Steps:
• Step 1: Convert the array of words into a set to eliminate duplicates.
• Step 2: For each word, check for suffixes and remove them from the set if they appear in the array.
• Step 3: After removing suffixes, concatenate the remaining words with a '#' character after each word.
• Step 4: Return the total length of the concatenated reference string.
Empty Inputs:
• If the input is an empty array, return 0 since there are no words to encode.
Large Inputs:
• For large arrays (up to 2000 words), the solution should handle them efficiently.
Special Values:
• Ensure that cases where all words are prefixes or suffixes of one another are handled correctly.
Constraints:
• The solution must handle arrays of up to 2000 words efficiently.
int minimumLengthEncoding(vector<string>& words) {
unordered_set<string> s(words.begin(), words.end());
for(string w: words)
for(int i = 1; i < w.size(); i++)
s.erase(w.substr(i));
int res = 0;
for(string sk: s)
res += (sk.size() + 1);
return res;
}
1 : Function Definition
int minimumLengthEncoding(vector<string>& words) {
The function `minimumLengthEncoding` is defined, which takes a vector of strings `words` as input and returns an integer representing the minimum length encoding.
2 : Set Initialization
unordered_set<string> s(words.begin(), words.end());
An unordered set `s` is initialized with the elements from the `words` vector, ensuring all words are stored without duplicates.
3 : Outer Loop
for(string w: words)
The outer loop iterates over each word `w` in the `words` vector.
4 : Inner Loop
for(int i = 1; i < w.size(); i++)
The inner loop iterates over the characters in each word starting from index 1, essentially looking at suffixes of the word.
5 : Erase Suffix
s.erase(w.substr(i));
The substring of `w` starting from index `i` is removed from the set `s`. This ensures that any word that is a suffix of another word is excluded from the encoding.
6 : Result Initialization
int res = 0;
The variable `res` is initialized to 0, which will be used to store the total length of the encoded string.
7 : Summing Lengths
for(string sk: s)
The loop iterates over each string `sk` remaining in the set `s`.
8 : Calculate Encoding Length
res += (sk.size() + 1);
For each string `sk`, its length is added to `res`, along with an additional 1 to account for the '#' character that separates each word in the encoding.
9 : Return Result
return res;
The function returns the total length of the encoded string, which is the sum of the lengths of all the strings in the set `s`.
Best Case: O(n), where n is the number of words.
Average Case: O(n * m), where n is the number of words and m is the average length of the words.
Worst Case: O(n * m), where n is the number of words and m is the average length of the words.
Description: The time complexity is linear with respect to the number of words and the length of each word due to the need to check suffixes.
Best Case: O(n), where n is the number of words, when no suffixes need to be stored.
Worst Case: O(n * m), where n is the number of words and m is the average length of the words, for storing the word set.
Description: The space complexity is primarily determined by the size of the word set.
LeetCode Solutions Library / DSA Sheets / Course Catalog |
---|
comments powered by Disqus