Leetcode 820: Short Encoding of Words

Input: The input consists of an array of words, where each word is a string of lowercase letters. Each word in the array has a length between 1 and 7 characters.

Example: Input: words = ["apple", "ple", "le"]

Constraints:

• 1 <= words.length <= 2000

• 1 <= words[i].length <= 7

• words[i] consists of only lowercase letters

Output: Return the length of the shortest reference string that can encode the given words.

Example: Output: 9

Constraints:

• The output should be an integer representing the length of the shortest reference string.

Goal: The goal is to find the shortest encoding of the words using a reference string. To do this, we can eliminate words that are suffixes of other words, and concatenate only the distinct prefixes of the words to form the reference string.

Steps:

• Create a set of words to store unique words.

• For each word, check all possible suffixes and remove them from the set if they are also present.

• Finally, concatenate the remaining words, adding 1 for each '#' delimiter, and return the length of the concatenated string.

Goal: Ensure that the solution handles the constraints efficiently, especially when the number of words and the total length of words increases.

Steps:

• The number of words can be up to 2000.

• Each word has a length of at most 7 characters.

Assumptions:

• All words are lowercase and have lengths between 1 and 7 characters.

• Words in the array are unique.

• Input: Input: words = ["apple", "ple", "le"]

• Explanation: The word 'ple' is a suffix of 'apple', and 'le' is a suffix of 'ple'. So, we can encode the words with 'apple#', which results in a total length of 9.

• Input: Input: words = ["orange", "or", "ange"]

• Explanation: Here, 'or' is a prefix of 'orange', and 'ange' is a suffix of 'orange'. The shortest encoding would be 'orange#', resulting in a total length of 7.

Approach: The key to solving this problem efficiently is to first remove suffixes that are redundant. If one word is a suffix of another, the shorter word does not need to be included in the final encoding.

Observations:

• Suffixes that are part of longer words don't need to be included separately.

• The reference string must end with a '#' character, so we need to account for that when calculating the total length.

• We should identify all words that are not suffixes of other words, as these will contribute to the final reference string.

Steps:

• Step 1: Convert the array of words into a set to eliminate duplicates.

• Step 2: For each word, check for suffixes and remove them from the set if they appear in the array.

• Step 3: After removing suffixes, concatenate the remaining words with a '#' character after each word.

• Step 4: Return the total length of the concatenated reference string.

Empty Inputs:

• If the input is an empty array, return 0 since there are no words to encode.

Large Inputs:

• For large arrays (up to 2000 words), the solution should handle them efficiently.

Special Values:

• Ensure that cases where all words are prefixes or suffixes of one another are handled correctly.

Constraints:

• The solution must handle arrays of up to 2000 words efficiently.

int minimumLengthEncoding(vector<string>& words) {

    unordered_set<string> s(words.begin(), words.end());
    
    for(string w: words)
    for(int i = 1; i < w.size(); i++)
        s.erase(w.substr(i));

    int res = 0;        
    for(string sk: s)
    res += (sk.size() + 1);
    
    return res;
}

1 : Function Definition

int minimumLengthEncoding(vector<string>& words) {

The function `minimumLengthEncoding` is defined, which takes a vector of strings `words` as input and returns an integer representing the minimum length encoding.

2 : Set Initialization

    unordered_set<string> s(words.begin(), words.end());

An unordered set `s` is initialized with the elements from the `words` vector, ensuring all words are stored without duplicates.

3 : Outer Loop

    for(string w: words)

The outer loop iterates over each word `w` in the `words` vector.

4 : Inner Loop

    for(int i = 1; i < w.size(); i++)

The inner loop iterates over the characters in each word starting from index 1, essentially looking at suffixes of the word.

5 : Erase Suffix

        s.erase(w.substr(i));

The substring of `w` starting from index `i` is removed from the set `s`. This ensures that any word that is a suffix of another word is excluded from the encoding.

6 : Result Initialization

    int res = 0;

The variable `res` is initialized to 0, which will be used to store the total length of the encoded string.

7 : Summing Lengths

    for(string sk: s)

The loop iterates over each string `sk` remaining in the set `s`.

8 : Calculate Encoding Length

    res += (sk.size() + 1);

For each string `sk`, its length is added to `res`, along with an additional 1 to account for the '#' character that separates each word in the encoding.

9 : Return Result

    return res;

The function returns the total length of the encoded string, which is the sum of the lengths of all the strings in the set `s`.

Best Case: O(n), where n is the number of words.

Average Case: O(n * m), where n is the number of words and m is the average length of the words.

Worst Case: O(n * m), where n is the number of words and m is the average length of the words.

Description: The time complexity is linear with respect to the number of words and the length of each word due to the need to check suffixes.

Best Case: O(n), where n is the number of words, when no suffixes need to be stored.

Worst Case: O(n * m), where n is the number of words and m is the average length of the words, for storing the word set.

Description: The space complexity is primarily determined by the size of the word set.

Leetcode 820: Short Encoding of Words

Solution to LeetCode 820: Short Encoding of Words Problem

Explore →