Leetcode 393: UTF-8 Validation

grid47
Exploring patterns and algorithms
Sep 28, 2024 6 min read

Solution to LeetCode 393: UTF-8 Validation Problem

You are given an array of integers `data`, where each integer represents one byte of data. Your task is to check whether this sequence of bytes forms a valid UTF-8 encoded string, following the UTF-8 encoding rules for characters that are 1 to 4 bytes long.
Input: The input consists of an integer array `data` where each integer represents one byte of data.
Example: Input: [197, 130, 1]
Constraints:
• 1 <= data.length <= 2 * 10^4
• 0 <= data[i] <= 255
Output: The output is a boolean indicating whether the input array `data` represents a valid UTF-8 encoding.
Example: Output: true
Constraints:
• The output should be true if the byte sequence represents a valid UTF-8 encoding, otherwise false.
Goal: The goal is to validate if the given byte sequence adheres to the rules of UTF-8 encoding.
Steps:
• Iterate through the array of bytes in `data`.
• Check the leading bits of each byte to determine whether it starts a 1-, 2-, 3-, or 4-byte character (`0`, `110`, `1110`, or `11110` respectively).
• After the leading byte of a multi-byte character, ensure that the expected number of continuation bytes (those starting with `10`) follows.
• Return true if the entire sequence is valid; otherwise, return false.
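The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference solution: the function name `valid_utf8` and the counter `remaining` are assumptions chosen for this example, and the sketch checks only the leading-bit patterns described here (it does not reject overlong encodings or other cases outside those rules).

```python
from typing import List


def valid_utf8(data: List[int]) -> bool:
    """Return True if `data` follows the 1-to-4-byte UTF-8 bit patterns."""
    remaining = 0  # continuation bytes still expected for the current character

    for value in data:
        byte = value & 0xFF  # keep only the low 8 bits of each integer

        if remaining:
            # Inside a multi-byte character: this byte must look like 10xxxxxx.
            if byte >> 6 != 0b10:
                return False
            remaining -= 1
        elif byte >> 7 == 0b0:        # 0xxxxxxx: 1-byte character
            remaining = 0
        elif byte >> 5 == 0b110:      # 110xxxxx: start of a 2-byte character
            remaining = 1
        elif byte >> 4 == 0b1110:     # 1110xxxx: start of a 3-byte character
            remaining = 2
        elif byte >> 3 == 0b11110:    # 11110xxx: start of a 4-byte character
            remaining = 3
        else:
            # 10xxxxxx with nothing pending, or 11111xxx: not a valid leading byte.
            return False

    # A character cut off at the end of the array is invalid.
    return remaining == 0
```

Keeping a single counter means the array is scanned once and no extra storage grows with the input.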
Goal: The solution should handle the input size within the given constraints efficiently.
Steps:
• A single pass over `data` is enough, so arrays of length up to 2 * 10^4 are handled in O(n) time with O(1) extra space.
Assumptions:
• Each integer in `data` represents one byte of data, and the byte values range from 0 to 255.
Input: [197, 130, 1]
Explanation: The byte sequence '11000101 10000010 00000001' represents a valid UTF-8 encoding: a 2-byte character followed by a 1-byte character.

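Input: [235, 140, 4]
Explanation: The byte sequence '11101011 10001100 00000100' represents an invalid UTF-8 encoding. The first byte starts with '1110', so it opens a 3-byte character and two continuation bytes must follow. The second byte starts with '10' and is a valid continuation byte, but the third byte starts with '00' instead of '10'.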
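To reproduce the bit patterns quoted in these explanations, a small helper can render each integer as an 8-bit binary string. The name `bit_patterns` is hypothetical and used only for this illustration.

```python
def bit_patterns(data):
    """Render each integer in `data` as an 8-bit binary string."""
    return [format(value & 0xFF, "08b") for value in data]


print(bit_patterns([197, 130, 1]))  # ['11000101', '10000010', '00000001']
print(bit_patterns([235, 140, 4]))  # ['11101011', '10001100', '00000100']
```

Reading the leading bits of each string shows the pattern directly: `110` then `10` then `0` in the first example, and `1110` then `10` then `0` (where a second `10` was expected) in the second.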
