Leetcode 393: UTF-8 Validation

grid47
Exploring patterns and algorithms
Sep 28, 2024 6 min read

Solution to LeetCode 393: UTF-8 Validation Problem

You are given an array of integers `data`, where each integer represents one byte of data. Your task is to check whether this sequence of bytes forms a valid UTF-8 encoded string, following the UTF-8 encoding rules for characters that are 1 to 4 bytes long.


Input: The input consists of an integer array `data` where each integer represents one byte of data.

Example: Input: [197, 130, 1]

Constraints:

• 1 <= data.length <= 2 * 10^4

• 0 <= data[i] <= 255

Output: The output is a boolean indicating whether the input array `data` represents a valid UTF-8 encoding.

Example: Output: true

Constraints:

• The output should be true if the byte sequence represents a valid UTF-8 encoding, otherwise false.

Goal: The goal is to validate if the given byte sequence adheres to the rules of UTF-8 encoding.

Steps:

• Iterate through the array of bytes in `data`.

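• Check the leading bits of each leading byte to determine whether it starts a 1-, 2-, 3-, or 4-byte character (prefix `0`, `110`, `1110`, or `11110`, respectively).

• After the leading byte of an n-byte character, ensure that exactly n - 1 continuation bytes follow, each starting with `10`. A byte starting with `10` where no continuation is expected makes the sequence invalid.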

• Return true if the entire sequence is valid; otherwise, return false.
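The steps above can be sketched in Python as a single pass that tracks how many continuation bytes are still expected. This is a minimal sketch, not necessarily the original author's solution: the function name `valid_utf8` and the counter `remaining` are illustrative choices, and the check covers only the structural bit-pattern rules described here (it does not reject overlong encodings or other cases a full UTF-8 decoder would).

```python
from typing import List


def valid_utf8(data: List[int]) -> bool:
    """Return True if `data` follows the 1- to 4-byte UTF-8 bit patterns."""
    remaining = 0  # continuation bytes still expected for the current character
    for value in data:
        byte = value & 0xFF  # only the lowest 8 bits of each integer matter
        if remaining:
            # Inside a multi-byte character: the byte must look like 10xxxxxx.
            if byte >> 6 != 0b10:
                return False
            remaining -= 1
        elif byte >> 7 == 0b0:
            remaining = 0  # 0xxxxxxx: 1-byte character
        elif byte >> 5 == 0b110:
            remaining = 1  # 110xxxxx: 2-byte character
        elif byte >> 4 == 0b1110:
            remaining = 2  # 1110xxxx: 3-byte character
        elif byte >> 3 == 0b11110:
            remaining = 3  # 11110xxx: 4-byte character
        else:
            # Stray continuation byte (10xxxxxx) or an invalid 11111xxx prefix.
            return False
    # A character cut off at the end of the array is also invalid.
    return remaining == 0
```

Each byte is examined once and only one counter is kept, so the sketch runs in O(n) time with O(1) extra space for an array of length n.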

Goal: The solution should handle the input size efficiently within the given constraints.

Constraints:

• The solution must handle arrays of length up to 2 * 10^4 efficiently.

Assumptions:

• Each integer in `data` represents one byte of data, and the byte values range from 0 to 255.

• Input: [197, 130, 1]

• Explanation: The byte sequence '11000101 10000010 00000001' represents a valid UTF-8 encoding: a 2-byte character followed by a 1-byte character.

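• Input: [235, 140, 4]

• Explanation: The byte sequence '11101011 10001100 00000100' represents an invalid UTF-8 encoding. The first byte starts with '1110', so it begins a 3-byte character and two continuation bytes must follow. The second byte starts with '10' and is a valid continuation byte, but the third byte starts with '00' rather than '10', so the sequence is invalid.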
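To reproduce the bit patterns quoted in the two examples, a small helper can format each integer as an 8-bit binary string. This is only an illustrative sketch; the name `to_bits` is a hypothetical helper, not part of the problem statement.

```python
from typing import List


def to_bits(data: List[int]) -> str:
    """Format each integer's lowest 8 bits as a binary string, space-separated."""
    return " ".join(format(value & 0xFF, "08b") for value in data)


print(to_bits([197, 130, 1]))  # 11000101 10000010 00000001
print(to_bits([235, 140, 4]))  # 11101011 10001100 00000100
```

Reading the prefixes off the output makes the verdicts easy to check by hand: `110` then `10` then `0` in the first example, and `1110` then `10` then `0` (where a second `10` was required) in the second.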
