According to Wikipedia:
Not all sequences of bytes are valid UTF-8. A UTF-8 decoder should be prepared for:
1. the red invalid bytes in the above table
2. an unexpected continuation byte
3. a start byte not followed by enough continuation bytes
4. an Overlong Encoding as described above
5. A 4-byte sequence (starting with 0xF4) that decodes to a value greater than U+10FFFF

According to the Codepage layout, 0xC0 and 0xC1 are invalid and must never appear in a valid UTF-8 sequence. Here is what I have for CodePoints 0xC0 and 0xC1:

Byte 2    Byte 1    Num  Char
11000011  10000000  192  À
11000011  10000001  193  Á
These byte sequences correspond to characters, but they shouldn't. Am I doing something wrong?
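
One way to check this is a small Python sketch. It is only an illustration, not part of the Wikipedia text, and the helper names `bits` and `is_valid_utf8` are made up here. It treats 0xC0 and 0xC1 first as code points to be encoded, then as raw bytes inside a sequence, which may be two different things:

```python
def bits(code_point: int) -> str:
    """Encode one code point as UTF-8 and return its bytes as bit strings."""
    encoded = chr(code_point).encode("utf-8")
    return " ".join(f"{byte:08b}" for byte in encoded)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Code points U+00C0 and U+00C1, encoded as UTF-8 (lead byte is 0xC3).
print(bits(0xC0))  # 11000011 10000000
print(bits(0xC1))  # 11000011 10000001

# The encoded forms decode fine; neither contains the byte 0xC0 or 0xC1.
print(is_valid_utf8(b"\xc3\x80"))  # True
print(is_valid_utf8(b"\xc3\x81"))  # True

# Sequences that actually start with the bytes 0xC0 or 0xC1 are rejected.
print(is_valid_utf8(b"\xc0\x80"))  # False
print(is_valid_utf8(b"\xc1\x81"))  # False
```

If this reading is right, the rows in the table are the encodings of the code points U+00C0 and U+00C1, and the restriction in the Codepage layout applies to the byte values 0xC0 and 0xC1, which do not appear in either row.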