Tuesday, March 14, 2023

Bugs in OpenBSD’s UTF-8 Decoding Logic

In this article, we’ll take a look at the [sorry] state of affairs regarding UTF-8 support on the OpenBSD kernel, at least as of OpenBSD 7.2-release. It’s not a pretty picture, but hopefully we can improve things.
[…]
Still, the debugging process we went through here to discover the cause of the problems in the first place is worth sharing from the beginning, as the code in question was particularly bad with plenty of textbook mistakes. Who knows what you might find in your own investigations elsewhere.
[…]
So effectively, when we tried sending the 0xC4, 0x80 sequence to print character 256, the kernel tried to interpret it as a three-byte sequence instead of a two-byte sequence. The trailing newline caused the re-assembly of the multi-byte character to fail, but we fell through to code which then correctly processed the newline character.
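To make the failure mode concrete, here is a minimal sketch in C of how a decoder can derive the sequence length from the lead byte. It is not OpenBSD's code; the names utf8_seq_len and utf8_decode are hypothetical and chosen only for illustration. The point is that 0xC4 has the bit pattern 110xxxxx, so it announces exactly one continuation byte, and 0xC4 0x80 decodes to U+0100 (decimal 256).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total length of a UTF-8 sequence, judged from its
 * lead byte. Returns 0 for bytes that cannot start a sequence
 * (continuation bytes 0x80-0xBF, overlong leads 0xC0-0xC1, 0xF5-0xFF). */
static int utf8_seq_len(uint8_t lead)
{
    if (lead < 0x80)
        return 1;                       /* 0xxxxxxx: ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;                       /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0)
        return 3;                       /* 1110xxxx */
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;                       /* 11110xxx */
    return 0;
}

/* Hypothetical helper: decode one code point from buf[0..len).
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or malformed. */
static size_t utf8_decode(const uint8_t *buf, size_t len, uint32_t *out)
{
    /* Payload mask for the lead byte and smallest legal code point,
     * both indexed by sequence length. */
    static const uint8_t lead_mask[5] = { 0, 0x7F, 0x1F, 0x0F, 0x07 };
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };

    if (len == 0)
        return 0;

    int n = utf8_seq_len(buf[0]);
    if (n == 0 || (size_t)n > len)
        return 0;

    uint32_t cp = buf[0] & lead_mask[n];
    for (int i = 1; i < n; i++) {
        if ((buf[i] & 0xC0) != 0x80)    /* must look like 10xxxxxx */
            return 0;
        cp = (cp << 6) | (buf[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates, and out-of-range values. */
    if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = cp;
    return (size_t)n;
}
```

With this sketch, a decoder that wrongly expected three bytes after 0xC4 would try to consume the newline (0x0A) as a second continuation byte. Since 0x0A does not match 10xxxxxx, re-assembly fails, which is consistent with the behavior described in the excerpt.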

Bug OpenBSD Programming Unicode

3 Comments

John Gordon

March 14, 2023 4:36 PM

I used to wonder which would come first -- resolution of the CR/LF dilemma or humanity-ending AI. It's still an open question...

Old Unix Geek

March 14, 2023 9:32 PM

Actually, today it became clear that civilization-ending AI is much more likely than was previously estimated. If the cost of intelligence tends to zero, then there's little point educating much, if not most, of humanity.

Old Unix Geek

March 16, 2023 1:33 AM

Here we go.
