Tuesday, March 14, 2023

Bugs in OpenBSD’s UTF-8 Decoding Logic

Exotic Silicon (via Hacker News):

In this article, we’ll take a look at the [sorry] state of affairs regarding UTF-8 support on the OpenBSD kernel, at least as of OpenBSD 7.2-release. It’s not a pretty picture, but hopefully we can improve things.


Still, the debugging process we went through here to discover the cause of the problems in the first place is worth sharing from the beginning, as the code in question was particularly bad with plenty of textbook mistakes. Who knows what you might find in your own investigations elsewhere.


So effectively, when we tried sending the 0xC4, 0x80 sequence to print character 256, the kernel tried to interpret it as a three-byte sequence instead of a two-byte sequence. The trailing newline caused the re-assembly of the multi-byte character to fail, but we fell through to code which then correctly processed the newline character.
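To make the byte arithmetic concrete, here is a minimal Python sketch of how a UTF-8 lead byte determines the sequence length, and why 0xC4, 0x80 should decode to character 256. It is not the OpenBSD kernel code: the function names `sequence_length` and `decode_first` and the `length` override are hypothetical, chosen only to illustrate what happens when a decoder expects three bytes where two are correct. The sketch is also simplified, in that it does not reject overlong three- and four-byte forms or surrogates.

```python
def sequence_length(lead: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte, or 0 if invalid."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx (0xC0 and 0xC1 would only encode overlong forms)
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx, capped at U+10FFFF
    return 0      # continuation byte or invalid lead byte


def decode_first(data: bytes, length: int = 0) -> int:
    """Decode the first code point in data.

    A non-zero `length` overrides the length implied by the lead byte.
    That override is hypothetical and exists only to imitate a decoder
    that miscounts the sequence length.
    """
    if not data:
        raise ValueError("empty input")
    if length == 0:
        length = sequence_length(data[0])
    if length == 0:
        raise ValueError("invalid lead byte")
    if len(data) < length:
        raise ValueError("truncated sequence")
    if length == 1:
        return data[0]
    # Keep only the payload bits of the lead byte: 5, 4, or 3 of them.
    code_point = data[0] & (0xFF >> (length + 1))
    for byte in data[1:length]:
        if (byte & 0xC0) != 0x80:
            raise ValueError("expected continuation byte")
        code_point = (code_point << 6) | (byte & 0x3F)
    return code_point


if __name__ == "__main__":
    sample = b"\xc4\x80\n"
    print(sequence_length(sample[0]))   # lead byte 0xC4 implies two bytes
    print(decode_first(sample))         # (0x04 << 6) | 0x00 == 256
    try:
        decode_first(sample, length=3)  # wrongly expect three bytes
    except ValueError as err:
        print("failed:", err)           # the newline is not a continuation byte
```

With the correct length, the lead byte contributes the bits 00100 and the continuation byte contributes 000000, giving 0x100. When three bytes are expected instead, the newline (0x0A) is checked as a continuation byte and re-assembly fails, which matches the symptom described in the excerpt.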


3 Comments

I used to wonder which would come first -- resolution of the CR/LF dilemma or humanity-ending AI. It's still an open question...

Old Unix Geek

Actually, today it became clear that civilization-ending AI is much more likely than was previously estimated. If the cost of intelligence tends to zero, then there's little point educating much, if not most, of humanity.
