Friday, July 20, 2018

Unicode Waiting for New Japanese Era Name

Wikipedia:

The current period is known as “Heisei” (平成). It started on 8 January 1989, the day after the death of the Emperor Hirohito. His son, the 125th Emperor Akihito, acceded to the throne. It is scheduled to end on 30 April 2019, his planned date of abdication.

Ken Whistler:

The date of that [Unicode 12.1] release, which cannot really be moved, given the complex dependencies now in place for the corresponding CLDR and ICU releases, and for the vendor product cycles that depend, in turn, on those, poses a problem for the anticipated announcement of the new Japanese era name. The date of the abdication and start of the subsequent Japanese reign era is now fixed, but the actual name of the era will not be announced, apparently, until sometime shortly after February 24, 2019. That timeframe is way too short to adjust the data files and charts for the addition of a new character, no matter how urgent it is for implementation.

The problem, in this case, is that even though we know the code point for this new character, U+32FF, which the UTC set aside back in January, we cannot know the actual content of that code point until the era name itself is announced. The characters encoded for these calendrical symbols in Unicode have compatibility decompositions, and those decompositions depend on the actual name chosen for the era. Because the decomposition, once assigned, is immutable, involving Unicode normalization, the UTC cannot afford to make any mistakes here, nor can it just guess and release the code point early.
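To make the decomposition point concrete, here is a minimal sketch in Python using the standard `unicodedata` module. It looks at the four era symbols that already exist (U+337B through U+337E) and shows that compatibility normalization (NFKC) expands each one into the kanji that spell the era name. The helper name `expand_era_symbols` and the `ERA_SYMBOLS` table are my own, for illustration only.

```python
import unicodedata

# The four square era-name symbols already encoded in Unicode,
# mapped to the kanji that each one is expected to decompose into.
ERA_SYMBOLS = {
    "\u337e": "明治",  # Meiji
    "\u337d": "大正",  # Taisho
    "\u337c": "昭和",  # Showa
    "\u337b": "平成",  # Heisei
}


def expand_era_symbols(text: str) -> str:
    """Apply NFKC, which replaces compatibility characters
    (including the square era symbols) with their decompositions."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for symbol, kanji in ERA_SYMBOLS.items():
        # decomposition() returns the raw mapping from the Unicode data,
        # a "<square>" tag followed by the code points of the era's kanji.
        print(f"U+{ord(symbol):04X}", unicodedata.decomposition(symbol), kanji)
    print(expand_era_symbols("㍻30年"))
```

Because that mapping is baked into normalization, a wrong guess for U+32FF could never be corrected later, which is the constraint Whistler describes. On a Python build whose Unicode data predates the new character, `unicodedata.decomposition("\u32ff")` should simply return an empty string.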

Via Dave DeLong:

Stuff like this makes me wonder if we’ll ever see the CLDR (the database that defines date formats, calendar semantics, localization stuff, etc) decoupled from the Unicode libraries proper, in the same way that the timezone database is distributable separately

3 Comments

This is the first I've heard about this, thank you for sharing!

I'm a little confused about why a new Unicode character will be needed, though. 平成 doesn't use any special or unusual characters to spell it (both are in the standard Joyo Kanji set of 2000 or so taught in Japanese schools). Surely any new era would simply use characters that are already in Unicode? Do we always add a new character for each new era in Japan?

@michael H
There are existing characters for the four era names used since the move to the current system, which covers 1868 onwards.
A new Unicode character is not required per se, but any applications that use these characters will need to adjust (that is, those using ㍻ rather than 平成).

This is also independent of the changes required for any tools that rely on the associated data.
Systems will need updated CLDR/ICU builds so that H31.04.30 is followed by ?1.05.01, not H31.05.01 (where ? is the new era's abbreviation, short form, or full form).

CLDR also provides all the translations, for example to render "May 30, H30" or "May 30, Heisei-30" in English. Those need to be updated as well.
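The rollover described in the comment above can be sketched in a few lines of Python. This is only an illustration of why the era table has to be updated, not how CLDR or ICU implement it: the `ERAS` table and the `format_era_date` function are hypothetical, the start dates come from the post (Heisei began on 8 January 1989, and the new era begins the day after the 30 April 2019 abdication), and "?" stands in for the not-yet-announced abbreviation.

```python
from datetime import date

# Hypothetical era table, newest first: (first day of the era, abbreviation).
# "?" is a placeholder for the new era, whose name has not been announced.
ERAS = [
    (date(2019, 5, 1), "?"),
    (date(1989, 1, 8), "H"),  # Heisei
]


def format_era_date(d: date) -> str:
    """Format a Gregorian date as <era><year>.<MM>.<DD>, e.g. H31.04.30."""
    for start, abbreviation in ERAS:
        if d >= start:
            # Era years count from 1, starting in the calendar year the era began.
            era_year = d.year - start.year + 1
            return f"{abbreviation}{era_year}.{d.month:02d}.{d.day:02d}"
    raise ValueError(f"{d} is before the earliest era in the table")


if __name__ == "__main__":
    print(format_era_date(date(2019, 4, 30)))
    print(format_era_date(date(2019, 5, 1)))
```

Without the first row of the table, 1 May 2019 would be formatted as H31.05.01, which is the stale output that un-updated systems would produce.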

[…] Previously: Unicode Waiting for New Japanese Era Name. […]
