Tuesday, November 24, 2015

Dangers of NeXTSTEP Plists

Sam Marshall (comments):

Most of you are probably familiar with the fact that Xcode uses NeXTSTEP plists for the format when serializing project files.

[…]

Xcode’s implementation of deserializing the NeXTSTEP plist files is different from that of what is used in (Core)Foundation. There are assumptions made about what the output encoding is assumed to be, as well as supporting writing out this format of plist when (Core)Foundation does not. The NeXT/OpenStep plist format assumes that strings are written as ASCII, whereas Cocoa assumes strings are written in Unicode. As a result, Cocoa will happily read unescaped Unicode data from NeXT/OpenStep plists (while the parser will fail to read properly escaped sequences longer than 4 digits). This makes the format invalid as it is no longer ASCII data on disk, however will still be parsed correctly by classes like NSDictionary because of Cocoa's assumption that all strings are Unicode.

3 Comments


I have a small zsh macro I've used for years to print plists as NeXTSTEP style plists. I tried converting the plist he posted — it seems to roundtrip things just fine.

# Print a plist in NeXTSTEP style: normalize to XML with plutil, then pipe through pl.
plv () {
    plutil -convert xml1 -o /dev/fd/1 "$1" | /usr/bin/pl
}

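% cat xml.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>hourglass</key>
	<string>⏳</string>
	<key>panda</key>
	<string>🐼</string>
</dict>
</plist>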
% plv xml.plist | tee next.plist
{
hourglass = "\U23f3";
panda = "\Ud83d\Udc3c";
}
% plutil -convert xml1 next.plist
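% cat next.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>hourglass</key>
	<string>⏳</string>
	<key>panda</key>
	<string>🐼</string>
</dict>
</plist>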
I'm not understanding why he is assuming \U plus 8 hex characters is supposed to work, but maybe I'm missing something…
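For reference, the pl output above writes the panda (U+1F43C) as two 4-digit escapes, which are the two halves of its UTF-16 surrogate pair. A minimal sketch of that arithmetic in plain shell follows; plist_escape is a hypothetical helper name, not something pl or plutil provides, and it only mirrors the escapes shown above.

```shell
# Hypothetical helper (not part of pl or plutil): print the 4-digit \U
# escape(s) for one Unicode code point, e.g. 0x23F3 or 0x1F43C.
plist_escape() {
    cp=$(( $1 ))
    if [ "$cp" -gt 65535 ]; then
        # Outside the BMP: split into a UTF-16 surrogate pair.
        v=$(( cp - 65536 ))
        hi=$(( 0xD800 + (v >> 10) ))
        lo=$(( 0xDC00 + (v & 0x3FF) ))
        printf '\\U%04x\\U%04x\n' "$hi" "$lo"
    else
        # Inside the BMP: a single 4-digit escape is enough.
        printf '\\U%04x\n' "$cp"
    fi
}
```

Running plist_escape 0x1F43C prints \Ud83d\Udc3c and plist_escape 0x23F3 prints \U23f3, matching the two values in the transcript.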


@Nicholas Which parsers do plutil and pl use? I read her as saying that you might think \U plus 8 would work (it used to in Swift, for example), but it doesn’t.


They use Cocoa's parser — in fact, they're apparently built as part of Foundation:

% what =pl =plutil
/usr/bin/pl
PROGRAM:pl PROJECT:Foundation-1154
/usr/bin/plutil
PROGRAM:plutil PROJECT:Foundation-1154

I agree with her point that it's wrong of Xcode's parser to allow raw Unicode in supposedly-ASCII plists (at least without complaining), but I'm not sure how this extends to 32-bit character escapes or why they would count as "properly escaped". It isn't clear how she is converting from "xml" to "nextstep", or what code, if any, would generate these escapes.
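Since the parser won't complain, a rough check for raw non-ASCII bytes is easy to do by hand. This is only a sketch: count_non_ascii is a hypothetical helper name, and it counts bytes rather than characters, so a single emoji stored as UTF-8 shows up as 4.

```shell
# Hypothetical helper: count the bytes in a file that fall outside 7-bit ASCII.
count_non_ascii() {
    # Delete every byte in the range 0-127, then count what is left.
    # The outer $(( )) strips the padding some wc implementations add.
    echo $(( $(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c) ))
}
```

A result of 0 means the file is pure ASCII on disk; anything else means some string was written unescaped.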

Apparently the way to carry on discussion on Sam's blog is through GitHub issues, but right now I need to get back to work :-)
