Apple LLM Generating SwiftUI
Marcus Mendes (PDF):
In the paper UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback, the researchers explain that while LLMs have gotten better at multiple writing tasks, including creative writing and coding, they still struggle to “reliably generate syntactically-correct, well-designed code for UIs.” They also have a good idea why:
Even in curated or manually authored finetuning datasets, examples of UI code are extremely rare, in some cases making up less than one percent of the overall examples in code datasets.
To tackle this, they started with StarChat-Beta, an open-source LLM specialized in coding. They gave it a list of UI descriptions, and instructed it to generate a massive synthetic dataset of SwiftUI programs from those descriptions.
The paper was published last year, but I didn’t see people talking about it until August. In the interim, Apple started using third-party AI providers in Xcode.
18-25% of the output does not even compile. (The model they started with: 97% of the results FAILED to compile. Even the BEST model fails to produce compilable code in 12% of the cases.)
This lines up with GitHub’s report that typed languages are more reliable for generative AI.
To be blunt: after testing them out, I have not used LLMs for programming for the rest of the year. Attempting to use an LLM in that way was simply too frustrating. I don’t enjoy cleaning up flawed approaches and changing every single line. I do regularly ask ChatGPT how to use specific APIs, but I’m really just using it as a better documentation search or asking for sample code that is missing from Apple’s documentation. I’m not directly using any of the code ChatGPT writes in any of my apps.
In the meantime, I have watched plenty of presentations about letting Claude Code, and other tools, completely build an “app” but the successful presentations have usually focussed on JavaScript web apps or Python wrappers around small command-line tools. The two times this year that I’ve watched developers try the same with Swift apps have led to non-working solutions and excuses claiming it does sometimes work if left to run for another 20 minutes.
Previously: