Apple LLM Generating SwiftUI
Marcus Mendes (PDF):
In the paper UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback, the researchers explain that while LLMs have gotten better at many tasks, including creative writing and coding, they still struggle to “reliably generate syntactically-correct, well-designed code for UIs.” They also have a good idea why:
Even in curated or manually authored finetuning datasets, examples of UI code are extremely rare, in some cases making up less than one percent of the overall examples in code datasets.
To tackle this, they started with StarChat-Beta, an open-source LLM specialized in coding. They gave it a list of UI descriptions and instructed it to generate a massive synthetic dataset of SwiftUI programs from those descriptions.
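The “automated feedback” part of the pipeline amounts to a generate-then-filter loop: keep only the candidate programs that survive a compiler check, and finetune on what remains. Here is a minimal sketch of that loop. All names are hypothetical, and the compile check is a stub; the real pipeline would invoke the Swift toolchain (something like `swiftc -typecheck`) on each generated program rather than the toy brace-balance test used here.

```python
def generate(description: str) -> str:
    # Stand-in for the LLM call; returns a candidate SwiftUI program.
    # A description containing "broken" yields a deliberately malformed one.
    if "broken" in description:
        return "struct ContentView: View {"  # unbalanced brace, won't compile
    return 'struct ContentView: View { var body: some View { Text("Hi") } }'

def compile_check(source: str) -> bool:
    # Stand-in for a real compiler invocation. Here we only verify that
    # braces are balanced; an actual `swiftc` run would subsume this check.
    return source.count("{") == source.count("}")

def build_dataset(descriptions):
    # Keep only (description, program) pairs whose program passes the check.
    kept = []
    for desc in descriptions:
        src = generate(desc)
        if compile_check(src):
            kept.append((desc, src))
    return kept

dataset = build_dataset(["a greeting screen", "a broken example"])
print(len(dataset))  # only the compilable program survives
```

The filtering step is what makes the dataset “self-cleaning”: the compile-failure rates quoted below are measured on model output, but the training examples themselves were all required to pass this gate.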
The paper was published last year, but I didn’t see people talking about it until August. In the interim, Apple started using third-party AI providers in Xcode.
Even so, 18–25% of the output does not compile. The model they started with failed to compile 97% of the time, and even the best model fails to produce compilable code in 12% of cases.
This lines up with GitHub’s report that typed languages are more reliable for generative AI.
To be blunt: after testing them out, I have not used LLMs for programming for the rest of the year. Attempting to use an LLM in that way was simply too frustrating. I don’t enjoy cleaning up flawed approaches and changing every single line. I do regularly ask ChatGPT how to use specific APIs, but I’m really just using it as a better documentation search or asking for sample code that is missing from Apple’s documentation. I’m not directly using any of the code ChatGPT writes in any of my apps.
In the meantime, I have watched plenty of presentations about letting Claude Code and other tools completely build an “app,” but the successful presentations have usually focused on JavaScript web apps or Python wrappers around small command-line tools. The two times this year that I’ve watched developers try the same with Swift apps have led to non-working solutions and excuses claiming it does sometimes work if left to run for another 20 minutes.
Previously:
- Top Programming Languages of 2025
- What Xcode 26’s AI Chat Integration Is Missing
- Swift Assist, Part Deux
- Tim, Don’t Kill My Vibe
- Vibe Coding
The post from Gallagher is wild. “I’ve not used the technology at all this year and here’s my opinion based upon nothing whatsoever.” Coding with an AI assistant does take time and effort to perfect, but you’re rewarded with a huge productivity boost. Are they perfect? Of course not. But neither are developers.
Using Claude Code tailored to your needs with excellent plugins, skills, subagents, and MCP servers is a revelation.
@Jonathan I don’t think that’s really a fair summary. He did try multiple assistants, including Claude. I’m glad to hear that it gets better as you move up the learning curve.
@Jonathan, here's where I stand, speaking as a retired developer:
> "Coding with an AI assistant does take time and effort to perfect, but you’re rewarded with a huge productivity boost. Are they perfect? Of course not."
Convince me. Developers, retired or not, live on details.
> "Using Claude Code tailored to your needs with excellent plugins, skills, subagents, and MCP servers is a revelation."
So convince me. You know details. (The devil is in them!) What have you experienced as a "huge" productivity boost? Going from concept to base code? And if so, what imperfections have you experienced? Even more important - forget these pesky details - how long did it take you to use Claude for this huge productivity boost? I understand, it may be for a third-party employer and you cannot provide many details, but can you provide some? How long did it take you to "perfect" this "huge" productivity boost? I'm willing to learn! But as I've already said, your comment lacks these details, where "the devil" resides.