LLMs and Software Development Roundup
Certain tasks have worked well for me; they tend to be the ones that fit the LLM model.
[…]
It’s probably not surprising that there is a relationship between the sunk cost fallacy and gambling. Gamblers get a huge dopamine rush when they win, and the sunk cost fallacy feeds that: no matter how much they’ve lost, it will be worth it because the next hand will be the big winner.
I’m kind of worried these tools are doing the same thing to developers. It’s easy to go “just one more prompt…”
Don’t get me wrong. I’ve seen places these tools excel. But I’m also seeing patterns where developers don’t know when to put them down.
[…]
As for my data structures issue? Claude Code made me realize that maybe I’ve been stalling because I need to plan better. So I closed Claude Code, opened up OmniGraffle, and started sketching some UML. Because it’s probably faster that way.
There’s a lot of talk about LLMs making programmers lazy and uneducated, but I’m learning more than ever thanks to the way LLMs help me to drill into completely unknown areas with such speed. Always wary, but learning with almost every response.
Claude is sooo much better at SwiftData and common pitfalls than ChatGPT is that it’s actually embarrassing that Apple chose ChatGPT for Xcode Code Intelligence.
ChatGPT constantly and consistently hallucinates methods that don’t exist, and repeatedly said features that actually existed didn’t exist (for example, FileManager’s trashItem() does work on iOS so long as your app has the correct plist settings! It was added in iOS 11).
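For reference, the API in question is FileManager’s trashItem(at:resultingItemURL:), which has been available since iOS 11 (and macOS 10.8). A minimal sketch; the wrapper function here is just illustrative:

```swift
import Foundation

// Illustrative wrapper around the real Foundation API. Returns the
// item's new URL in the trash, if the system reports one.
func moveToTrash(_ fileURL: URL) throws -> URL? {
    var trashedURL: NSURL?
    try FileManager.default.trashItem(at: fileURL, resultingItemURL: &trashedURL)
    return trashedURL as URL?
}
```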
I’ve actually found Gemini 2.5 Pro to be the best at the SwiftUI and AppKit questions I’m asking. They’re all master fabricators of incorrect information and made-up APIs, but I still find their flailing helpful in getting me to try new things and learning new things to search for in the docs.
My feeling is that if you account for the number of times LLMs lead you down the wrong path while coding and waste your time, and how much faster you’ll be in the future if you take the time to truly learn a new skill the first time, it’s pretty much a wash.
This is really good advice.
The code I get from LLMs is directly related to how sharply or sloppily I prompt and how well I prime the context.
While debating whether AI can replace software developers or not is somewhat absurd, it’s hard to argue against the significant productivity boosts.
A friend shared with me how after a performance regression following a release, they asked AI to review all of the recent diffs for the culprit. It flagged a few suspicious ones and on review one of them was the cause.
I imagine this is happening in every industry which is why Azure/AWS/GCP have more demand for AI than they can handle.
I shit a lot on LLMs. They can be as stupid as a slice of bread and a huge waste of time when reading the documentation would suffice. That being said, I used ChatGPT o3 over the weekend to set up my new Linux server with a ZFS RAID, Portainer, and 20 or so containers, and it was an enormous time saver for me. Just reading the documentation alone would probably have taken me a week or so to get to the same result. There were some frustrating moments, but overall it was very useful.
Sean Michael Kerner (Slashdot):
While enterprise AI adoption accelerates, new data from Stack Overflow’s 2025 Developer Survey exposes a critical blind spot: the mounting technical debt created by AI tools that generate “almost right” solutions, potentially undermining the productivity gains they promise to deliver.
Claude Code has considerably changed my relationship to writing and maintaining code at scale. I still write code at the same level of quality, but I feel like I have a new freedom of expression which is hard to fully articulate.
Claude Code has decoupled me from writing every line of code. I still consider myself fully responsible for everything I ship to Puzzmo, but the ability to instantly create a whole scene, instead of going line by line, word by word, is incredibly powerful.
I believe with Claude Code, we are at the “introduction of photography” period of programming. Painting by hand just doesn’t have the same appeal anymore when a single concept can just appear and you shape it into the thing you want with your code review and editing skills.
When I was fooling around with NSSound.beep(), the AI code completion suggested a duration: parameter - JUST LIKE INSIDE MACINTOSH.
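For those who don’t remember Inside Macintosh: the classic Toolbox call SysBeep() really did take a duration (in ticks), while the modern AppKit method takes no arguments. A quick sketch of the mixup; the duration: parameter below is the hallucination:

```swift
import AppKit

// The real AppKit API takes no arguments:
NSSound.beep()

// What the completion suggested does not compile, because no such
// parameter exists; it echoes the classic Mac OS Toolbox call
// SysBeep(duration), whose duration was measured in ticks:
// NSSound.beep(duration: 30)  // hallucinated
```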
Despite claims that AI today is improving at a fever pitch, it felt largely the same as before. It’s good at writing boilerplate, especially in JavaScript, and particularly in React. It’s not good at keeping up with the standards and utilities of your codebase. It tends to struggle with languages like Terraform. It still hallucinates libraries, leading to significant security vulnerabilities.
AIs still struggle to absorb the context of a larger codebase, even with a great prompt and CLAUDE.md file. If you use a library that isn’t StackOverflow’s favorite it will butcher it even after an agentic lookup of the documentation. Agents occasionally do something neat like fix the tests they broke. Often they just waste time and tokens, going back and forth with themselves not seeming to gain any deeper knowledge each time they fail. Thus, AI’s best use case for me remains writing one-off scripts. Especially when I have no interest in learning deeper fundamentals for a single script, like when writing a custom ESLint rule.
It’s a sober reminder that the hype around “10x engineers” and all the vibe coding mania is more about clever marketing than actual productivity, and that keeping our processes deliberate isn’t a bad thing after all.
TreeTopologyTroubado (via Dare Obasanjo):
I’ve seen a lot of flak coming from folks who don’t believe AI-assisted coding can be used for production code. This is simply not true.
For some context, I’m an AI SWE with a bit over a decade of experience, half of which has been at FAANG or similar companies.
[…]
Anyhow, here’s how we’re starting to use AI for prod code.
[…]
Overall, we’re seeing a ~30% increase in speed from the feature proposal to when it hits prod. This is huge for us.
In an unambiguous message to the global developer community, GitHub CEO Thomas Dohmke warned that software engineers should either embrace AI or leave the profession.
My former colleague Rebecca Parsons has been saying for a long time that hallucinations aren’t a bug of LLMs; they are a feature. Indeed, they are the feature. All an LLM does is produce hallucinations; it’s just that we find some of them useful.
One of the consequences of this is that we should always consider asking the LLM the same question more than once, perhaps with some variation in the wording. Then we can compare answers, indeed perhaps ask the LLM to compare answers for us. The difference in the answers can be as useful as the answers themselves.
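A minimal sketch of what this might look like in practice, assuming a hypothetical complete(_:) wrapper around whatever model API you use; nothing below is a real SDK call:

```swift
// Hypothetical wrapper around your LLM API of choice.
func complete(_ prompt: String) async throws -> String {
    // ...make the actual API call here...
    fatalError("placeholder for a real API call")
}

// Ask the same question with several phrasings, then have the model
// compare its own answers; the disagreements are often as useful as
// any single answer.
func askWithVariations(_ phrasings: [String]) async throws -> String {
    var answers: [String] = []
    for phrasing in phrasings {
        answers.append(try await complete(phrasing))
    }
    let numbered = answers.enumerated()
        .map { "Answer \($0.offset + 1):\n\($0.element)" }
        .joined(separator: "\n\n")
    return try await complete(
        "Compare these answers to the same question and point out where they disagree:\n\n\(numbered)")
}
```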
[…]
Other forms of engineering have to take into account the variability of the world. A structural engineer builds in tolerance for all the factors she can’t measure. (I remember being told early in my career that the unique characteristic of digital electronics was that there was no concept of tolerances.) Process engineers consider that humans are executing tasks, and will sometimes be forgetful or careless. Software Engineering is unusual in that it works with deterministic machines. Maybe LLMs mark the point where we join our engineering peers in a world of non-determinism.
But after just a few hours vibe coding, I had a working app. A few days later, it even got through App Store approval. You can watch the whole saga here[…]
[…]
But that wasn’t enough. I figured if this was possible, maybe I can build a more complex app. An app I would be proud to share and use daily. I was going to build a podcast app.
[…]
I’m sure in the hands of a skilled developer, these tools can save time, take care of menial bugs, and maybe even provide inspiration. But in the hands of someone with zero coding knowledge, they may be able to build a single-function coffee finder app, but they certainly can’t build a good podcast app.
You’re not alone. I’m an experienced developer (in fact I helped create Pocket Casts) and I think what you found is universal. None of the AIs match the hype. None of them are capable of building a complete app. They are all hype and no substance.
On a more positive note: AI is very good at helping you learn things. I use it almost every day to ask questions. I never get it to code my apps. What if instead of your current approach you use it to slowly learn to code?
On a good day, I’ll ship a week’s worth of product in under a day with Claude Code. On a bad day, I’ll accidentally let my brain switch off, waste the whole day looping “pls fix”, and start from scratch the next day, coding manually.
The early narrative was that companies would need fewer seniors, and juniors together with AI could produce quality code. At least that’s what I kept seeing. But now, partly because AI hasn’t quite lived up to the hype, it looks like what companies actually need is not junior + AI, but senior + AI.
[…]
So instead of democratizing coding, AI right now has mostly concentrated power in the hands of experts. Expectations did not quite match reality. We will see what happens next. I am optimistic about AI’s future, but in the short run we should probably reset our expectations before they warp any further.
We used to brainstorm crazy features and discuss how fun they’d be to build, but as a small team, we never had the time.
Now, with AI, we can create many of these ideas. It’s amazing how much it’s boosted our appetite for building. It’s SOO much fun.
However, the use-case that I really enjoy is when it can speed-run boring tasks for me.
[…]
I’m sure I could have written this script myself, but I didn’t want to. This is the sort of task that I put off for weeks and maybe never get around to doing. So being able to get AI to do this for me makes my life much easier.
Haha, Claude Code just discovered a typo in one of my database table names that’s been there for a couple years (instead of “reportMetadata” it’s “reportMedatata”). I can’t fix it because it would break backward compatibility, but at least I can put a comment there in case I come across it again in another couple years. 😅
Presently (though this changes constantly), the court of vibe fanatics would have us write specifications in Markdown instead of code. Gone is the deep engagement and the depth of craft we are so fluent in: time spent in the corners of codebases, solving puzzles, and uncovering well-kept secrets. Instead, we are to embrace scattered cognition and context switching between a swarm of Agents that are doing our thinking for us. Creative puzzle-solving is left to the machines, and we become mere operators disassociated from our craft.
Some—more than I imagined—seem to welcome this change, this new identity: “Specification Engineering.” Excited to be an operator and cosplaying as Steve Jobs to “Play the Orchestra”. One could only wonder why they became a programmer in the first place, given their seeming disinterest in coding. Did they confuse Woz with Jobs?
[…]
Code reviewing coworkers are rapidly losing their minds as they come to the crushing realization that they are now the first layer of quality control instead of one of the last. Asked to review; forced to pick apart. Calling out freshly added functions that are never called, hallucinated library additions, and obvious runtime or compilation errors. All while the author—who clearly only skimmed their “own” code—is taking no responsibility, going “whoopsie, Claude wrote that. Silly AI, ha-ha.”
What sort of applications are people writing where AI is saving them hours and hours of hand-writing boilerplate code? This is a genuine question because, apart from in perhaps the very early stages of a new application, I very rarely find myself doing this.
I’m starting to file it away with people who think that the speed they can type at is their biggest development bottleneck. Or am I just ponderously slow and I think about my code too much so that typing is the least of my worries?
I’ve found it’s good for chugging out code that is following pretty well established patterns, but more complex than boilerplate.
To give one recent example, I had to get a new bit of info from a few levels down in my controller hierarchy up to the top-ish level of the app. This required adding similar delegate methods in a whole bunch of places and inserting all the “upward” calls. This is annoying as hell to do by hand, but an LLM does very well at this, and quickly.
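A sketch of the shape of that change, with hypothetical names; the point is that every intermediate controller repeats the same forwarding boilerplate:

```swift
import UIKit

protocol DetailViewControllerDelegate: AnyObject {
    func detailViewController(_ vc: DetailViewController, didUpdateBadgeCount count: Int)
}

final class DetailViewController: UIViewController {
    weak var delegate: DetailViewControllerDelegate?

    func badgeCountChanged(to count: Int) {
        delegate?.detailViewController(self, didUpdateBadgeCount: count)
    }
}

// Each intermediate level adds the same mechanical forwarding code,
// which is exactly the tedium an LLM handles quickly.
final class ListViewController: UIViewController, DetailViewControllerDelegate {
    weak var delegate: DetailViewControllerDelegate?

    func detailViewController(_ vc: DetailViewController, didUpdateBadgeCount count: Int) {
        delegate?.detailViewController(vc, didUpdateBadgeCount: count)
    }
}
```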
One thing I find myself doing more as a result of using AI coding tools is leaving more comments in my code as a way of explaining code’s purpose to the agent when it comes across a particular piece of code. But of course this benefits future-me too, so it seems like a win-win!
Someone reported a bug in Retrobatch’s dither node when a gray profile image was passed along - and yep there it was. Had to rewrite the function that did it because I’m a dummy. But then I was like … hey ChatGPT make me a CMYK version of this AND IT DID. And it noticed a memory leak in my implementation and said so in a comment and I’m a dummy.
I’m slowly giving up on coding agents for certain things.
- Code quality overall mostly bad
- It loves to do inline await import
- It often duplicates code that already exists
I keep my notes and TODO list in a wip.md file and it’s getting big. I’m trying this:
- Convert it to PDF
- Cross out and star lines on my reMarkable
- Ask Claude Code to read the PDF and remove lines that are crossed out and move starred lines to top
It… works!
A problem with asking AI to do something you are unable to do yourself is that you are likely also unable to tell if the AI did a good job. It’s automated Dunning-Kruger.
This is the future of software. How we get there (eventually) is still anybody’s guess.
She was sitting in front of the advent calendars and asked if I still had that Gemini app installed. “Of course,” I said. To which she responded with the sudden need to create a game.
My 11-year-old daughter is vibe coding of her own volition. What’s your excuse?
The one cautious experiment I ran was a mixed bag:
- It produced the same approach to an esoteric problem (vectorized UTF16 transcoding in Swift) that I would have
- It produced a useful unit test suite
- It could discuss the code it produced in relevant abstractions and make changes I requested
but
- It had syntax errors initially
- It duplicated code unless specifically instructed to refactor
- When I accidentally asked for something impossible, it generated nonsense
Building apps in Swift and SwiftUI isn’t quite as easy for AI tools as other platforms, partly because our language and frameworks evolve rapidly, partly because languages such as Python and JavaScript have a larger codebase to learn from, and partly also because AI tools struggle with Swift concurrency as much as everyone else.
As a result, tools like Claude, Codex, and Gemini often make unhelpful choices you should watch out for. Sometimes you’ll come across deprecated API, sometimes it’s inefficient code, and sometimes it’s just something we can write more concisely, but they are all easy enough to fix so the key is just to know what to expect!
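One concrete example of the deprecated-API problem: SwiftUI’s NavigationView has been deprecated since iOS 16 in favor of NavigationStack, and generated code often still reaches for the former. A minimal sketch of the modern form:

```swift
import SwiftUI

struct Item: Identifiable {
    let id = UUID()
    let name: String
}

// Generated code often still emits the deprecated form:
//
//     NavigationView { List(items) { Text($0.name) } }
//
// The current equivalent (iOS 16 and later) is:
struct ItemList: View {
    let items: [Item]

    var body: some View {
        NavigationStack {
            List(items) { item in
                Text(item.name)
            }
        }
    }
}
```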
In recent months there’s been a spate of forum threads involving ‘hallucinated’ entitlements. This typically pans out as follows:
- The developer, or an agent working on behalf of the developer, changes their .entitlements file to claim an entitlement that’s not real. That is, the entitlement key is a value that is not, and never has been, supported in any way.
- Xcode’s code signing machinery tries to find or create a provisioning profile to authorise this claim.
- That’s impossible, because the entitlement isn’t a real entitlement. Xcode reports this as a code signing error.
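As a made-up illustration of the first step (the second key below is deliberately fake; the first is a real macOS sandbox entitlement), the claimed .entitlements file might look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- A real entitlement, for contrast: -->
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <!-- A hallucinated key like this one can never be authorised by
         any provisioning profile, so signing fails: -->
    <key>com.apple.developer.magic-unicorn-mode</key>
    <true/>
</dict>
</plist>
```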
Martin Alderson (Hacker News):
I’ve been building software professionally for nearly 20 years. I’ve been through a lot of changes - the ‘birth’ of SaaS, the mass shift towards mobile apps, the outrageous hype around blockchain, and the perennial promise that low-code would make developers obsolete.
The economics have changed dramatically now with agentic coding, and it is going to totally transform the software development industry (and the wider economy). 2026 is going to catch a lot of people off guard.
[…]
AI Agents, however, in my mind massively reduce the labour cost of developing software.
[…]
A project that would have taken a month now takes a week. The thinking time is roughly the same - the implementation time collapsed. And with smaller teams, you get the inverse of Brooks’s Law: instead of communication overhead scaling with headcount, it disappears. A handful of people can suddenly achieve an order of magnitude more.
This week on AppStories, Federico and I talked about the personal productivity tools we’ve built for ourselves using Claude. They’re hyper-specific scripts and plugins that aren’t likely to be useful to anyone but us, which is fine because that’s all they’re intended to be.
Stu Maschwitz took a different approach. He’s had a complex shortcut called Drinking Buddy for years that tracks alcohol consumption and calculates your Blood Alcohol Level using an established formula. But because he was butting up against the limits of what Shortcuts can do, he vibe coded an iOS version of Drinking Buddy.
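Stu doesn’t say exactly which established formula he used, but the classic one for this job is Widmark’s. A hedged Swift sketch, purely for illustration; the parameter names and defaults are mine:

```swift
import Foundation

// Widmark: BAC% = alcoholGrams / (bodyWeightGrams * r) * 100,
// minus elimination over time (roughly 0.015 %BAC per hour).
func estimatedBAC(alcoholGrams: Double,
                  bodyWeightKg: Double,
                  widmarkFactor: Double = 0.68, // ~0.68 for men, ~0.55 for women
                  hoursElapsed: Double) -> Double {
    let peak = alcoholGrams / (bodyWeightKg * 1000 * widmarkFactor) * 100
    let eliminated = 0.015 * hoursElapsed
    return max(0, peak - eliminated)
}

// One standard US drink (~14 g of alcohol) for an 80 kg man,
// measured immediately: about 0.026% BAC.
let bac = estimatedBAC(alcoholGrams: 14, bodyWeightKg: 80, hoursElapsed: 0)
```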
But while it took horses decades to be overcome, and chess masters years, it took me all of six months to be surpassed.
Surpassed by a system that costs one thousand times less than I do.
I recently released JustHTML, a python-based HTML5 parser. It passes 100% of the html5lib test suite, has zero dependencies, and includes a CSS selector query API. Writing it taught me a lot about how to work with coding agents effectively.
Cursor is so much better and more efficient at working with iOS projects that it puts Apple’s lame efforts at AI integration in Xcode to shame. I really miss Alex; it’s a crying shame Apple didn’t buy them and instead let them get sucked into OpenAI to be memory-holed.
Working on one of my apps that traces its codebase back to iPhone OS 2, and it’s making it a breeze to go from Obj-C > Swift and then modernise the code at the same time.
The juxtaposition of my social circle of peers who are excited that AI is finally almost behind us because a snake oil bubble is about to pop, and my own experience that everything is finally all clicking and I am addicted to doing the best work of my life faster than I ever thought possible is challenging!
2025 will be looked back on as the most transformative time in software engineering. Previously, LLMs could build simple toy apps but weren’t good enough to build anything substantial. I’m convinced that has now changed, and I’m sharing my thoughts and pro tips.
AI has dramatically accelerated how software is written. But speed was never the real bottleneck.
Despite LLMs, The Mythical Man-Month is still surprisingly relevant. Not because of how code is produced, but because of what actually slows software down: coordination, shared understanding, and conceptual integrity.
AI makes code cheap. It does not make software design, architecture, integration, or alignment free.
In fact, faster code generation can amplify old problems[…]
Software engineering is in an interesting place where some of the most accomplished engineers in the industry are effectively saying their job is now just telling AI to write all the code, debug it and fix the bugs.
Some of this is obviously marketing hype from people who are selling AI tools (Boris works on Claude Code) but this is the current mindset of the industry.
Slop drives me crazy and it feels like 95+% of bug reports, but man, AI code analysis is getting really good. There are users out there reporting bugs that don’t know ANYTHING about our stack, but are great AI drivers and producing some high quality issue reports.
More people than ever before will want to learn how to build software using AI: software that works as they expect.
Demand for “fast track to becoming an AI-enabled dev” will probably skyrocket.
The tricky part is that good vibe coding still requires you to think like a programmer even if you cannot write the code yourself.
You need to break problems down, understand what is possible, and know when the AI is going off track.
About half the productivity gain from AI vibe coding comes from the fact that I can work at ~80% effectiveness, and stay engaged, when I’m tired, whereas previously it would be impossible to do anything mentally complex.
At the end of last year, AI agents really came alive for me. Partly because the models got better, but more so because we gave them the tools to take their capacity beyond pure reasoning. Now coding agents are controlling the terminal, running tests to validate their work, searching the web for documentation, and using web services with skills we taught them in plain English. Reality is fast catching up to the hype!
[…]
See, I never really cared much for the in-editor experience of having AI autocomplete your code as you were writing it. […] But with these autonomous agents, the experience is very different. It’s more like working on a team and less like working with an overly-zealous pair programmer who can’t stop stealing the keyboard to complete the code you were in the middle of writing. With a team of agents, they’re doing their work autonomously, and I just review the final outcome, offer guidance when asked, and marvel at how this is possible at all.
Agents are amazing at coding now, no surprise there. One way to use them that I’ve found valuable is as a tutor, relating new tech stacks to me using familiar parallels from SwiftUI and iOS development.
Yes, I use Claude because it helps me fix more bugs faster. (It also helps with some useful automation.)
I’ve been trying to use Claude to write some AppKit code. It’s very interesting how different it feels using AI for a framework that has a lower quantity of decent examples in the training data. The AI is very quick to make decisions that show only a reasonably superficial understanding.
I spent last night debugging auto layout constraints. I’m not great at auto layout, but the difference is I know my limits, and won’t write any that I don’t understand the implications of!
I think my favourite feature of LLMs is how, when asked to fix a bug in a feature, their predicted solution is to simply delete the feature that has the bug
I don’t think self-vibecoded software is the future for businesses.
A couple of months ago I vibecoded a tool for a friend’s business.
His entire staff has been using it for six months now (37 people).
The thing is, he’s constantly sending me feature requests and bug fixes.
The app is pretty complicated since it deals with insurance benefits verification.
So for someone who doesn’t have software development experience, you can’t just prompt to fix it (believe me, he tried).
Traditionally, software companies have been stuck within the constraints of existing IT budgets, which tend to tap out around 3-7% of a company’s revenue. This creates an inherent upper limit for what the budget can be for software, which translates into the total addressable market of various technology categories. Now, with AI Agents, the software is actually bringing along the work with the software, which means the budget software players are going after is the total spend that goes into doing that work in the company, not just the tech to enable it. This inevitably leads to a substantial increase in TAM for most software categories whose markets were artificially held back in size previously.
[…]
Today, the vast majority of SaaS products charge on a per-seat basis, which generally corresponds to most of the usage that the software sees today by its end users. But in a world where AI agents do most of the interaction and work on software, enterprise systems will have to evolve to support more of a consumption and usage-based model over time. AI agents don’t cleanly fit as seats on software, because any given AI agent can do a varied amount of work within a system (e.g. you could have 1 agent doing a billion things or a billion agents doing one thing).
Skills, rules, plugins, MCPs, different models — I went in. And, coming out the other side, I’m not entirely certain what to think anymore. Excitement? Nervous? Pumped? All of it?
It’s all different now, but I do know that if you were already an engineer with experience before this AI boom, there has never been a better time in human history to build stuff.
I recently used ChatGPT Codex to find a terrible bug that had been lurking for decades in a codebase without being tracked down. It worked for an hour and fifteen minutes autonomously, produced a reproducing case, carefully ran the debugger, and handed me both the problem and a patch at the end.
Claims like “LLMs can’t debug code” are at complete variance with the real world experience of vast numbers of people.
Not only does an agent not have the ability to evolve a specification over a multi-week period as it builds out its lower components, it also makes decisions upfront that it later doesn’t deviate from. And most agents simply surrender once they feel the problem and solution has gotten away from them (though this rarely happens anymore, since agents will just force themselves through the walls of the maze.)
What’s worse is code that agents write looks plausible and impressive while it’s being written and presented to you. It even looks good in pull requests (as both you and the agent are well trained in what a “good” pull request looks like).
It’s not until I opened up the full codebase and read its latest state cover to cover that I began to see what we theorized and hoped was only a diminishing artifact of earlier models: slop.
It was pure, unadulterated slop. I was bewildered. Had I not reviewed every line of code before admitting it? Where did all this... gunk... come from?
Miklós Koren et al. (Hacker News):
We study the equilibrium effects of vibe coding on the OSS ecosystem. We develop a model with endogenous entry and heterogeneous project quality in which OSS is a scalable input into producing more software. Users choose whether to use OSS directly or through vibe coding. Vibe coding raises productivity by lowering the cost of using and building on existing code, but it also weakens the user engagement through which many maintainers earn returns. When OSS is monetized only through direct user engagement, greater adoption of vibe coding lowers entry and sharing, reduces the availability and quality of OSS, and reduces welfare despite higher productivity. Sustaining OSS at its current scale under widespread vibe coding requires major changes in how maintainers are paid.
I don’t know how to program, and have never made an application. I’m a designer, and an SME for the type of app I was able to create with Claude Code. This opens up a world of possibilities for me, and I now have a list of apps for my hobby (astrophotography) that I’m creating. My first app, Laminar.
From what I’ve seen so far by reading the Swift code generated by these things, they are definitely not up to the task, and anyone who thinks so either is doing trivial stuff with them, or has a very different idea of software quality than me.
This is what bothers me most. If these things were actually good, we could solve all the other problems (energy, copyrights, etc). But it’s just not worth it. The result is bad and the frustrating process of iterating leads to nowhere.
Previously:
- Xcode 26.3
- Codex App
- Apple LLM Generating SwiftUI
- Script to Detect Slow USB-C Cables
- Study on AI Coding Tools
- Claude Code Experience
- Software Is Changing (Again)
- Tim, Don’t Kill My Vibe
- Vibe Coding
- How to Use Cursor for iOS Development
So many different takes. So interesting.
A couple of elephants seem to be in the room:

1) Not too many people seem to be worried about uploading their entire codebase to a third party and basically giving them license to use it (or give it away to competitors).

2) From what I understand, all of this functionality is running on energy-hungry GPUs, setting money on fire. So if the tech doesn’t improve and become more energy efficient in a reasonable amount of time, what happens if these companies start running out of money?
What Jon said. What do these tools cost? And are there already patterns visible between the fans and the critics?
Oh, but foremost: thanks, Michael! More amazing work with this fantastic collection. Another big contribution to a truly great body of work that is this blog.
People talking about using agents and whatnot but not mentioning the price makes me think of agents similarly to how I think of hotel concierge services. I know they exist, but I have no idea what they're going to cost me if I use them.
Some devs might not flinch at spending hundreds of dollars a month. For others $20/month might be a stretch.
New preprint from Anthropic:
https://pivot-to-ai.com/2026/02/06/ai-coding-makes-you-worse-at-learning-and-not-even-any-faster/
https://arxiv.org/abs/2601.20245
The researchers ran 50 test subjects through five basic coding tasks using the Trio library in Python. Some subjects were given an AI assistant, some were not.
The subjects coded in an online interview platform, and the AI users also had the AI assistant.
The researchers used screen and keystroke recording to see what the test subjects did — including those no-AI test subjects who tried using an AI bot anyway.
Afterwards, the researchers tested the subjects on coding skills — debugging, code reading, code writing, and the concepts of Trio.
The coders in the AI group were slightly faster, but it was not statistically significant. The main thing was that the AI group were 17% worse in their understanding:
**The erosion of conceptual understanding, code reading, and debugging skills that we measured among participants using AI assistance suggests that workers acquiring new skills should be mindful of their reliance on AI during the learning process.**