LLMs and Software Development Roundup
Certain tasks have worked well for me. These tasks tend to fit the LLM model well.
[…]
It’s probably not surprising that there is a relationship between the sunk cost fallacy and gambling. Gamblers get a huge dopamine rush when they win. Sunk cost fallacy feeds that. No matter how much they’ve lost it will be worth it because the next hand will be the big winner.
I’m kind of worried these tools are doing the same thing to developers. It’s easy to go “just one more prompt…”
Don’t get me wrong. I’ve seen places these tools excel. But I’m also seeing patterns where developers don’t know when to put them down.
[…]
As for my data structures issue? Claude Code made me realize that maybe I’ve been stalling because I need to plan better. So I closed Claude Code, opened up OmniGraffle, and started sketching some UML. Because it’s probably faster that way.
There’s a lot of talk about LLMs making programmers lazy and uneducated, but I’m learning more than ever thanks to the way LLMs help me to drill into completely unknown areas with such speed. Always wary, but learning with almost every response.
Claude is sooo much better at SwiftData and common pitfalls than ChatGPT is that it’s actually embarrassing that Apple chose ChatGPT for Xcode Code Intelligence.
ChatGPT constantly and consistently hallucinates methods that don’t exist, and repeatedly said features that actually existed didn’t exist. For example, FileManager’s trashItem() does work on iOS so long as your app has the correct plist settings! (It was added in iOS 11.)
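As a sketch of the API in question: FileManager’s trashItem(at:resultingItemURL:) has been available on iOS since iOS 11 (and on macOS much earlier). The Info.plist configuration required depends on where the file lives; that part is an assumption and not spelled out here.

```swift
import Foundation

// Sketch: moving a file to the Trash on iOS (available since iOS 11).
// Assumes the app’s Info.plist is configured appropriately for the
// file’s location — the exact keys depend on your setup.
func moveToTrash(_ fileURL: URL) throws -> URL? {
    var trashedURL: NSURL?
    try FileManager.default.trashItem(at: fileURL, resultingItemURL: &trashedURL)
    // The out-parameter, if populated, is the item’s new location in the Trash.
    return trashedURL as URL?
}
```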
I’ve actually found Gemini 2.5 Pro to be the best at the SwiftUI and AppKit questions I’m asking. They’re all master fabricators of incorrect information and made-up APIs, but I still find their flailing helpful in getting me to try new things and learning new things to search for in the docs.
My feeling is that if you account for the number of times LLMs lead you down the wrong path while coding and waste your time, and for how much faster you’ll be in the future if you take the time to truly learn a new skill the first time, it’s pretty much a wash.
This is really good advice.
The code I get from LLMs is directly related to how sharply or sloppily I prompt and how well I prime the context.
While debating whether AI can replace software developers or not is somewhat absurd, it’s hard to argue against the significant productivity boosts.
A friend shared with me how after a performance regression following a release, they asked AI to review all of the recent diffs for the culprit. It flagged a few suspicious ones and on review one of them was the cause.
I imagine this is happening in every industry which is why Azure/AWS/GCP have more demand for AI than they can handle.
I shit a lot on LLMs. They can be as stupid as a slice of bread and a huge waste of time when reading the documentation would suffice. That being said, I used ChatGPT o3 over the weekend to set up my new Linux server with a ZFS raid, Portainer, and 20 or so containers, and it was an enormous time saver for me. From reading the documentation alone, it would probably have taken me a week or so to get to the same result. There were some frustrating moments, but overall it was very useful.
Sean Michael Kerner (Slashdot):
While enterprise AI adoption accelerates, new data from Stack Overflow’s 2025 Developer Survey exposes a critical blind spot: the mounting technical debt created by AI tools that generate “almost right” solutions, potentially undermining the productivity gains they promise to deliver.
Claude Code has considerably changed my relationship to writing and maintaining code at scale. I still write code at the same level of quality, but I feel like I have a new freedom of expression which is hard to fully articulate.
Claude Code has decoupled me from writing every line of code. I still consider myself fully responsible for everything I ship to Puzzmo, but the ability to instantly create a whole scene instead of going line by line, word by word, is incredibly powerful.
I believe with Claude Code, we are at the “introduction of photography” period of programming. Painting by hand just doesn’t have the same appeal anymore when a single concept can just appear and you shape it into the thing you want with your code review and editing skills.
When I was fooling around with NSSound.beep(), the AI code completion suggested a duration: parameter - JUST LIKE INSIDE MACINTOSH.
Despite claims that AI today is improving at a fever pitch, it felt largely the same as before. It’s good at writing boilerplate, especially in JavaScript, and particularly in React. It’s not good at keeping up with the standards and utilities of your codebase. It tends to struggle with languages like Terraform. It still hallucinates libraries, leading to significant security vulnerabilities.
AIs still struggle to absorb the context of a larger codebase, even with a great prompt and CLAUDE.md file. If you use a library that isn’t StackOverflow’s favorite it will butcher it even after an agentic lookup of the documentation. Agents occasionally do something neat like fix the tests they broke. Often they just waste time and tokens, going back and forth with themselves not seeming to gain any deeper knowledge each time they fail. Thus, AI’s best use case for me remains writing one-off scripts. Especially when I have no interest in learning deeper fundamentals for a single script, like when writing a custom ESLint rule.
It’s a sober reminder that the hype around “10x engineers” and all the vibe coding mania is more about clever marketing than actual productivity, and that keeping our processes deliberate isn’t a bad thing after all.
TreeTopologyTroubado (via Dare Obasanjo):
I’ve seen a lot of flak coming from folks who don’t believe AI assisted coding can be used for production code. This is simply not true.
For some context, I’m an AI SWE with a bit over a decade of experience, half of which has been at FAANG or similar companies.
[…]
Anyhow, here’s how we’re starting to use AI for prod code.
[…]
Overall, we’re seeing a ~30% increase in speed from the feature proposal to when it hits prod. This is huge for us.
In an unambiguous message to the global developer community, GitHub CEO Thomas Dohmke warned that software engineers should either embrace AI or leave the profession.
My former colleague Rebecca Parsons, has been saying for a long time that hallucinations aren’t a bug of LLMs, they are a feature. Indeed they are the feature. All an LLM does is produce hallucinations, it’s just that we find some of them useful.
One of the consequences of this is that we should always consider asking the LLM the same question more than once, perhaps with some variation in the wording. Then we can compare answers, indeed perhaps ask the LLM to compare answers for us. The difference in the answers can be as useful as the answers themselves.
[…]
Other forms of engineering have to take into account the variability of the world. A structural engineer builds in tolerance for all the factors she can’t measure. (I remember being told early in my career that the unique characteristic of digital electronics was that there was no concept of tolerances.) Process engineers consider that humans are executing tasks, and will sometimes be forgetful or careless. Software Engineering is unusual in that it works with deterministic machines. Maybe LLMs mark the point where we join our engineering peers in a world of non-determinism.
But after just a few hours vibe coding, I had a working app. A few days later, it even got through App Store approval. You can watch the whole saga here[…]
[…]
But that wasn’t enough. I figured if this was possible, maybe I can build a more complex app. An app I would be proud to share and use daily. I was going to build a podcast app.
[…]
I’m sure in the hands of a skilled developer, these tools can save time, take care of menial bugs, and maybe even provide inspiration. But in the hands of someone with zero coding knowledge, they may be able to build a single-function coffee finder app, but they certainly can’t build a good podcast app.
You’re not alone. I’m an experienced developer (in fact I helped create Pocket Casts) and I think what you found is universal. None of the AIs match the hype. None of them are capable of building a complete app. They are all hype and no substance.
On a more positive note: AI is very good at helping you learn things. I use it almost every day to ask questions. I never get it to code my apps. What if instead of your current approach you use it to slowly learn to code?
On a good day, I’ll ship a week’s worth of product in under a day with Claude Code. On a bad day, I’ll accidentally let my brain switch off, waste the whole day looping pls fix, and start from scratch the next day, coding manually.
The early narrative was that companies would need fewer seniors, and juniors together with AI could produce quality code. At least that’s what I kept seeing. But now, partly because AI hasn’t quite lived up to the hype, it looks like what companies actually need is not junior + AI, but senior + AI.
[…]
So instead of democratizing coding, AI right now has mostly concentrated power in the hands of experts. Expectations did not quite match reality. We will see what happens next. I am optimistic about AI’s future, but in the short run we should probably reset our expectations before they warp any further.
We used to brainstorm crazy features and discuss how fun they’d be to build, but as a small team, we never had the time.
Now, with AI, we can create many of these ideas. It’s amazing how much it’s boosted our appetite for building. It’s SOO much fun.
However, the use-case that I really enjoy is when it can speed-run boring tasks for me.
[…]
I’m sure I could have written this script myself, but I didn’t want to. This is the sort of task that I put off for weeks, and maybe never get around to doing. So being able to get AI to do it for me makes my life much easier.
Haha, Claude Code just discovered a typo in one of my database table names that’s been there for a couple years (instead of “reportMetadata” it’s “reportMedatata”). I can’t fix it because it would break backward compatibility, but at least I can put a comment there in case I come across it again in another couple years. 😅
Presently (though this changes constantly), the court of vibe fanatics would have us write specifications in Markdown instead of code. Gone is the deep engagement and the depth of craft we are so fluent in: time spent in the corners of codebases, solving puzzles, and uncovering well-kept secrets. Instead, we are to embrace scattered cognition and context switching between a swarm of Agents that are doing our thinking for us. Creative puzzle-solving is left to the machines, and we become mere operators disassociated from our craft.
Some—more than I imagined—seem to welcome this change, this new identity: “Specification Engineering.” Excited to be an operator and cosplaying as Steve Jobs to “Play the Orchestra”. One could only wonder why they became a programmer in the first place, given their seeming disinterest in coding. Did they confuse Woz with Jobs?
[…]
Code reviewing coworkers are rapidly losing their minds as they come to the crushing realization that they are now the first layer of quality control instead of one of the last. Asked to review; forced to pick apart. Calling out freshly added functions that are never called, hallucinated library additions, and obvious runtime or compilation errors. All while the author—who clearly only skimmed their “own” code—is taking no responsibility, going “whoopsie, Claude wrote that. Silly AI, ha-ha.”
What sort of applications are people writing where AI is saving them hours and hours of hand-writing boilerplate code? This is a genuine question because, apart from in perhaps the very early stages of a new application, I very rarely find myself doing this.
I’m starting to file it away with people who think that the speed they can type at is their biggest development bottleneck. Or am I just ponderously slow and I think about my code too much so that typing is the least of my worries?
I’ve found it’s good for chugging out code that is following pretty well established patterns, but more complex than boilerplate.
To give one recent example, I had to get a new bit of info from a few levels down in my controller hierarchy up to the top-ish level of the app. This required adding similar delegate methods in a whole bunch of places and inserting all the “upward” calls. This is annoying as hell to do by hand, but an LLM does very well at this, and quickly.
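A minimal sketch of the pattern described above; all names here (BadgeCountReporting, didUpdateBadgeCount, and the controller classes) are hypothetical illustrations, not from the original post.

```swift
import Foundation

// A child controller reports new info upward through a delegate protocol.
protocol BadgeCountReporting: AnyObject {
    func didUpdateBadgeCount(_ count: Int)
}

final class DetailController {
    weak var delegate: BadgeCountReporting?
    func refresh() {
        // New info appears several levels down the hierarchy...
        delegate?.didUpdateBadgeCount(3)
    }
}

// ...and every intermediate controller re-declares a delegate property and
// forwards the call upward. Writing N near-identical copies of this is the
// "annoying as hell" boilerplate an LLM can crank out quickly.
final class MiddleController: BadgeCountReporting {
    weak var delegate: BadgeCountReporting?
    func didUpdateBadgeCount(_ count: Int) {
        delegate?.didUpdateBadgeCount(count)
    }
}
```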
One thing I find myself doing more as a result of using AI coding tools is leaving more comments in my code as a way of explaining code’s purpose to the agent when it comes across a particular piece of code. But of course this benefits future-me too, so it seems like a win-win!
Someone reported a bug in Retrobatch’s dither node when a gray profile image was passed along - and yep there it was. Had to rewrite the function that did it because I’m a dummy. But then I was like … hey ChatGPT make me a CMYK version of this AND IT DID. And it noticed a memory leak in my implementation and said so in a comment and I’m a dummy.
I’m slowly giving up on coding agents for certain things.
- Code quality overall mostly bad
- It loves to do inline await import
- It often duplicates code that already exists
I keep my notes and TODO list in a wip.md file and it’s getting big. I’m trying this:
- Convert it to PDF
- Cross out and star lines on my reMarkable
- Ask Claude Code to read the PDF and remove lines that are crossed out and move starred lines to top
It… works!
A problem with asking AI to do something you are unable to do yourself is that you are likely also unable to tell if the AI did a good job. It’s automated Dunning-Kruger.
This is the future of software. How we get there (eventually) is still anybody’s guess.
She was sitting in front of the advent calendars and asked if I still had that Gemini app installed. “Of course,” I said. To which she responded with the sudden need to create a game.
My 11-year-old daughter is vibe coding of her own volition. What’s your excuse?
The one cautious experiment I ran was a mixed bag:
- It produced the same approach to an esoteric problem (vectorized UTF16 transcoding in Swift) that I would have
- It produced a useful unit test suite
- It could discuss the code it produced in relevant abstractions and make changes I requested
but
- It had syntax errors initially
- It duplicated code unless specifically instructed to refactor
- When I accidentally asked for something impossible it generated nonsense
Building apps in Swift and SwiftUI isn’t quite as easy for AI tools as other platforms, partly because our language and frameworks evolve rapidly, partly because languages such as Python and JavaScript have a larger codebase to learn from, and partly also because AI tools struggle with Swift concurrency as much as everyone else.
As a result, tools like Claude, Codex, and Gemini often make unhelpful choices you should watch out for. Sometimes you’ll come across deprecated API, sometimes it’s inefficient code, and sometimes it’s just something we can write more concisely, but they are all easy enough to fix so the key is just to know what to expect!
In recent months there’s been a spate of forums threads involving ‘hallucinated’ entitlements. This typically pans out as follows:
- The developer, or an agent working on behalf of the developer, changes their .entitlements file to claim an entitlement that’s not real. That is, the entitlement key is a value that is not, and never has been, supported in any way.
- Xcode’s code signing machinery tries to find or create a provisioning profile to authorise this claim.
- That’s impossible, because the entitlement isn’t a real entitlement. Xcode reports this as a code signing error.
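A minimal sketch of what such a claim looks like; the second key below is deliberately fabricated, standing in for whatever the agent invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- A real entitlement, for contrast: -->
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <!-- A hallucinated key like this one (made up for illustration) can never
         appear in a provisioning profile, so Xcode's signing step fails when
         it tries to find or create a profile that authorises it. -->
    <key>com.apple.developer.automatic-bug-fixing</key>
    <true/>
</dict>
</plist>
```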
Martin Alderson (Hacker News):
I’ve been building software professionally for nearly 20 years. I’ve been through a lot of changes - the ‘birth’ of SaaS, the mass shift towards mobile apps, the outrageous hype around blockchain, and the perennial promise that low-code would make developers obsolete.
The economics have changed dramatically now with agentic coding, and it is going to totally transform the software development industry (and the wider economy). 2026 is going to catch a lot of people off guard.
[…]
AI agents, however, in my mind massively reduce the labour cost of developing software.
[…]
A project that would have taken a month now takes a week. The thinking time is roughly the same - the implementation time collapsed. And with smaller teams, you get the inverse of Brooks’s Law: instead of communication overhead scaling with headcount, it disappears. A handful of people can suddenly achieve an order of magnitude more.
This week on AppStories, Federico and I talked about the personal productivity tools we’ve built for ourselves using Claude. They’re hyper-specific scripts and plugins that aren’t likely to be useful to anyone but us, which is fine because that’s all they’re intended to be.
Stu Maschwitz took a different approach. He’s had a complex shortcut called Drinking Buddy for years that tracks alcohol consumption and calculates your Blood Alcohol Level using an established formula. But because he was butting up against the limits of what Shortcuts can do, he vibe coded an iOS version of Drinking Buddy.
But while it took horses decades to be overcome, and chess masters years, it took me all of six months to be surpassed.
Surpassed by a system that costs one thousand times less than I do.
I recently released JustHTML, a python-based HTML5 parser. It passes 100% of the html5lib test suite, has zero dependencies, and includes a CSS selector query API. Writing it taught me a lot about how to work with coding agents effectively.
Cursor is so much better and more efficient at working with iOS projects that it puts Apple’s lame efforts at AI integration in Xcode to shame. I really miss Alex; it’s a crying shame Apple didn’t buy them and instead let them get sucked into OpenAI to be memory-holed.
Working on one of my apps that traces its codebase back to iPhone OS 2, and it’s making it a breeze to go from Obj-C > Swift and then modernise the code at the same time.
The juxtaposition of my social circle of peers who are excited that AI is finally almost behind us because a snake oil bubble is about to pop, and my own experience that everything is finally all clicking and I am addicted to doing the best work of my life faster than I ever thought possible is challenging!
2025 will be looked back on as the most transformative time in software engineering. Previously, LLMs could build simple toy apps but weren’t good enough to build anything substantial. I’m convinced that has now changed, and I’m sharing my thoughts and pro tips.
AI has dramatically accelerated how software is written. But speed was never the real bottleneck.
Despite LLMs, The Mythical Man-Month is still surprisingly relevant. Not because of how code is produced, but because of what actually slows software down: coordination, shared understanding, and conceptual integrity.
AI makes code cheap. It does not make software design, architecture, integration, or alignment free.
In fact, faster code generation can amplify old problems[…]
Software engineering is in an interesting place where some of the most accomplished engineers in the industry are effectively saying their job is now just telling AI to write all the code, debug it and fix the bugs.
Some of this is obviously marketing hype from people who are selling AI tools (Boris works on Claude Code) but this is the current mindset of the industry.
Slop drives me crazy and it feels like 95+% of bug reports, but man, AI code analysis is getting really good. There are users out there reporting bugs that don’t know ANYTHING about our stack, but are great AI drivers and producing some high quality issue reports.
More people than ever before will want to learn how to build software using AI: software that works as they expect.
Demand for “fast track to becoming an AI-enabled dev” will probably skyrocket.
The tricky part is that good vibe coding still requires you to think like a programmer even if you cannot write the code yourself.
You need to break problems down, understand what is possible, and know when the AI is going off track.
About half the productivity gain from AI vibe coding comes from the fact that I can work at ~80% effectiveness and engagement when I’m tired, whereas previously it would be impossible to do anything mentally complex.
At the end of last year, AI agents really came alive for me. Partly because the models got better, but more so because we gave them the tools to take their capacity beyond pure reasoning. Now coding agents are controlling the terminal, running tests to validate their work, searching the web for documentation, and using web services with skills we taught them in plain English. Reality is fast catching up with the hype!
[…]
See, I never really cared much for the in-editor experience of having AI autocomplete your code as you were writing it. […] But with these autonomous agents, the experience is very different. It’s more like working on a team and less like working with an overly-zealous pair programmer who can’t stop stealing the keyboard to complete the code you were in the middle of writing. With a team of agents, they’re doing their work autonomously, and I just review the final outcome, offer guidance when asked, and marvel at how this is possible at all.
Agents are amazing at coding now, no surprise there. One way to use them that I’ve found valuable is as a tutor, relating new tech stacks to me using familiar parallels from SwiftUI and iOS development.
Yes, I use Claude because it helps me fix more bugs faster. (It also helps with some useful automation.)
I’ve been trying to use Claude to write some AppKit code. It’s very interesting how different it feels using AI for a framework that has a lower quantity of decent examples in the training data. The AI is very quick to make decisions that show only a reasonably superficial understanding.
I spent last night debugging auto layout constraints. I’m not great at auto layout, but the difference is I know my limits, and won’t write any that I don’t understand the implications of!
I think my favourite feature of LLMs is how, when asked to fix a bug in a feature, their predicted solution is to simply delete the feature that has the bug.
I don’t think self vibecoded software is the future for businesses
A couple of months ago I vibecoded a tool for a friend’s business
his entire staff has been using it for six months now (37 people)
the thing is, he’s constantly sending me feature requests, bug fixes
The app is pretty complicated since it deals with insurance benefits verification
so for someone who doesn’t have software development experience, you can’t just prompt your way to a fix (believe me, he tried)
Traditionally, software companies have been stuck within the constraints of existing IT budgets, which tend to tap out around 3-7% of a company’s revenue. This creates an inherent upper limit for what the budget can be for software, which translates into the total addressable market of various technology categories. Now, with AI Agents, the software is actually bringing along the work with the software, which means the budget software players are going after is the total spend that goes into doing that work in the company, not just the tech to enable it. This inevitably leads to a substantial increase in TAM for most software categories whose markets were artificially held back in size previously.
[…]
Today, the vast majority of SaaS products charge on a per-seat basis, which generally corresponds to most of the usage that the software sees today by its end users. But in a world where AI agents do most of the interaction and work on software, enterprise systems will have to evolve to support more of a consumption and usage-based model over time. AI agents don’t cleanly fit as seats on software, because any given AI agent can do a varied amount of work within a system (e.g. you could have 1 agent doing a billion things or a billion agents doing one thing).
Skills, rules, plugins, MCPs, different models — I went in. And, coming out the other side, I’m not entirely certain what to think anymore. Excitement? Nervous? Pumped? All of it?
It’s all different now, but I do know that if you were already an engineer with experience before this AI boom, there has never been a better time in human history to build stuff.
I recently used ChatGPT Codex to find a terrible bug that had been lurking for decades in a codebase without being tracked down. It worked for an hour and fifteen minutes autonomously, produced a reproducing case, carefully ran the debugger, and handed me both the problem and a patch at the end.
Claims like “LLMs can’t debug code” are at complete variance with the real world experience of vast numbers of people.
Not only does an agent not have the ability to evolve a specification over a multi-week period as it builds out its lower components, it also makes decisions upfront that it later doesn’t deviate from. And most agents simply surrender once they feel the problem and solution have gotten away from them (though this rarely happens anymore, since agents will just force themselves through the walls of the maze).
What’s worse is code that agents write looks plausible and impressive while it’s being written and presented to you. It even looks good in pull requests (as both you and the agent are well trained in what a “good” pull request looks like).
It wasn’t until I opened up the full codebase and read its latest state cover to cover that I began to see what we theorized and hoped was only a diminishing artifact of earlier models: slop.
It was pure, unadulterated slop. I was bewildered. Had I not reviewed every line of code before admitting it? Where did all this... gunk... come from?
Miklós Koren et al. (Hacker News):
We study the equilibrium effects of vibe coding on the OSS ecosystem. We develop a model with endogenous entry and heterogeneous project quality in which OSS is a scalable input into producing more software. Users choose whether to use OSS directly or through vibe coding. Vibe coding raises productivity by lowering the cost of using and building on existing code, but it also weakens the user engagement through which many maintainers earn returns. When OSS is monetized only through direct user engagement, greater adoption of vibe coding lowers entry and sharing, reduces the availability and quality of OSS, and reduces welfare despite higher productivity. Sustaining OSS at its current scale under widespread vibe coding requires major changes in how maintainers are paid.
I don’t know how to program, and have never made an application. I’m a designer, and an SME for the type of app I was able to create with Claude Code. This opens up a world of possibilities for me, and I now have a list of apps for my hobby (astrophotography) that I’m creating. My first app, Laminar.
From what I’ve seen so far by reading the Swift code generated by these things, they are definitely not up to the task, and anyone who thinks so either is doing trivial stuff with them, or has a very different idea of software quality than me.
This is what bothers me most. If these things were actually good, we could solve all the other problems (energy, copyrights, etc). But it’s just not worth it. The result is bad and the frustrating process of iterating leads to nowhere.
Previously:
- Xcode 26.3
- Codex App
- Apple LLM Generating SwiftUI
- Script to Detect Slow USB-C Cables
- Study on AI Coding Tools
- Claude Code Experience
- Software Is Changing (Again)
- Tim, Don’t Kill My Vibe
- Vibe Coding
- How to Use Cursor for iOS Development
Update (2026-02-09): Steve Troughton-Smith:
Much as you don’t generally go auditing the bytecode or intermediate representation generated by your compiler, I think the idea of manually reviewing LLM-written code will fall by the wayside too. Like it or not, these agents are the new compilers, and prompting them is the new programming. Regardless of what happens with any AI bubble, this is just how things will be from now on; we’ve experienced a permanent, irreversible increase to the level of abstraction. We are all assembly programmers.
AI coding assistants like Claude Code and Cursor have changed the way I work. My daily programming today looks nothing like it did even a couple of years ago. Today, I hardly ever write individual lines of code. AI coding assistants have relieved me of this. They’re better at it than I am. I’m OK with that.
[…]
Now that I can delegate a lot of this work to AI coding assistants, I can focus more on thinking about exactly what I want to make, rather than tediously and laboriously trying to achieve my desired effects. I now spend more time thinking about the edifice as a whole, rather than on building it up brick by brick.
[…]
Today, optimizing [assemblers] lie several levels beneath the notice of contemporary real programmers. Over time, we have simply come to accept the loss of detail Mel thought was essential to proper work—since it wasn’t actually essential to the task. It was merely essential to Mel’s view of himself as a programmer.
Dan Shapiro (via Matt Massicotte):
Ward Cunningham coined the phrase “technical debt” in 1992. He was working on a financial application called WyCash and needed a metaphor to explain to his boss why they should spend time improving their code instead of shipping the next feature. For decades, the balance was simple: carry a little debt to move faster, but pay it down as soon as you can, or the accumulated mess will overwhelm and bankrupt you.
[…]
Usually, deflation is bad for debtors because money becomes harder to come by. But technical debt is different: you don’t owe money, you owe work. And the cost of work is what’s deflating. The cost to pay off your debt – the literal dollars and hours required to fix the mess – is diminishing. It is cheaper to clean up your code today than it has ever been. And if you put it off? It becomes cheaper still. This leads to a striking reversal: technical debt becomes a wise investment.
[…]
This leads to a surprising conclusion for anyone managing a roadmap. You should be willing to take on more technical debt than you ever would have before.
Please don’t give Federighi any ideas.
Nolan Lawson (Hacker News, Mastodon):
I didn’t ask for a robot to consume every blog post and piece of code I ever wrote and parrot it back so that some hack could make money off of it.
I didn’t ask for the role of a programmer to be reduced to that of a glorified TSA agent, reviewing code to make sure the AI didn’t smuggle something dangerous into production.
[…]
If you would like to grieve, I invite you to grieve with me. We are the last of our kind, and those who follow us won’t understand our sorrow. Our craft, as we have practiced it, will end up like some blacksmith’s tool in an archeological dig, a curio for future generations. It cannot be helped, it is the nature of all things to pass to dust, and yet still we can mourn. Now is the time to mourn the passing of our craft.
Do you think there will remain a market for “artisanal” coders making “luxury” apps even after AI takes over the mainstream? Like how you can still buy bespoke and boutique furniture even in a world of IKEA?
LLM AI programming agents are not good for mental health; they supercharge FOMO and take procrastination to the next level.
I have about 20 large changesets across 2 computers and 5 repos generated with AI agents ready to be pushed that I just can’t force myself to review/take a look at.
Meanwhile, there are small changes my apps need that I’m avoiding because I HAVE TO BE ON THE FUTURE TRAIN.
I’ve always been skeptical of the alleged productivity gains of Swift over Objective-C. That goes double for the alleged productivity gains of LLMs over manual coding. ;-)
My view is that if coding speed is the bottleneck in your development process, you’re probably coding too fast. Perhaps you should slow down and THINK more about your product and especially your users.
Nobody who claims AI is oversold hype has ever had a 5 day problem reduced to a 15 minute, interactive solution. Again and again. You do have to know how to use it.
I’ve also seen codebases where many five-day problems have been added in 1000+ line commits from misuse of these tools.
I shipped more code last quarter than any quarter in my career. I also felt more drained than any quarter in my career. These two facts are not unrelated.
[…]
AI genuinely makes individual tasks faster. That’s not a lie. What used to take me 3 hours now takes 45 minutes. Drafting a design doc, scaffolding a new service, writing test cases, researching an unfamiliar API. All faster.
But my days got harder. Not easier. Harder.
Update (2026-02-20): Andrej Karpathy:
I think it must be a very interesting time to be in programming languages and formal methods because LLMs change the whole constraints landscape of software completely. Hints of this can already be seen, e.g. in the rising momentum behind porting C to Rust or the growing interest in upgrading legacy code bases in COBOL or etc. In particular, LLMs are especially good at translation compared to de-novo generation because 1) the original code base acts as a kind of highly detailed prompt, and 2) as a reference to write concrete tests with respect to. That said, even Rust is nowhere near optimal for LLMs as a target language. What kind of language is optimal? What concessions (if any) are still carved out for humans? Incredibly interesting new questions and opportunities. It feels likely that we’ll end up re-writing large fractions of all software ever written many times over.
100% agree with you Andrej. We’re building Mojo to be that target and seeing great results. People are already one-shotting large python conversions to Mojo and getting 1000x speedups.
The frequency with which I launch Xcode and just leave it in this state gets higher and higher with every model update…
So if you’re trying this agentic coding thing out, there are a couple key pieces of advice that made a huge difference for me.
22 Comments
So many different takes. So interesting.
A couple of elephants seem to be in the room:

1) Not too many people seem to be worried about uploading their entire codebase to a third party and basically giving them license to use it (or give it away to competitors).

2) From what I understand, all of this functionality is running on energy-hungry GPUs, setting money on fire. So if the tech doesn't improve and become more energy efficient in a reasonable amount of time, what happens if these companies start running out of money?
What Jon said. What do these tools cost? And are there already patterns visible between the fans and the critics?
Oh, but foremost: thanks, Michael! More amazing work with this fantastic collection. Another big contribution to a truly great body of work that is this blog.
People talking about using agents and whatnot but not mentioning the price makes me think of agents similarly to how I think of hotel concierge services. I know they exist, but I have no idea what they're going to cost me if I use them.
Some devs might not flinch at spending hundreds of dollars a month. For others $20/month might be a stretch.
New preprint from Anthropic:
https://pivot-to-ai.com/2026/02/06/ai-coding-makes-you-worse-at-learning-and-not-even-any-faster/
https://arxiv.org/abs/2601.20245
The researchers ran 50 test subjects through five basic coding tasks using the Trio library in Python. Some subjects were given an AI assistant, some were not.
The subjects coded in an online interview platform; those in the AI group also had the AI assistant available.
The researchers used screen and keystroke recording to see what the test subjects did — including those no-AI test subjects who tried using an AI bot anyway.
Afterwards, the researchers tested the subjects on coding skills — debugging, code reading, code writing, and the concepts of Trio.
The coders in the AI group were slightly faster, but it was not statistically significant. The main thing was that the AI group were 17% worse in their understanding:
**The erosion of conceptual understanding, code reading, and debugging skills that we measured among participants using AI assistance suggests that workers acquiring new skills should be mindful of their reliance on AI during the learning process.**
https://github.com/serrebi/BlindRSS
Vibe-coded RSS feed reader for Windows that's screen-reader friendly and syncs to a bunch of services. The author admits on the Double Tap podcast that he barely knows how to code a Windows app in Python.
Whatever else you think of vibe-coding—and I certainly have my issues with the destruction of the care and the craft—there's no denying that this is empowering for those of us who've not had options in the past. It's papering over a flaw in our society, but it's there and it's real.
And I just *knew* it would be the first vibe-coded app for blind people on Windows. RSS feed reader options for macOS are already here, including lire, NetNewsWire and Vienna.
That wrapup looks like a lot of different ways to say "the documentation is so bad, you can't use it to learn anything you don't know, and so poorly organised, you can't use it to find anything you already know".
Makes sense, given how much of AI in non-programmer contexts seems to be about avoiding having to make a UI that makes a process addressable with user-tools.
I held out on paying for a ChatGPT subscription for a long time. Since then, I've developed almost everything together with ChatGPT.
It's still an issue when ChatGPT goes around in circles. But it makes developing software so much easier.
I had a problem in a query for the chat database of Messages. I dropped the database definition into ChatGPT and it solved the SQL problem immediately.
Then I found an odd issue with search results for an FTS SQLite database. ChatGPT told me that I had been using FTS wrong. It suggested using FTS5 instead of FTS4, gave me the updated code, and added some special cases.
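For reference, FTS5 is SQLite's newer full-text-search module; unlike FTS4, it exposes a built-in bm25-based `rank` column you can sort by. A minimal sketch using Python's bundled `sqlite3` (the table and rows are invented, not the actual Messages chat database schema):

```python
import sqlite3

# Invented example data; the real Messages chat.db schema is different.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE msg_fts USING fts5(body)")
con.executemany(
    "INSERT INTO msg_fts(body) VALUES (?)",
    [("meet for coffee tomorrow",),
     ("the coffee machine is broken",),
     ("lunch on friday",)],
)

# MATCH runs a full-text query; FTS5's "rank" column gives bm25-based
# relevance ordering, which FTS4 lacks out of the box.
rows = con.execute(
    "SELECT body FROM msg_fts WHERE msg_fts MATCH ? ORDER BY rank",
    ("coffee",),
).fetchall()
print([r[0] for r in rows])  # the two rows mentioning "coffee"
```

This requires an SQLite build with FTS5 enabled, which the stock CPython builds generally include.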
It will be interesting to see what the unknown unknowns are.
For now, coding with Claude in VS Code is so much fun, and so extremely productive.
"So if the tech doesn't improve and become more energy efficient in a reasonable amount of time, what happens if these companies start running out of money?"
Nothing.
Take Kimi K2.5. It's an extremely competent model that competes with Anthropic's and OpenAI's top models. It's open weights, so there are tons of providers that offer it. These providers are not losing money like Anthropic and OpenAI; they are making money. They're offering K2.5 at prices that compete with Anthropic and OpenAI.
Even if all the frontier labs went out of business immediately and nobody picked up the pieces, we'd still have models on par with their high-end models available at prices competitive with what they're charging now.
"It’s a sober reminder that the hype around “10x engineers” and all the vibe coding mania is more about clever marketing than actual productivity"
Well, that really depends. Do you care about your code? Then you won't be much faster using an LLM. A little bit faster, sure. Maybe 10-30% faster, but not 10×.
Do you give zero shits about security, bugs, code quality, unit test quality, regressions, or anything like that? Then you will be 10× faster at shipping new features.
"there's know denying that this is empowering for those of us who've not had options in the past"
Yes, and that's great. Make your own tools tailored to you and your needs; that's powerful and fun and amazing. The issue arises when people publish their apps for others to use without any idea of what they actually do.
Oh, about that:
"I wish people would say how much they're spending per day or per week."
I'm playing around with a test instance of OpenClaw. I'm mostly running it on GLM-4.6V. This currently costs about $300 a year for what essentially amounts to unlimited inference, but it isn't adequate for more complex behavior.
For a better model, a Kimi subscription at the Allegretto level should probably be sufficient for most people. That's $500 a year, but with limited inference. One option is to combine the two subscriptions and switch between the models based on task complexity.
GLM will probably also release GLM-5 soon, which should be a much stronger model at reasonable cost.
However, many people are hooking OpenClaw up to the Claude API, which can lead to absolutely insane costs - hundreds of dollars a day.
> I think my favourite feature of LLMs is how, when asked to fix a bug in a feature, their predicted solution is to simply delete the feature that has the bug
Isn't it exactly what Apple does to fix bugs?
Funny story: I showed Miguel Arroz https://github.com/rcarmo/daisy and he had a lot of criticism for the Swift version (which is pretty common with LLMs because there isn't a lot of Swift code out there, and the various breaking changes across versions make the corpus hard to use). I took all of his complaints, created a Copilot skill, and built more tools :)
*grumble grumble shakes fist at AI-shaped cloud*
I'm with the people who are excitedly awaiting the snake-oil-salesmen bubble bursting. The amount of money, energy, and everything else being poured into this has never been sustainable and something has to give at some point, hopefully sooner rather than later.
> > I think my favourite feature of LLMs is how, when asked to fix a bug in a feature, their predicted solution is to simply delete the feature that has the bug
> Isn't it exactly what Apple does to fix bugs?
Other than when they do wholesale rewrites of apps (iTunes -> Music), they generally don't remove major features from existing apps. But they often also don't fix the bugs, so...
@ObjC - Wrt #2, this is exactly why I’m taking advantage of these tools now to assist me with the biggest tasks in my backlog. I’m figuring that there’s going to be a reckoning, maybe as soon as this year, where these companies either shutter or the prices go sky-high. I have no problem burning through investors’ money/allowing investors to subsidize my product development - a form of privatizing the profits (for me) and socializing the losses (on investors). 🤣
@Jon H - I personally subscribe to JetBrains AI Pro which is $100/year, or about $8/month. I know other developers local to me who spent like 1.5k in the last month. They consider it a business write off and much more affordable than hiring an employee (they’re not yet at a level of profitability that could sustain an employee).
"I think the idea of manually reviewing LLM-written code will fall by the wayside too"
Spoken like someone who doesn't look at the shit LLMs generate. Even the best models, like 5.3-Codex, are only good at getting things to kind of work. They require manual cleanup.
I would be ashamed of myself if I sent unfixed LLM code into a PR. I would look like an incompetent clown on acid that just accidentally typed the right letters to get something to compile.
Perhaps that will change in a year; who knows. And perhaps in some cases, it makes sense to commit garbage and hope that LLMs in a year are smart enough to fix that garbage. But in the meantime, you're still shipping buggy garbage.
And yet, reality check:
Software is as bad as before, and the trend of quality going down over time didn't stop. Even the ChatGPT website on iOS freezes and lags as you type if the thread gets too long, to the point of being unusable. Is that the superior code from AI makers and experts?
If AI were so good, we would right now be in a golden age of quality software, with better performance and more polish. The difference would be night and day. We got none of that, whatever some ex-Apple engineer, ex-famous dev, or whoever claims.
> I have no problem burning through investors’ money/allowing investors to subsidize my product development - a form of privatizing the profits (for me) and socializing the losses (on investors). 🤣
@Ben Haha I hear you.
I'm still worried about #1 (sharing entire codebase with them). In theory the more codebases we share with them the better their models can get. So we *could* be helping them train something that replaces us (still skeptical about that). I find myself using ChatGPT temporary chats a lot. But who knows if they really are throwing these conversations away?
It's easy to just say to yourself, "I'm just one tiny little developer, my code can't help them that much." But collectively, developers are giving these companies a lot.
> Nobody who claims AI is oversold hype has ever had a 5 day problem reduced to a 15 minute, interactive solution. Again and again. You do have to know how to use it.
There is a huge gap between being this useful tool (I agree it is) versus being able to have the LLM do everything. I read a thread on Hacker News yesterday. So many people just want to shut their brains off and play. And then there are others who want to use this because it's cheaper than hiring a developer. But if these companies keep burning that money and all of a sudden your Claude subscription costs like 7k a month and you vibe coded a calculator app the whole thing kind of collapses.
I think Jason Gorman put it nicely when he said: "Where is all this AI-generated software? […] I've spent three years looking into this. I feel like James Randi at a spoon-bending convention sometimes."
I can relate. All I still see are silly, unimpressive, isolated prototypes. And please do not mention the OpenClaw security nightmare to me. That’s a piece of unimpressive junk as well.
"In an unambiguous message to the global developer community, GitHub CEO Thomas Dohmke warned that software engineers should either embrace AI or leave the profession."
Rude, but also: why am I not surprised that GitHub, owned by Microsoft, a company that has bet its entire future on AI, wants to do whatever it can to make people use AI? It's just marketing, and people should treat it the way they treat any advertisement.
"Like it or not, these agents are the new compilers, and prompting them is the new programming."
I respectfully disagree with this take. The relationship between prompts and source code in a high level language is not the same as that between the source code and the machine code. An actual compiler preserves the precise semantics of the source code when compiling—if it doesn't that's a bug in the compiler. Natural language does not have such semantics.
Maybe one day the agents will be smart enough to prompt you for the exact semantics you want. Or maybe they will be smart enough to make the same (or better) decisions. I expect we'll see that day in most of our lifetimes, but it is a bit of a frightening future and not at all like having a really good compiler.