{"id":48539,"date":"2025-07-18T13:57:53","date_gmt":"2025-07-18T17:57:53","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=48539"},"modified":"2025-07-21T14:59:51","modified_gmt":"2025-07-21T18:59:51","slug":"study-on-ai-coding-tools","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2025\/07\/18\/study-on-ai-coding-tools\/","title":{"rendered":"Study on AI Coding Tools"},"content":{"rendered":"<p><a href=\"https:\/\/metr.org\/blog\/2025-07-10-early-2025-ai-experienced-os-dev-study\/\">METR<\/a> (<a href=\"https:\/\/news.ycombinator.com\/item?id=44524109\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/metr.org\/blog\/2025-07-10-early-2025-ai-experienced-os-dev-study\/\"><p>We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without&mdash;AI makes them slower. We view this result as a snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to rapidly evolve, we plan on continuing to use this methodology to help estimate AI acceleration from AI R&amp;D automation.<\/p><p>See the <a href=\"https:\/\/arxiv.org\/abs\/2507.09089\">full paper<\/a> for more detail.<\/p><\/blockquote>\n\n<p>Via <a href=\"https:\/\/www.theregister.com\/2025\/07\/11\/ai_code_tools_slow_down\/\">Thomas Claburn<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.theregister.com\/2025\/07\/11\/ai_code_tools_slow_down\/\"><p>Not only did the use of AI tools hinder developers, but it led them to hallucinate, much like the AIs have a tendency to do themselves. The developers predicted a 24 percent speedup, but even <em>after<\/em> the study concluded, they believed AI had helped them complete tasks 20 percent faster when it had actually delayed their work by about that percentage.<\/p><p>[&#8230;]<\/p><p>The study involved 16 experienced developers who work on large, open source projects. The developers provided a list of real issues (e.g. bug fixes, new features, etc.) they needed to address &#x2013; 246 in total &#x2013; and then forecast how long they expected those tasks would take. The issues were randomly assigned to allow or disallow AI tool usage.<\/p><\/blockquote>\n\n<p>I&rsquo;m skeptical about the experimental design, and I suspect there&rsquo;s huge variance in how much developers in the real world get out of AI.<\/p>\n\n<p><a href=\"https:\/\/x.com\/ruben_bloom\/status\/1943532547935473800\">Ruben Bloom<\/a>:<\/p>\n<blockquote cite=\"https:\/\/x.com\/ruben_bloom\/status\/1943532547935473800\"><p>I was one of the developers in the \n@METR_Evals\n study. Thoughts:<\/p><p>1. This is much less true of my participation in the study where I was more conscientious, but I feel like historically a lot of my AI speed-up gains were eaten by the fact that while a prompt was running, I&rsquo;d look at something else (FB, X, etc) and continue to do so for much longer than it took the prompt to run.<\/p><p>I discovered two days ago that Cursor has (or now has) a feature you can enable to ring a bell when the prompt is done. I expect to reclaim a lot of the AI gains this way.<\/p><p>[&#8230;]<\/p><p>4. As a developer in the study, it&rsquo;s striking to me how much more capable the models have gotten since February (when I was participating in the study)<\/p><p>[&#8230;]<\/p><p>5. There was a selection effect in which tasks I submitted to the study. (a) I didn&rsquo;t want to risk getting randomized to &ldquo;no AI&rdquo; on tasks that felt sufficiently important or daunting to do without AI assistance. (b) Neatly packaged and well-scoped tasks felt suitable for the study, large open-ended greenfield stuff felt harder to legibilize, so I didn&rsquo;t submit those tasks to study even though AI speed up might have been larger.<\/p><\/blockquote>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2025\/06\/27\/claude-code-experience\/\">Claude Code Experience<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2025\/06\/26\/software-is-changing-again\/\">Software Is Changing (Again)<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2025\/03\/21\/vibe-coding\/\">Vibe Coding<\/a><\/li>\n<\/ul>\n\n<p id=\"study-on-ai-coding-tools-update-2025-07-21\">Update (<a href=\"#study-on-ai-coding-tools-update-2025-07-21\">2025-07-21<\/a>): <a href=\"https:\/\/mas.to\/@carnage4life\/114886321782807123\">Dare Obasanjo<\/a>:<\/p>\n<blockquote cite=\"https:\/\/mas.to\/@carnage4life\/114886321782807123\">\n<p>Remember the study that showed developers think vibe coding saves them time but measurements show it doesn&rsquo;t after factoring in time prompting and reviewing the AI&rsquo;s work?<\/p>\n<p>A startup founder is on X <a href=\"https:\/\/x.com\/jasonlk\/status\/1946589071519948952\">documenting his vibe coding struggles with Replit<\/a> which includes deleting the production database and ignoring requests not to make changes without asking for permission.<\/p>\n<\/blockquote>","protected":false},"excerpt":{"rendered":"<p>METR (Hacker News): We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without&mdash;AI makes them slower. We view this result as a snapshot of early-2025 [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2025-07-18T17:57:55Z","apple_news_api_id":"83d729bd-6144-4fd5-b6d4-4555310ff51b","apple_news_api_modified_at":"2025-07-21T18:59:54Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAAAA==","apple_news_api_share_url":"https:\/\/apple.news\/Ag9cpvWFET9W21EVVMQ_1Gw","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[2],"tags":[1351,2682,75,71],"class_list":["post-48539","post","type-post","status-publish","format-standard","hentry","category-technology","tag-artificial-intelligence","tag-claude","tag-developertool","tag-programming"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/48539","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=48539"}],"version-history":[{"count":2,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/48539\/revisions"}],"predecessor-version":[{"id":48552,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/48539\/revisions\/48552"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=48539"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=48539"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=48539"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}