Codex for Almost Everything
Codex can now operate your computer alongside you, work with more of the tools and apps you use every day, generate images, remember your preferences, learn from previous actions, and take on ongoing and repeatable work. The Codex app also now includes deeper support for developer workflows, like reviewing PRs, viewing multiple files & terminals, connecting to remote devboxes via SSH, and an in-app browser to make it faster to iterate on frontend designs, apps, and games.
With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps. For developers, this is helpful for iterating on frontend changes, testing apps, or working in apps that don’t expose an API.
It was just over a week ago that OpenAI raised $122 billion in financing and announced it was shifting its focus to building a superapp that brings the capabilities of its models into a unified experience. It turns out that app is Codex, OpenAI’s app that, until today, was focused primarily on developing software.
However, according to OpenAI, 50% of Codex’s users were already giving it non-coding tasks to complete. Combined with the OS flexibility of a desktop environment, that made Codex the natural place to bring together a wide range of new productivity and coding features.
[…]
OpenAI has drawn aspects of its Atlas browser into Codex, too. This allows Codex to prototype websites and apps that users can comment on in-line, creating a tight feedback loop for refining designs. Currently, this feature is limited to running sites and apps via a local server setup, but OpenAI says it will be extended to incorporate actions like interacting with the greater Internet, taking screenshots, and stepping through user flows in the future.
The feature that OpenAI rolled out in Codex is literally based on the Sky app that I exclusively previewed last year, and which was later acquired by OpenAI along with the team that built it.
[…]
I’m not exaggerating when I say that Codex now features the best computer use feature I have ever tested in any LLM or desktop agent. In fact, it’s even better than the computer use feature I used in Sky last year: Sky’s computer use was great, but it was considerably slower than Codex’s current one because it was running on Anthropic’s Claude models. With Codex for Mac today, even the (kind of slow) GPT-5.4 is faster than Sky ever was. But using Codex with fast mode or – for simpler tasks – the Cerebras-hosted GPT-5.3-Codex-Spark model yields dramatically faster performance than Sky for Mac delivered in 2025.
[…]
We all have Apple’s Accessibility team to thank for the technology that allows Codex’s computer use tool to exist. To build it, the Codex team took advantage of an advanced accessibility feature that allows third-party apps to read the “accessibility hierarchy” (also known as “AX Tree”) of any app open on macOS. My understanding is that this technology was primarily created to allow screen-readers and other assistive tools to work with Mac apps regardless of their automation/scripting features. In this case, it’s been repurposed as a way for Codex to ingest the full contents and hierarchy of any window and, essentially, load it as context for the LLM.
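To make the idea of loading a window's hierarchy as context more concrete, here is a minimal sketch of flattening a tree of UI elements into indented text that could be placed in a prompt. It is a toy model under stated assumptions, not Codex's implementation and not the macOS accessibility API: the `AXNode` class and `serialize` function are hypothetical names, and the sample window is made up. Only the role strings (`AXWindow`, `AXToolbar`, `AXButton`, `AXTextArea`) follow the naming that macOS uses for accessibility roles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AXNode:
    """Hypothetical stand-in for one element of an accessibility hierarchy."""
    role: str
    title: str = ""
    children: list[AXNode] = field(default_factory=list)


def serialize(node: AXNode, depth: int = 0) -> str:
    """Flatten a node and its descendants into indented text, one element per line."""
    label = f"{node.role} {node.title!r}" if node.title else node.role
    lines = ["  " * depth + label]
    for child in node.children:
        lines.append(serialize(child, depth + 1))
    return "\n".join(lines)


# A made-up window, used only to illustrate the shape of the data.
window = AXNode("AXWindow", "Untitled", [
    AXNode("AXToolbar", children=[AXNode("AXButton", "Share")]),
    AXNode("AXTextArea", "Document body"),
])

context = serialize(window)
print(context)
```

A real tool would presumably build such a tree by querying the system's accessibility API for each element's role, title, and children, and would likely include more attributes (position, value, enabled state) than this sketch does.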
Developers, I recommend you do this ASAP: ask Codex to run your app and try to figure out how to do a task, without seeding it with any information.
It’s like putting a new user in front of the screen, and watching how they operate it. It will very rapidly expose any problems you have in messaging or user education, and it’s a little eye-opening if you’ve never (or not recently) run user tests.
Previously:
- Perplexity Personal Computer
- Gemini App for Mac
- OpenClaw Developer Joins OpenAI
- Codex App
- Sky Acquired by OpenAI
- ChatGPT Atlas
- UI Browser 4
Update (2026-05-18): Tim Hardwick:
OpenAI has brought its Codex coding agent to the ChatGPT mobile app, providing iPhone and Android users with remote access to Codex sessions running on a Mac.
2 Comments
My paradox is that any simple task I’d trust this with, I probably can just quickly do myself, whereas any task that’s involved and better suited to automation, I would never trust this to do correctly. And even if it did, I would always have to check its work, which defeats the purpose.
Riding with Keyboard Maestro and AppleScript til I die (or move to Linux).
I've been using software to drive a browser to download my financial information on a regular basis for the last twenty years or so, so I can think of a number of useful applications for AI controlling a computer. Companies have greatly increased the friction involved in extracting desired information. AI could serve as a counter-force.
For example, an AI agent could log into Facebook and look up individual friends, explore their feeds and present their combined posts in chronological order. It could log into X/Twitter and act similarly, presenting me with a chronological feed of posts from a list of preferred posters. It could log into Amazon and search for a book given its title and author and put books that match at the top of the search results. It could improve search engine results by examining the target pages and eliminating or summarizing the obviously AI-generated ones and presenting the results in descending order of relevance to the search terms. Now that Apple supports mirroring one's iPhone on one's desktop, AI could consolidate messaging currently requiring multiple applications.
AI could be a powerful force for dis-enshittification. It would set the internet back ten or twenty years, which for most people would be a good thing.
I'm sure this would involve violating the terms of service, but once everyone has an AI that can drive a browser, it would be hard to detect. There would definitely be a cat and mouse game, but this is one area where AI's flexibility may be a real asset.