Codex for Almost Everything
Codex can now operate your computer alongside you, work with more of the tools and apps you use every day, generate images, remember your preferences, learn from previous actions, and take on ongoing and repeatable work. The Codex app also now includes deeper support for developer workflows, like reviewing PRs, viewing multiple files & terminals, connecting to remote devboxes via SSH, and an in-app browser to make it faster to iterate on frontend designs, apps, and games.
With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps. For developers, this is helpful for iterating on frontend changes, testing apps, or working in apps that don’t expose an API.
It was just over a week ago that OpenAI raised $122 billion in financing and announced it was shifting its focus to building a superapp that brings the capabilities of its models into a unified experience. It turns out that app is Codex, OpenAI’s app that, until today, was focused primarily on developing software.
However, according to OpenAI, 50% of Codex’s users were already giving it non-coding tasks to complete. Combined with the OS flexibility of a desktop environment, that made Codex the natural place to bring together a wide range of new productivity and coding features.
[…]
OpenAI has drawn aspects of its Atlas browser into Codex, too. This allows Codex to prototype websites and apps that users can comment on in-line, creating a tight feedback loop for refining designs. Currently, this feature is limited to running sites and apps via a local server setup, but OpenAI says it will be extended to incorporate actions like interacting with the greater Internet, taking screenshots, and stepping through user flows in the future.
The feature that OpenAI rolled out in Codex is literally based on the Sky app that I exclusively previewed last year, and which was later acquired by OpenAI along with the team that built it.
[…]
I’m not exaggerating when I say that Codex now features the best computer use feature I have ever tested in any LLM or desktop agent. In fact, it’s even better than the computer use feature I used in Sky last year: Sky’s computer use was great, but it was considerably slower than Codex’s current one because it was running on Anthropic’s Claude models. With Codex for Mac today, even the (kind of slow) GPT-5.4 is faster than Sky ever was. But using Codex with fast mode or – for simpler tasks – the Cerebras-hosted GPT-5.3-Codex-Spark model yields dramatically faster performance than Sky for Mac delivered in 2025.
[…]
We all have Apple’s Accessibility team to thank for the technology that allows Codex’s computer use tool to exist. To build it, the Codex team took advantage of an advanced accessibility feature that allows third-party apps to read the “accessibility hierarchy” (also known as “AX Tree”) of any app open on macOS. My understanding is that this technology was primarily created to allow screen-readers and other assistive tools to work with Mac apps regardless of their automation/scripting features. In this case, it’s been repurposed as a way for Codex to ingest the full contents and hierarchy of any window and, essentially, load it as context for the LLM.
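To make the idea of loading a window's hierarchy "as context" concrete, here is a minimal sketch in Python. It is not Codex's implementation and does not call the macOS accessibility API: the `AXNode` class, its `role`/`title`/`value` fields, and the `serialize` function are all hypothetical stand-ins, chosen only to show how a tree of UI elements might be flattened into indented text that a model can read.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AXNode:
    """Hypothetical stand-in for one element of an accessibility hierarchy."""
    role: str
    title: Optional[str] = None
    value: Optional[str] = None
    children: List["AXNode"] = field(default_factory=list)


def serialize(node: AXNode, depth: int = 0) -> str:
    """Flatten the tree into indented text, one element per line."""
    parts = [node.role]
    if node.title:
        parts.append(f'"{node.title}"')
    if node.value:
        parts.append(f"= {node.value}")
    lines = ["  " * depth + " ".join(parts)]
    for child in node.children:
        lines.append(serialize(child, depth + 1))
    return "\n".join(lines)


# A made-up window: a toolbar with one button, plus a text area.
window = AXNode("Window", title="Untitled", children=[
    AXNode("Toolbar", children=[AXNode("Button", title="Save")]),
    AXNode("TextArea", value="Hello"),
])

context = serialize(window)
print(context)
```

The resulting string keeps the nesting (via indentation) and the labels that a screen reader would announce, which is presumably the kind of information that lets a model decide what to click or type next.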
Developers, I recommend you do this asap: ask Codex to run your app and try to figure out how to do a task, without seeding it with any information.
It’s like putting a new user in front of the screen, and watching how they operate it. It will very rapidly expose any problems you have in messaging or user education, and it’s a little eye-opening if you’ve never (or not recently) run user tests.
Previously:
- Perplexity Personal Computer
- Gemini App for Mac
- OpenClaw Developer Joins OpenAI
- Codex App
- Sky Acquired by OpenAI
- ChatGPT Atlas
- UI Browser 4
1 Comment
My paradox is that any simple task I’d trust this with, I probably can just quickly do myself, whereas any task that’s involved and better suited to automation, I would never trust this to do correctly. And even if it did I would always have to check its work, which defeats the purpose.
Riding with Keyboard Maestro and AppleScript til I die (or move to Linux).