Archive for May 27, 2025

Tuesday, May 27, 2025

External Payments From the HEY App

David Heinemeier Hansson:

Well, we risked everything, but also secured a four-year truce, and now near-total victory is at hand: HEY is finally for sale on the iPhone in the US!

Credit for this amazing turn of events goes to Epic Games founders Tim Sweeney and Mark Rein, who did what no small developer like us could ever dream of doing: they spent over $100 million to sue Apple in court. And while the first round yielded very little progress, Apple’s (possibly criminal) contempt of court is what ultimately delivered the resolution. Thanks to their fight for Fortnite, app developers everywhere are now allowed to link out of apps to their own web-based payment system in the US store (but, sadly, nowhere else yet).

This is all we ever wanted from Apple: to have a way to distribute our iPhone apps and keep the customer relationship by billing directly. The 30% toll gets all the attention, and it is ludicrously egregious, but to us, it’s just as much about retaining that direct customer relationship, so we can help folks with refunds, so they don’t tie their billing for a multi-platform email system to a single manufacturer.

John Gruber:

This is a win for users, and Apple won’t lose a cent from commissions from any of these apps.

Previously:

Google I/O 2025

Kyle Wiggers and Karyne Levy:

Gemini Ultra (only in the U.S. for now) delivers the “highest level of access” to Google’s AI-powered apps and services, according to Google. It’s priced at $249.99 per month and includes Google’s Veo 3 video generator, the company’s new Flow video editing app, and a powerful AI capability called Gemini 2.5 Pro Deep Think mode, which hasn’t launched yet.

[…]

Deep Think is an “enhanced” reasoning mode for Google’s flagship Gemini 2.5 Pro model. It allows the model to consider multiple answers to questions before responding, boosting its performance on certain benchmarks.

[…]

Both Veo 3 and Imagen 4 will be used to power Flow, the company’s AI-powered video tool geared toward filmmaking.

Google (articles):

Here’s a list of I/O 2025’s highlights — many of which you can try today!

Previously:

Claude 4

Anthropic (Hacker News, MacRumors):

Claude Opus 4 is the world’s best coding model, with sustained performance on complex, long-running tasks and agent workflows. Claude Sonnet 4 is a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions.

[…]

Both models can use tools—like web search—during extended thinking, allowing Claude to alternate between reasoning and tool use to improve responses.

[…]

Both models can use tools in parallel, follow instructions more precisely, and—when given access to local files by developers—demonstrate significantly improved memory capabilities, extracting and saving key facts to maintain continuity and build tacit knowledge over time.

Simon Willison (Hacker News):

Anthropic publish most of the system prompts for their chat models as part of their release notes. They recently shared the new prompts for both Claude Opus 4 and Claude Sonnet 4. I enjoyed digging through the prompts, since they act as a sort of unofficial manual for how best to use these tools. Here are my highlights, including a dive into the leaked tool prompts that Anthropic didn’t publish themselves.

Carl Franzen (via Dare Obasanjo):

As Sam Bowman, an Anthropic AI alignment researcher wrote on the social network X under this handle “@sleepinyourhat“ at 12:43 pm ET today about Claude 4 Opus:

“If it thinks you’re doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above.”

[…]

While perhaps well-intended, the resulting behavior raises all sorts of questions for Claude 4 Opus users, including enterprises and business customers — chief among them, what behaviors will the model consider “egregiously immoral” and act upon? Will it share private business or user data with authorities autonomously (on its own), without the user’s permission?

[…]

Bowman added: […]

TBC: This isn’t a new Claude feature and it’s not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions.”

Peter Steinberger:

I asked Claude 4 what new API’s in macOS 15 could be beneficial… and it got me REALLLLLLY excited. Asked it for links. It chugged a long for minutes and then…

“Based on my research, I need to correct my earlier statement.” LOL

Previously:

OpenAI Codex

OpenAI (via John Gruber):

Today we’re launching a research preview of Codex: a cloud-based software engineering agent that can work on many tasks in parallel. Codex can perform tasks for you such as writing features, answering questions about your codebase, fixing bugs, and proposing pull requests for review; each task runs in its own cloud sandbox environment, preloaded with your repository.

Codex is powered by codex-1, a version of OpenAI o3 optimized for software engineering. It was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences, adheres precisely to instructions, and can iteratively run tests until it receives a passing result. We’re starting to roll out Codex to ChatGPT Pro, Enterprise, and Team users today, with support for Plus and Edu coming soon.

Simon Willison:

This 4 minute demo video is a useful overview. One note that caught my eye is that the setup phase for an environment can pull from the internet (to install necessary dependencies) but the agent loop itself still runs in a network disconnected sandbox.

It sounds similar to GitHub’s own Copilot Workspace project, which can compose PRs against your code based on a prompt. The big difference is that Codex incorporates a full Code Interpeter style environment, allowing it to build and run the code it’s creating and execute tests in a loop.

Previously: