Archive for May 11, 2023

Thursday, May 11, 2023

Getting Ready for Dataless Files


In a modern file system, a file’s content may not be available locally on the device. A file that contains only metadata is known as a dataless file. The file’s content typically resides on a remote server and is available to people or apps, transparently, when they access the file.


The system, or a person using the device, can make dataless files whenever they determine it’s appropriate, and your app needs to be ready to handle them. Specifically, avoid unnecessarily materializing dataless files and, when your app requires access to a file’s contents, perform that work asynchronously off the main thread.
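For POSIX-level code, here is a minimal sketch in C of the second half of that advice. It moves a file read onto a background thread with POSIX threads, so the calling thread (in an app, the main thread) isn't stalled if the read has to wait for a dataless file's contents to arrive. This is a plain-pthreads illustration, not Apple's recommended API, and it does no file coordination. The names `read_job`, `read_file_worker`, and `start_background_read` are hypothetical.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* One background read (hypothetical type). The caller fills in `path`;
   the worker fills in `data` (malloc'd, caller frees) and `length`. */
struct read_job {
    const char *path;
    char *data;
    size_t length;
};

/* Runs on a background thread, so a slow read (for example, one that must
   first fetch a dataless file's contents) doesn't stall the calling thread. */
static void *read_file_worker(void *arg) {
    struct read_job *job = arg;
    job->data = NULL;
    job->length = 0;

    FILE *f = fopen(job->path, "rb");
    if (f == NULL) {
        return NULL;                       /* leave data == NULL on failure */
    }

    char *buf = NULL;
    size_t cap = 0, len = 0;
    for (;;) {
        if (len == cap) {                  /* grow the buffer geometrically */
            size_t new_cap = cap ? cap * 2 : 4096;
            char *grown = realloc(buf, new_cap);
            if (grown == NULL) {
                free(buf);
                fclose(f);
                return NULL;
            }
            buf = grown;
            cap = new_cap;
        }
        size_t n = fread(buf + len, 1, cap - len, f);
        len += n;
        if (n == 0) {                      /* end of file or read error */
            break;
        }
    }

    if (ferror(f)) {                       /* discard partial data on error */
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    job->data = buf;
    job->length = len;
    return NULL;
}

/* Starts the read on a new thread and returns immediately.
   Returns 0 on success or the error number from pthread_create(). */
static int start_background_read(struct read_job *job, pthread_t *thread) {
    return pthread_create(thread, NULL, read_file_worker, job);
}
```

A caller fills in `path`, calls `start_background_read()`, carries on with other work, and calls `pthread_join()` before touching `data` and `length`. A real app would more likely hand the result back to its main thread than block in a join.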


UIDocument and NSDocument automatically access the file system in a coordinated and asynchronous manner.


If your app or framework uses low-level POSIX APIs to access the file system and you’re unable to migrate to the preferred methods, consider the following two options[…] Be aware that stat and getattrlist both trigger the materialization of any intermediate folders in the file’s path, if they themselves are dataless.
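Given that warning about stat, here is a minimal sketch of what a stat-based dataless check might look like. It assumes the `SF_DATALESS` bit in `st_flags` that recent macOS SDKs define in `<sys/stat.h>`; on systems that don't define that flag, it simply reports every file as local. `is_dataless` is a hypothetical helper, and because stat can materialize dataless folders along the path, even this check belongs off the main thread.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if stat() reports the file as dataless,
   0 if it has local data (or the flag isn't available), -1 if stat() fails. */
static int is_dataless(const char *path) {
    struct stat st;
    /* Note: stat() itself may materialize dataless intermediate folders
       in `path`, so call this from a background thread. */
    if (stat(path, &st) != 0) {
        return -1;                         /* e.g. ENOENT; see errno */
    }
#ifdef SF_DATALESS
    /* Assumption: macOS marks dataless files with SF_DATALESS in st_flags. */
    return (st.st_flags & SF_DATALESS) ? 1 : 0;
#else
    /* No such flag on this platform (e.g. Linux): treat the file as local. */
    return 0;
#endif
}
```

The tri-state return keeps "couldn't check" distinct from "not dataless", so a caller that gets -1 can inspect `errno` rather than assume the file's data is on disk.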

I find this rather confusing. On macOS, it seems like nearly any file could potentially be dataless. It’s less likely for files in Library but probably possible via symlinking. Even an action as simple as checking whether a file exists can now take unexpectedly long. This breaks many longstanding assumptions.

If your app deals with user-created files, I guess the best practice is to do everything asynchronously and with file coordination. Without coordination (at least on older systems), you can run into the opposite problem: instead of an evicted file being slow to access, it might not be materialized at all. So you need to use the special APIs even if you already have your file code on a background thread.

But the NSFileCoordinator APIs are awkward, error-prone, and slow, and they infect your entire codebase. Hopefully you aren’t relying on any cross-platform code that’s not aware of them. And even with Apple-specific code, they make it hard to reuse the same code for working with folders that may or may not contain dataless files.

It all feels shoehorned in, like the security-scoped URL APIs. Most APIs don’t do the right thing automatically, so you have to wrap uses of them. (But then some other APIs may secretly use coordination, so you have to avoid using it yourself to prevent deadlocks.) Any file-related code could potentially need special handling, but there’s no way to make sure that you didn’t miss a spot somewhere. And once you’ve done all this, your code is much harder to read and much slower for the common case of regular, locally stored files.


Update (2023-05-12): Thomas Clement:

Out of curiosity I tried to stat() a non-local file as described in the tech note, but I get a “no such file” error. Same when trying to access it from Terminal. Not sure how we are supposed to test whether a file is dataless then.

Another thing that is not explained is what is the right way to monitor download progress in case the file is dataless.

Update (2023-08-10): Howard Oakley:

Over the last couple of weeks I have been exploring how macOS and its features handle dataless files. While apps that take advantage of AppKit’s NSDocument to read and write files should handle these problems seamlessly, there are some definite seams when it comes to macOS services. These result from three constraints:

  • features reliant on the contents of file data can’t be used with dataless files;
  • features reliant on file data stored outside the file aren’t available to other systems accessing that file from iCloud;
  • limitations on the total size of extended attributes in iCloud storage may require some to be removed.

Discord’s Username Change

Umar Shakir (via Hacker News):

Starting in the next couple of weeks, millions of Discord users will be forced to say goodbye to their old four-digit-appended names. Discord is requiring everyone to take up a new common platform-wide handle. For Discord, it’s a move toward mainstream social network conventions. For some users, though, it’s a change to the basics of what Discord is — a shift that’s as much about culture as technology.

Discord has historically handled usernames with a numeric suffix system. Instead of requiring a completely unique handle, it allowed duplicate names by adding a four-digit code known as a “discriminator” — think TheVerge#1234. But earlier this week, it announced it was changing course and moving toward unique identifiers that resemble Twitter-style “@” handles.


Over on Reddit, Vishnevskiy argued that the new handles wouldn’t even show up in the interface that often since Discord will allow users to set a separate display name that’s not unique.


During the change, Discord users will have to navigate a process that’s fraught with uncertainty and cutthroat competition.

Google Codey

Frederic Lardinois:

At its annual I/O developer conference, Google today announced the launch of a number of AI-centric coding tools, including its competitor to GitHub’s Copilot, a chat tool for asking questions about coding and Google Cloud services, as well as AI-assisted coding in Google’s no-code AppSheet product.

At the core of virtually all of these new code completion and code generation tools is Codey. Codey is based on Google’s PaLM 2 large language model. The company specifically trained it to handle coding-related prompts, but it also trained the model to handle queries related to Google Cloud in general (all of this, by the way, falls under Google’s Duet AI branding).


Developers will get access to these new tools through an extension for Visual Studio Code, JetBrains IDEs, and Google’s Cloud Shell Editor, as well as in Google’s cloud-hosted Workstations service.

June Yang:

This code generation model supports 20+ coding languages, including Go, Google Standard SQL, Java, JavaScript, Python, and TypeScript. It enables a wide variety of coding tasks, helping developers to work faster and close skills gaps[…]


Corellium Wins iOS Simulator Copyright Case

Isaiah Poritz (via Corellium, Hacker News):

Apple Inc. failed to fully revive a long-running copyright lawsuit against cybersecurity firm Corellium Inc. over its software that simulates the iPhone’s iOS operating systems, letting security researchers identify flaws in the software.

The US Court of Appeals for the Eleventh Circuit on Monday ruled that Corellium’s CORSEC simulator is protected by copyright law’s fair use doctrine, which allows the duplication of copyrighted work under certain circumstances.


Apple argued that Corellium’s software was “wholesale copying and reproduction” of iOS and served as a market substitute for its own security research products.

Corellium countered that its copying of Apple’s computer code and app icons was only for the purposes of security research and was sufficiently “transformative” under the fair use standard.