Thursday, February 15, 2018

Can macOS Tell How Large a File Really Is?

Howard Oakley:

The macOS programming class which provides most information about files is URL. It has quite an elaborate interface which involves telling a file URL object which ‘keys’ you want it to reveal, then accessing those that you want. In this case, the URLResourceKey in question is totalFileSize, which Apple’s developer documentation describes as:

Key for the total displayable size of the file in bytes, returned as an NSNumber object (read-only). This includes the size of any file metadata.

But apparently this refers to metadata from the resource fork. It does not count extended attributes.

The evidence from Precize is that the only accurate way to measure the full size of a Mac file is to total the sizes of each of its xattrs, and add those to the size of its data fork. That doesn’t appear to be a function performed by macOS, or at least it is not exposed anywhere to developers or users. So, as far as I can tell, macOS itself doesn’t have any direct access to the total size of any of its files – which seems a startling omission.
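As a rough illustration of the sum Oakley describes, here is a minimal Python sketch. It is not how Precize works; that isn't described here. It adds the data-fork size from os.stat to the byte length of every xattr value, counting only value bytes and not attribute names or filesystem overhead. One caveat: Python's os.listxattr and os.getxattr exist only on Linux, so on macOS the xattr sizes would have to come from elsewhere, such as the xattr command-line tool. The helper below simply raises NotImplementedError there. The function names are hypothetical.

```python
import os
from typing import Iterable, List


def sum_sizes(data_fork_size: int, xattr_sizes: Iterable[int]) -> int:
    """Data-fork size plus the byte length of every xattr value."""
    return data_fork_size + sum(xattr_sizes)


def xattr_value_sizes(path: str) -> List[int]:
    """Byte length of each extended attribute value on `path`.

    os.listxattr/os.getxattr exist only in Linux builds of Python, so this
    raises NotImplementedError elsewhere (macOS included).
    """
    if not hasattr(os, "listxattr"):
        raise NotImplementedError("os.listxattr is not available on this platform")
    return [
        len(os.getxattr(path, name, follow_symlinks=False))
        for name in os.listxattr(path, follow_symlinks=False)
    ]


def full_file_size(path: str) -> int:
    """Data fork plus all xattr values; ignores the resource fork and slack space."""
    st = os.stat(path, follow_symlinks=False)
    return sum_sizes(st.st_size, xattr_value_sizes(path))
```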

Howard Oakley:

I had not expected xattrs to be so heavily used in the /Library folder, but the average size of xattrs across its files which have xattrs is just over 7 KB per file. I had expected them to be commonplace in my Home folder, but am surprised that the average total size of xattrs across all the files there (not just those with xattrs) is just over 2 KB.

[…]

The largest contribution is in ~/Documents, which has a total of 2.6 GB of xattrs across fewer than half a million files. However, a lot of my images in ~/Pictures still seem to sport thumbnails, so there the average total of xattrs per file with xattrs is almost 21 KB: that’s 0.796 GB in only 38018 files.
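The averages quoted above measure two different things: some divide by the files that carry xattrs, one divides by all files. A small Python sketch of that bookkeeping follows. It assumes xattr sizes are read with os.listxattr/os.getxattr, which Python provides only on Linux. The function names are made up for illustration, and this is not Precize's implementation. As a check on the quoted figures, 0.796 GB over 38018 files is 796,000,000 / 38018 ≈ 20,937 bytes, which is "almost 21 KB" if GB and KB are read as decimal units.

```python
import os
from typing import Dict, List


def xattr_summary(per_file_totals: List[int]) -> Dict[str, float]:
    """Summarize per-file xattr byte totals (0 means the file has no xattrs)."""
    with_xattrs = [t for t in per_file_totals if t > 0]
    total = sum(per_file_totals)
    return {
        "total_bytes": total,
        "files": len(per_file_totals),
        "files_with_xattrs": len(with_xattrs),
        # Average over only the files that carry xattrs (like the ~/Pictures figure).
        "avg_with_xattrs": total / len(with_xattrs) if with_xattrs else 0.0,
        # Average over every file, with or without xattrs (like the Home-folder figure).
        "avg_all_files": total / len(per_file_totals) if per_file_totals else 0.0,
    }


def survey(root: str) -> Dict[str, float]:
    """Walk `root`, total the xattr value bytes of each file, and summarize.

    Relies on os.listxattr/os.getxattr, which Python provides only on Linux.
    """
    if not hasattr(os, "listxattr"):
        raise NotImplementedError("os.listxattr is not available on this platform")
    totals = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                names = os.listxattr(path, follow_symlinks=False)
                totals.append(sum(
                    len(os.getxattr(path, n, follow_symlinks=False)) for n in names
                ))
            except OSError:
                continue  # unreadable or vanished file: leave it out
    return xattr_summary(totals)
```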

2 Comments

macOS and file sizes are a botch, naming schemes included. If you want to find out the actual size (more precisely, sizes) of a file or a bundle, stick to Unix and the command line. For example, see Quinn.app, which has resource forks and xattrs (https://i.imgur.com/GyLM1Tz.jpg), and TextEdit, which uses HFS+ compression (https://i.imgur.com/8AQJtNK.jpg).

You can run the du command to get the actual disk usage (including slack space), but for directories you can also run stat on all the files inside and add up their sizes, each rounded up to the next multiple of the allocation block size. Both give what we know as "size on disk", or physical size. However, with HFS compression, macOS (e.g. via mdls) reports the wrong physical size, namely the size on disk the file or directory *would* have if there were no HFS compression. So macOS tells you that TextEdit takes 8 MB on disk (physical size), whereas in reality it takes only about half of that (!). In other words, what macOS prints as physical size is really just a "virtual size" (as it's called here). That alone is reason to stay away from macOS tools like mdls if you want to know the size of an object.
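The two ways of arriving at "size on disk" described above can be sketched in Python. Summing st_blocks (reported in 512-byte units) approximates what du reports, counting each hard-linked inode once as du does by default; summing regular files' st_size, each rounded up to an allocation block, follows the stat approach. The 4096-byte block size is an assumption, not something read from the volume, and the function names are hypothetical.

```python
import os
import stat


def round_up(size: int, block: int) -> int:
    """Round a byte count up to the next multiple of `block`."""
    return -(-size // block) * block


def iter_paths(root: str):
    """Yield `root` and, if it is a real directory, every entry beneath it."""
    yield root
    if os.path.isdir(root) and not os.path.islink(root):
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                yield os.path.join(dirpath, name)


def physical_sizes(root: str, block: int = 4096):
    """Return (allocated, rounded) byte estimates of size on disk.

    allocated: sum of st_blocks * 512 over all entries, roughly what du reports.
    rounded:   sum of regular files' st_size, each rounded up to `block`.
    """
    seen = set()
    allocated = rounded = 0
    for path in iter_paths(root):
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # count each hard-linked inode once, as du does
        seen.add(key)
        allocated += st.st_blocks * 512
        if stat.S_ISREG(st.st_mode):
            rounded += round_up(st.st_size, block)
    return allocated, rounded
```

If the filesystem reports compressed allocation in st_blocks, the two numbers will diverge for compressed files, which is the kind of gap the comment describes.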

Then there's the data size, which is simply the size reported by stat (or, for a directory, the sum of the stat sizes of all its files), ignoring the allocation block size, i.e. not counting slack space. The totalFileSize in macOS is what's called "Apparent Size" here, i.e. data size plus resource fork size. But the resource fork is *not* part of the file it's associated with. It's separate; otherwise it would be counted in the size reported by the stat command. (Side note: the macOS mdls command also calls this "logical size", and there's also a key called "file system size", which seems to be the same, though I don't know whether that's always the case.)
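The difference between data size and apparent size can be sketched as follows, taking the commenter's description (apparent size = data size + resource fork size) as given. On macOS a file's resource fork can be reached through the special path file/..namedfork/rsrc; on other systems that path doesn't exist, so this sketch treats the resource fork as empty there. The function names are hypothetical.

```python
import os


def data_size(path: str) -> int:
    """Data-fork size as reported by stat(2): no slack space, no forks, no xattrs."""
    return os.stat(path).st_size


def resource_fork_size(path: str) -> int:
    """Resource-fork size via macOS's ..namedfork/rsrc path, or 0 if unavailable."""
    try:
        return os.stat(os.path.join(path, "..namedfork", "rsrc")).st_size
    except OSError:
        return 0  # no resource fork, or not a macOS filesystem


def apparent_size(path: str) -> int:
    """Data size plus resource-fork size (the commenter's "Apparent Size")."""
    return data_size(path) + resource_fork_size(path)
```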

Then there are the xattrs, which, like the resource fork, are separate from the file itself. That gives what's called "data size on volume" here: the data size of the file(s) plus the resource fork(s) plus the xattrs.

For a bundle or directory, the root directory itself (the *.app) should also be added to the count. It is 102 bytes, but it can have xattrs too. (Naturally, this doesn't apply to regular files.) So in this case the data size on volume increases further; this is called "total data size on volume" here.
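The layered definitions in this comment can be written down as a small bookkeeping sketch. It assumes the per-entry measurements (stat size, resource fork size, total xattr bytes) have already been collected, for instance via stat, the ..namedfork/rsrc path, and the xattr command-line tool. EntrySizes and the functions are hypothetical names, and apart from the 102-byte root directory mentioned in the comment, the numbers in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EntrySizes:
    """Measured byte counts for one file or directory entry (hypothetical model)."""
    data: int        # size reported by stat
    rsrc: int = 0    # resource-fork size
    xattrs: int = 0  # total size of extended attribute values

    @property
    def on_volume(self) -> int:
        return self.data + self.rsrc + self.xattrs


def data_size_on_volume(files: List[EntrySizes]) -> int:
    """Data + resource forks + xattrs of every file in the bundle."""
    return sum(f.on_volume for f in files)


def total_data_size_on_volume(root: EntrySizes, files: List[EntrySizes]) -> int:
    """As above, plus the bundle's root directory entry and its own xattrs."""
    return root.on_volume + data_size_on_volume(files)


# Made-up example: a 102-byte root directory carrying 30 bytes of xattrs.
bundle_root = EntrySizes(data=102, xattrs=30)
files = [EntrySizes(data=1000, rsrc=200, xattrs=50), EntrySizes(data=10)]
print(data_size_on_volume(files))                     # 1260
print(total_data_size_on_volume(bundle_root, files))  # 1392
```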

Yep, stick to Unix if you want the truth.

Just the other day I Get Info’d a folder to see how much space it was using. The panel immediately showed something in the range of 80GB and didn’t change for at least 10 seconds, so I assumed that was that and went about my day. Some time later, I was cleaning up windows, went to close that Get Info panel, and did a double-take: it now showed the folder as over 200GB in size. (This is all on an SSD, btw.)

Remember the old times, when we had to click that little Calculate Size button in the Get Info panel? That was annoying, but the automatic sizing we have now can be downright deceptive.
