Friday, February 26, 2021

Excessive Mac SSD Wear

Hartley Charlton (Hacker News):

Across Twitter and the MacRumors forums, users are reporting that M1 Macs are experiencing extremely high drive writes over a short space of time. In what appear to be the most severe cases, M1 Macs are said to be consuming as much as 10 to 13 percent of the maximum warrantable total bytes written (TBW) value of their SSDs.

[…]

It is not known how widespread the TBW issue is, but reports of strange SSD behavior are also now emerging from users with Intel-based Macs, suggesting that the TBW issue may not be exclusive to M1 Macs.

Dan Moren:

I ran the command-line tests on my own M1 MacBook Air versus my 2017 iMac, and it certainly did seem as though some of the numbers on the Air were higher than they should be, given the amount of relative use.

The numbers for older Macs reported on Accidental Tech Podcast also seem higher than I would have expected.

Update (2021-03-11): Ben Lovejoy (tweet):

Second, he says that SSD vendors have to be very conservative in their wear ratings, as it leaves them on the hook for warranty claims if a drive fails before reaching its rated wear limit. In practice, SSDs can commonly cope with four times as much wear.

Regardless, I think we need to figure out why macOS is writing (or reporting that it’s writing) so much more data than intuitively seems reasonable. And this is not an issue limited to M1 Macs.

Update (2021-05-25): Hector Martin:

Update on the macOS SSD thrashing issue: It seems the issue is fixed in 11.4. Feel free to try the betas if you’re adventurous, or wait for the final release.

It’s going to be interesting diffing the XNU kernel source once it drops and seeing what the bug was…


9 Comments

I’m getting about 120 gigabytes written per hour on my 16 GB M1 Mac, which makes no sense to me, with the vast majority of writes coming from kernel_task and WindowServer. According to Activity Monitor I’m never getting above 20% memory pressure, so I’m not sure why macOS is even bothering to send so much data to a swap file.

So you will need to replace your Mac sooner. But the news has died down now, and since Apple doesn't act unless there is enough media coverage, I would not be surprised if the fix never comes.

As John Siracusa points out on a more recent ATP episode, we don't really know if these numbers are accurate. It could be that smartmontools is parsing the values incorrectly (SMART data isn't as standardized as one might think), and it could also be that Apple's firmware doesn't write the values correctly in the first place.

@Sören,
FWIW, the numbers from smartmontools match what I’m seeing from Activity Monitor.
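
One way to sanity-check what a tool reports is to convert the raw counter by hand. This is a minimal sketch, assuming an NVMe drive whose smartctl output includes a "Data Units Written" line, and assuming the NVMe convention that one data unit is 1,000 blocks of 512 bytes. The sample line is invented for illustration and is not taken from any Mac in this thread.

```python
import re

# Assumption: NVMe convention, one data unit = 1000 blocks of 512 bytes.
BYTES_PER_DATA_UNIT = 1000 * 512


def data_units_written_to_bytes(line):
    """Pull the raw counter out of a 'Data Units Written' line and convert it to bytes."""
    match = re.search(r"Data Units Written:\s*([\d,]+)", line)
    if match is None:
        raise ValueError("no 'Data Units Written' field found")
    units = int(match.group(1).replace(",", ""))
    return units * BYTES_PER_DATA_UNIT


# Hypothetical sample line, for illustration only.
sample = "Data Units Written:                 2,000,000 [1.02 TB]"
written = data_units_written_to_bytes(sample)
print(f"{written / 1e12:.3f} TB written")
```

If the hand-converted figure agrees with both the bracketed value and Activity Monitor, a parsing error becomes less likely, although it says nothing about whether the firmware's counter itself is right.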

I just checked my ancient MacBook Pro running Mojave: With 15 days of uptime, kernel_task has written 25 GB, iStat Menus 13 GB, mds_stores 11 GB, launchd 6 GB, and Firefox 5 GB. This is a system that largely sits idle though.

For a more relevant data point, I also checked my 1TB SATA SSD, which has been my primary drive for the past 3.5 years, and it's at 51 TBW, or about 41 GB/day, far less than the 800-1000 GB/day reported on ATP. And this is a drive that has seen a lot of activity: Multiple OS installs, multiple FileVault activations (which involve a full drive write), many Xcode installations, lots of video recording, app updates, etc. This drive has never had Catalina or Big Sur on it.

To put it another way, Casey's drive reported more writes in 54 days than mine reported in ~1240 days.
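
The arithmetic behind that comparison can be sketched as follows, using decimal units (1 TB = 1,000 GB). The 800-1000 GB/day range is the one quoted from ATP above, not an exact figure for any one drive, and the function names are made up for this example.

```python
def gb_per_day(total_tb, days):
    """Average daily writes in GB, given a lifetime total in TB."""
    return total_tb * 1000 / days


def tb_after(rate_gb_per_day, days):
    """Total TB written after the given number of days at a constant daily rate."""
    return rate_gb_per_day * days / 1000


# SATA SSD from the comment: 51 TB over roughly 1240 days.
print(f"{gb_per_day(51, 1240):.1f} GB/day")

# The ATP range sustained for 54 days.
low, high = tb_after(800, 54), tb_after(1000, 54)
print(f"{low:.1f} to {high:.1f} TB in 54 days")
```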

Has anyone looked into where these writes are going? Or has anyone installed Big Sur to an HDD and seen whether the drive does in fact keep itself busy with 90 Mb/s of writes?
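
For reference, here is a rough conversion from a daily write volume to a sustained rate. It assumes that "Mb" means megabits and that units are decimal; neither is stated in the comment.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gb_per_day_to_mbit_per_s(gb_per_day):
    """Convert GB/day to megabits per second (decimal units, 8 bits per byte)."""
    bytes_per_second = gb_per_day * 1e9 / SECONDS_PER_DAY
    return bytes_per_second * 8 / 1e6


for rate in (800, 1000):
    print(f"{rate} GB/day is about {gb_per_day_to_mbit_per_s(rate):.1f} Mb/s")
```

Under those assumptions, the top of the reported range works out to a little over 90 Mb/s of continuous writing.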

A little more data (via DriveDX) from a daily production system, in case it's helpful:

1TB Apple SSD in 2015 MacBook Pro, currently running macOS 10.14.6 Mojave

Overall Health Rating                : GOOD 100%
SSD Lifetime Left Indicator          : GOOD 93.0%
Model Family                         : Apple (Samsung-based) SSDs
Model                                : APPLE SSD SM1024G
Power On Time                        : 12,194 hours (16 months 28 days 2 hours)
Power Cycles Count                   : 23,231
Failed Indicators (life-span / pre-fail)  : 0 (0 / 0)
Failing Indicators (life-span / pre-fail) : 0 (0 / 0)
Warnings (life-span / pre-fail)           : 0 (0 / 0)
Recently failed Self-tests (Short / Full) : 0 (0 / 0)
I/O Error Count                           : 0 (0 / 0)


=== DRIVE HEALTH INDICATORS ===
  ID   | NAME                        |      RAW VALUE           | STATUS          
 173   Wear Leveling Count                  0x500DE004C          93.0%  OK          
 174   Host Reads MiB                  200,202,676 (190.9 TB)    99.0%  OK          
 175   Host Writes MiB                  52,768,984 (50.3 TB)     99.0%  OK          
 192   Unsafe Shutdown Count                    218              99.0%  OK          
 199   UDMA CRC Error Count                      0                100%  OK
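
A quick check of the raw values above suggests that the totals in parentheses are binary terabytes (TiB) rather than decimal TB. This is an inference from the arithmetic, not something DriveDX's output states.

```python
MIB = 1024 ** 2  # bytes per MiB; also the number of MiB per TiB


def mib_to_tib(mib):
    """MiB to TiB: divide by 1024 twice."""
    return mib / MIB


def mib_to_tb(mib):
    """MiB to decimal TB: convert to bytes, then divide by 10^12."""
    return mib * MIB / 1e12


host_reads_mib = 200_202_676
host_writes_mib = 52_768_984

print(f"reads:  {mib_to_tib(host_reads_mib):.1f} TiB")
print(f"writes: {mib_to_tib(host_writes_mib):.1f} TiB, {mib_to_tb(host_writes_mib):.1f} TB")
```

The distinction matters when comparing against figures quoted in decimal units, since the same write counter is about 10% larger when expressed in TB.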

Nanosecond timestamps have to be used since they're baked into APFS. Wouldn't want all the groovy new spyware vectors to go to waste.

Think I'm exaggerating? Read some forensics websites that have done things like track the ever-increasing trickery and breadth/depth of the OS X/macOS expansion of all this, like metadata and holding files hostage in unallocated space. Want to turn off a bit of spyware? A hidden file will be created to let investigators know you did it (and, no doubt, a plethora of additional spyware data grabs will be instigated). And on and on.

There is minimal doubt in my mind that this is spyware gone awry. But, since APFS was slower even on SSDs than "not modern" HFS+, it must be good for us.

Probably trying to use all the AI/ML hardware in M1 and having difficulties.

The penchant of Catalina and Big Sur for crashing and having extremely bad bugs (like TextEdit refusing to let one save documents after a crash in Catalina, or TextEdit crashing and erasing much of your recent work in Big Sur) makes me very positive about driverless vehicles. Apparently, text editing is too hard but driving on highways is not.

Agner Fog, the famous assembly programmer who exposed Intel's scheme to make AMD CPUs slow (particularly in benchmarks) via the GenuineIntel check, joked that the only thing ordinary people really need dual core processors for is to run the spyware faster.

That joke's not quite as amusing these days. I had more reliability editing text documents on an Apple Lisa than I've had on Big Sur. Time to go back to CP/M and WordStar? Thanks Apple!
