{"id":35060,"date":"2022-02-17T16:35:13","date_gmt":"2022-02-17T21:35:13","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=35060"},"modified":"2022-03-09T16:00:30","modified_gmt":"2022-03-09T21:00:30","slug":"apple-ssd-benchmarks-and-f_fullsync","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2022\/02\/17\/apple-ssd-benchmarks-and-f_fullsync\/","title":{"rendered":"Apple SSD Benchmarks and F_FULLSYNC"},"content":{"rendered":"<p><a href=\"https:\/\/twitter.com\/marcan42\/status\/1494213855387734019\">Hector Martin<\/a> (<a href=\"https:\/\/news.ycombinator.com\/item?id=30370551\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/marcan42\/status\/1494213855387734019\"><p>It turns out Apple&rsquo;s custom NVMe drives are amazingly fast - if you don&rsquo;t care about data integrity.<\/p><p>[&#8230;]<\/p><p>On Linux, fsync() will both flush writes to the drive, and ask it to flush its write cache to stable storage.<\/p><p>But on macOS, <code>fsync()<\/code> only flushes writes to the drive. 
Instead, they provide an <code>F_FULLSYNC<\/code> operation to do what <code>fsync()<\/code> does on Linux.<\/p><p>[&#8230;]<\/p><p>So effectively macOS cheats on benchmarks; fio on macOS does not give numbers comparable to Linux, and databases and other applications requiring data integrity on macOS need to special case it and use <code>F_FULLSYNC<\/code>.<\/p><p>[&#8230;]<\/p><p>So, effectively, Apple&rsquo;s drive is faster than all the others without cache flushes, but it is more than 3 times slower than a lowly SATA SSD at flushing its cache.<\/p><\/blockquote>\n\n<p>As far as I can tell, the summary is:<\/p>\n<ol>\n<li><code><a href=\"https:\/\/developer.apple.com\/library\/archive\/documentation\/System\/Conceptual\/ManPages_iPhoneOS\/man2\/fsync.2.html\">fsync()<\/a><\/code> does different things on Mac and Linux for historical reasons.<\/li>\n<li>Many non-Apple SSDs don&rsquo;t actually flush their cache when doing <code>F_FULLFSYNC<\/code>; they seem faster because they lie.<\/li>\n<li>Compared with other SSDs that actually do flush, Apple&rsquo;s are (for unknown reasons) much slower, though they are faster when not flushing. Or, perhaps, these non-Apple SSDs are lying, too.<\/li>\n<li>Often, what you really want is <code>F_BARRIERFSYNC<\/code>, not <code>F_FULLFSYNC<\/code>.<\/li>\n<\/ol>\n\n<p><a href=\"https:\/\/twitter.com\/oldmanuk\/status\/1494246227667468289\">Dominic Evans<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/oldmanuk\/status\/1494246227667468289\">\n<p>Surely that&rsquo;s a mischaracterisation to claim they&rsquo;re &ldquo;cheating&rdquo; &mdash; this is just legacy diversions. On earlier versions of the Linux kernel and in posix <code>fsync()<\/code> didn&rsquo;t used to flush the cache either. Darwin independently added the special <code>fnctl<\/code> to do a &ldquo;<code>FULLSYNC<\/code>&rdquo; long ago<\/p>\n<p>Yes newer kernels (2.6 onward or something?) 
changed the semantics of <code>fsync()<\/code> to request the full cache flush too. Darwin didn&rsquo;t change their <code>fsync<\/code> because they already had their <code>fnctl<\/code> to provide the option where needed.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/lists.apple.com\/archives\/darwin-dev\/2005\/Feb\/msg00087.html\">Dominic Giampaolo<\/a>, in 2005:<\/p>\n<blockquote cite=\"https:\/\/lists.apple.com\/archives\/darwin-dev\/2005\/Feb\/msg00087.html\">\n<p>On MacOS X, <code>fsync()<\/code> always has and always will flush all file data\nfrom host memory to the drive on which the file resides.  The behavior\nof <code>fsync()<\/code> on MacOS X is the same as it is on every other version of\nUnix since the dawn of time (well, since the introduction of <code>fsync<\/code>\nanyway :-).<\/p>\n<p>I believe that what the above comment refers to is the fact that\n<code>fsync()<\/code> is not sufficient to guarantee that your data is on stable\nstorage and on MacOS X we provide a <code>fcntl()<\/code>, called <code>F_FULLFSYNC<\/code>,\nto ask the drive to flush all buffered data to stable storage.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/rosyna\/status\/1494248499067514883\">Rosyna Keller<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/rosyna\/status\/1494248499067514883\">\n<p>Force Unit Access, what this &ldquo;flush to permanent storage, not disk cache&rdquo; command is called, is ignored by the majority of drive types (either through lying firmware or a bridge).<\/p>\n<p>It&rsquo;s not enabled by default in most kernels (Linux, Windows) due to synchronous writes being slow.<\/p>\n<p>[&#8230;]<\/p>\n<p>However, every disk Apple ships actually supports Force Unit Access (<code>F_FULLSYNC<\/code>), and is under a different flag because most cross-platform developers don&rsquo;t expect <code>fsync()<\/code> to actually be synchronous, leading to massive performance losses compared to drives that don&rsquo;t support it.<\/p>\n<p>If 
you write software they uses full flushing on firmware that isn&rsquo;t a lying liar and actually goes through a flush to permanent storage, remember that every time you&rsquo;re doing the full sync, you significantly impact the performance of the entire system, not just your software.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/wooster\/status\/1494253898026029057\">Andrew Wooster<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/wooster\/status\/1494253898026029057\">\n<p>I was the backupd performance lead and I&rsquo;d love to move on but it keeps coming up. &#x1F937;&#x200D;&#x2642;&#xFE0F;&#x1F937;&#x200D;&#x2642;&#xFE0F;<\/p>\n<p>[&#8230;]<\/p>\n<p>I am thankful for the various people at Apple who made sure Apple hardware functioned correctly. Otherwise it would&rsquo;ve been impossible to have both performance and correctness. The former is easy if you ignore the latter.<\/p>\n<\/blockquote>\n\n<p>Unfortunately, there were problems in another layer that made Time Capsules corrupt their data all the time.<\/p>\n\n<p><a href=\"https:\/\/twitter.com\/marcan42\/status\/1494405295811923980\">Hector Martin<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/marcan42\/status\/1494405295811923980\"><p>So you&rsquo;re saying my WD NVMe drive lies about flushes, and yet they&rsquo;re 10x slower than not flushing? Must be really bad at lying then&#8230;<\/p><p>The problem is Apple SSDs are 1000x slower when flushing. That&rsquo;s called a firmware bug.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/handleym99\/status\/1494385468162527235\">Maynard Handley<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/handleym99\/status\/1494385468162527235\"><p>As I described elsewhere, the traditional solution to ordering writes on unix is <code>fsync<\/code>. 
This is a highly sub-optimal solution because it does much more than required.<\/p><p>Apple&rsquo;s solution is to use the equivalent of barriers, rather than flushes, to enforce ordering; and it works every bit as well as the equivalent solution (barriers rather than flushes) in a CPU pipeline.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/numist\/status\/1494392674014531593\">Scott Perry<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/numist\/status\/1494392674014531593\">\n<p>There&rsquo;s a third sync operation that lets you have your performance and write ordering too: <code>F_BARRIERFSYNC<\/code>. SQLite already uses it on Darwin, and it&rsquo;s part of the <a href=\"https:\/\/developer.apple.com\/documentation\/xcode\/reducing-disk-writes\">best practices guide<\/a> for I\/O reduction<\/p>\n<\/blockquote>\n\n<p id=\"apple-ssd-benchmarks-and-f_fullsync-update-2022-03-09\">Update (2022-03-09): See also: <a href=\"https:\/\/eclecticlight.co\/2022\/02\/18\/how-can-you-trust-a-disk-to-write-data\/\">Howard Oakley<\/a>, <a href=\"https:\/\/forums.macrumors.com\/threads\/ssd-flash-storage-in-macs-premium-prices-for-garbage.2335073\/\">MacRumors<\/a>, <a href=\"https:\/\/eclecticlight.co\/2022\/02\/23\/how-to-prevent-errors-on-ssds\/\">Howard Oakley<\/a>, <a href=\"https:\/\/twitter.com\/simjp\/status\/1494483768396197891\">JP Simard<\/a>.<\/p>\n\n<p><a href=\"https:\/\/twitter.com\/xenadu02\/status\/1495693475584557056\">Russ Bishop<\/a> (<a href=\"https:\/\/news.ycombinator.com\/item?id=30419618\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/xenadu02\/status\/1495693475584557056\">\n<p>I tested a random selection of four NVMe SSDs from four vendors. Half lose FLUSH&rsquo;d data on power loss. That is the flush went to the drive, confirmed, success reported all the way back to userspace. Then I manually yanked the cable. 
Boom, data gone.<\/p>\n<p>The other half never lost data confirmed after a flush (F_FULLFSYNC on macOS) no matter how much I abused them. All four had perf hit from flushing so they are doing some work.<\/p>\n<p>Top two performers on flush? One lost data 40% of the time. The other never lost any.<\/p>\n<p>I guess review sites don&rsquo;t test this stuff. Everyone just assumes data disappearing on crash\/power loss is just how computers work?<\/p>\n<p>I feel bad for the other two vendors who must have test suites and spent engineering hours making sure FLUSH works, only to find out no one cares<\/p>\n<\/blockquote>","protected":false},"excerpt":{"rendered":"<p>Hector Martin (Hacker News): It turns out Apple&rsquo;s custom NVMe drives are amazingly fast - if you don&rsquo;t care about data integrity.[&#8230;]On Linux, fsync() will both flush writes to the drive, and ask it to flush its write cache to stable storage.But on macOS, fsync() only flushes writes to the drive. Instead, they provide an 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2022-02-17T21:35:16Z","apple_news_api_id":"3460061c-e09f-4e0e-966b-3f66247c8a7c","apple_news_api_modified_at":"2022-03-09T21:00:33Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAABQ==","apple_news_api_share_url":"https:\/\/apple.news\/ANGAGHOCfTg6Waz9mJHyKfA","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[2],"tags":[1321,143,31,2078,448,30,2077,138,71,183,425,503,216],"class_list":["post-35060","post","type-post","status-publish","format-standard","hentry","category-technology","tag-data-integrity","tag-database","tag-ios","tag-ios-15","tag-linux","tag-mac","tag-macos-12","tag-optimization","tag-programming","tag-ssd","tag-sqlite","tag-timecapsule","tag-timemachine"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/35060","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=35060"}],"version-history":[{"count":5,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/35060\/revisions"}],"predecessor-version":[{"id":35244,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/35060\/re
visions\/35244"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=35060"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=35060"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=35060"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}