{"id":30767,"date":"2020-11-23T17:01:50","date_gmt":"2020-11-23T22:01:50","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=30767"},"modified":"2020-11-25T16:11:50","modified_gmt":"2020-11-25T21:11:50","slug":"m1-memory-and-performance","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2020\/11\/23\/m1-memory-and-performance\/","title":{"rendered":"M1 Memory and Performance"},"content":{"rendered":"<p><a href=\"https:\/\/blog.metaobject.com\/2020\/11\/m1-memory-and-performance.html\">Marcel Weiher<\/a> (<a href=\"https:\/\/news.ycombinator.com\/item?id=25081498\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/blog.metaobject.com\/2020\/11\/m1-memory-and-performance.html\"><p>The M1 is apparently a multi-die package that contains both the actual processor die and the\nDRAM.  As such, it has a very high-speed interface between the DRAM and the processors.\nThis high-speed interface, in addition to the absolutely humongous caches, is key to keeping the various functional\nunits fed.  Memory bandwidth and latency are probably <em>the<\/em> determining factors for many\nof today&rsquo;s workloads, with a single access to main memory taking easily hundreds of clock cycles\nand the CPU capable of doing a good number of operations in each of these clock cycles.\nAs Andrew Black <a href=\"http:\/\/web.cecs.pdx.edu\/~black\/publications\/O-JDahl.pdf\">wrote<\/a>:  &ldquo;[..] computation is essentially free, because it happens &lsquo;in the cracks&rsquo; between data fetch and data store; ..&rdquo;.<\/p><p>[&#8230;]<\/p><p>The benefit of sticking to RC is much-reduced memory consumption.  It <a href=\"https:\/\/people.cs.umass.edu\/~emery\/pubs\/gcvsmalloc.pdf\">turns out<\/a> that for\na tracing GC to achieve performance comparable with manual allocation, it needs several\ntimes the memory (different studies find different overheads, but at least 4x is a conservative\nlower bound).  
While I haven&rsquo;t seen a study comparing RC, my personal experience is that the\noverhead is much lower, much more predictable, and can usually be driven down with little\nadditional effort if needed.<\/p><p>So Apple can afford to live with more &ldquo;limited&rdquo; total memory because they need much less\nmemory for the system to be fast.  And so they can do a system design that imposes this\nlimitation, but allows them to make that memory wicked fast. <em>Nice<\/em>.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/www.starcoder.com\/wordpress\/2020\/11\/memory-bandwidth-in-apples-m1\/\">Mike<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.starcoder.com\/wordpress\/2020\/11\/memory-bandwidth-in-apples-m1\/\"><p>The memory bandwidth on the new Macs is impressive. Benchmarks peg it at around 60GB\/sec&#x2013;about 3x faster than a 16&rdquo; MBP. Since the M1 CPU only has 16GB of RAM, it can replace the entire contents of RAM 4 times every second.<\/p>\n<p>[&#8230;]<\/p>\n<p>Some say we&rsquo;re moving into a phase where we don&rsquo;t need as much RAM, simply because as SSDs get faster there is less of a bottleneck for swap. [&#8230;] However, with the huge jump in performance on the M1, the SSD is back to being an order of magnitude slower than main memory.<\/p>\n<p>So we&rsquo;re left with the question: will SSD performance increase faster than memory bandwidth? And at what point does the SSD to RAM speed ratio become irrelevant?<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/www.sicpers.info\/2020\/11\/apple-silicon-xeon-phi-and-amigas\/\">Graham Lee<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.sicpers.info\/2020\/11\/apple-silicon-xeon-phi-and-amigas\/\"><p>And that makes me think that a Mac would either not go full NUMA, or would not have public API for it. 
<em>Maybe<\/em> Apple would let the kernel and some OS processes have exclusive access to the on-package RAM, but even that seems overly complex (particularly where you have more than one M1 in a computer, so you need to specify core affinity for your memory allocations in addition to memory type). My guess is that an early workstation Mac with 16GB of M1 RAM and 64GB of DDR4 RAM would look like it has 64GB of RAM, with the on-package memory used for the GPU and as cache. NUMA APIs, if they come at all, would come later.<\/p><\/blockquote>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2020\/11\/12\/apple-m1-benchmarks\/\">Apple M1 Benchmarks<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2020\/11\/10\/one-more-thing-apple-silicon-macs\/\">One More Thing: Apple Silicon Macs<\/a><\/li>\n<\/ul>\n\n<p id=\"m1-memory-and-performance-update-2020-11-25\">Update (2020-11-25): <a href=\"https:\/\/twitter.com\/Catfish_Man\/status\/1326298205034696705\">David Smith<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/Catfish_Man\/status\/1326298205034696705\">\n<p>this further improvement is because uncontended acquire-release atomics are about the same speed as regular load\/store on A14<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/www.macrumors.com\/2020\/11\/23\/m1-macbook-pro-ram-differences\/\">Juli Clover<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.macrumors.com\/2020\/11\/23\/m1-macbook-pro-ram-differences\/\">\n<p>The video includes a series of benchmark tests, ranging from Geekbench and Cinebench to RAW exporting tests. 
Geekbench and Cinebench benchmarks didn&rsquo;t demonstrate a difference in performance between the 8GB and 16GB models, but other tests designed to maximize RAM usage did show some differences.<\/p>\n<p>A Max Tech Xcode benchmark that mimics compiling code saw the 16GB model score 122 compared to the 136 scored by the 8GB model, with the lower score being better.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/forums.macrumors.com\/threads\/video-demos-performance-differences-between-8gb-and-16gb-apple-m1-macbook-pro.2271101\/?post=29301175#post-29301175\">Populus<\/a>:<\/p>\n<blockquote cite=\"https:\/\/forums.macrumors.com\/threads\/video-demos-performance-differences-between-8gb-and-16gb-apple-m1-macbook-pro.2271101\/?post=29301175#post-29301175\"><p>Beware of the swap disk space!<\/p>\n<p>In most of the benchmarks performed on 8GB M1 machines, if Activity Monitor is shown, the swap space usage is always between 2.5GB and 4GB or even more. In my 10 years of being a Mac user, I&rsquo;ve never seen such a big swap space being used unless I&rsquo;m stressing my machine heavily, and that usage may be aging your SSD.<\/p><\/blockquote>","protected":false},"excerpt":{"rendered":"<p>Marcel Weiher (Hacker News): The M1 is apparently a multi-die package that contains both the actual processor die and the DRAM. As such, it has a very high-speed interface between the DRAM and the processors. This high-speed interface, in addition to the absolutely humongous caches, is key to keeping the various functional units fed. 
Memory [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2020-11-23T22:01:54Z","apple_news_api_id":"43d827df-1a4c-427b-89ae-3e06c3263915","apple_news_api_modified_at":"2020-11-25T21:11:53Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAAAQ==","apple_news_api_share_url":"https:\/\/apple.news\/AQ9gn3xpMQnuJrj4GwyY5FQ","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[2],"tags":[2014,55,30,1891,71,1056],"class_list":["post-30767","post","type-post","status-publish","format-standard","hentry","category-technology","tag-apple-m1","tag-arc","tag-mac","tag-macos-11-0","tag-programming","tag-ram"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/30767","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=30767"}],"version-history":[{"count":3,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/30767\/revisions"}],"predecessor-version":[{"id":30806,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/30767\/revisions\/30806"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=30767"}],"wp:term":[{"taxonomy"
:"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=30767"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=30767"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}