Tuesday, March 1, 2022

M1 Icestorm Performance and Asymmetric M1 Pro Core Management

Howard Oakley:

At its heart, each M1 chip has a total of eight processor cores, all based on Apple’s development of technology licensed from Arm. Four are described as Performance cores, dubbed Firestorm, and four are Efficiency cores, or Icestorm. These primarily differ in their compromise between performance and power consumption, with Firestorm cores performing in the same class as better Intel cores, and Icestorm delivering lower performance with much less power requirement and heat production.

Howard Oakley (Hacker News):

In real-world use, what are the penalties for processes running on Icestorm rather than Firestorm cores? Here I report one initial comparison, of performance when calculating floating-point dot products, a task which you might not consider a good fit for the Icestorm.

Central to this is my previous observation that different Quality-of-Service (QoS) settings for processes determine which cores they are run on. OperationQueue processes given a QoS of 17 or higher are invariably run by macOS 11 and 12 on Firestorm cores (and can load Icestorms too), while those with a QoS of 9 are invariably run only on Icestorm cores. Those might change in the face of extreme loading of either core pool, but when there are few other active processes it appears consistent.
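
As a rough illustration of those QoS settings (a sketch, not Oakley's own test code): on Apple platforms `QualityOfService.background` has the raw value 9 and `.utility` has 17, which correspond to the numbers quoted above. The names `ResultBox` and `dotProduct(onQueueWith:_:_:)` are hypothetical helpers introduced only for this example.

```swift
import Foundation

/// Hypothetical box used to carry a result out of an operation.
/// It is written only inside the operation and read only after the queue drains.
final class ResultBox: @unchecked Sendable {
    var value: Float = 0
}

/// Computes a floating-point dot product on an OperationQueue that has
/// the requested quality of service. Which cores actually run the work is
/// decided by macOS, not by this code.
func dotProduct(onQueueWith qos: QualityOfService, _ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have equal length")
    let queue = OperationQueue()
    queue.qualityOfService = qos   // e.g. .background or .utility
    let box = ResultBox()
    queue.addOperation {
        var sum: Float = 0
        for (x, y) in zip(a, b) {
            sum += x * y
        }
        box.value = sum
    }
    queue.waitUntilAllOperationsAreFinished()
    return box.value
}
```

Calling it once with `.background` and once with `.utility` while watching per-core load should, if the behaviour described above holds on your Mac, show the work landing on different core types.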

[…]

Maynard Handley previously commented that Icestorm cores use about 10% of the power (net 25% of energy) of Firestorm cores. For SIMD vector arithmetic, at least, they perform extremely well for their economy.
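
Those two figures can be combined into a back-of-envelope slowdown estimate. Since energy is power multiplied by time, a core that draws about 10% of the power but uses about 25% of the energy for the same job must take roughly 2.5 times as long. The sketch below only restates that arithmetic; `impliedSlowdown` is a hypothetical name, and the inputs are the approximate figures quoted above, not measurements.

```swift
/// Given power and energy ratios (Icestorm relative to Firestorm) for the
/// same task, returns the implied ratio of run times.
/// Derivation: energy = power * time, so time ratio = energy ratio / power ratio.
func impliedSlowdown(powerRatio: Double, energyRatio: Double) -> Double {
    precondition(powerRatio > 0, "power ratio must be positive")
    return energyRatio / powerRatio
}

// Roughly 10% of the power and 25% of the energy, as quoted above.
let slowdown = impliedSlowdown(powerRatio: 0.10, energyRatio: 0.25)
```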

Howard Oakley:

Most scheduled background activities in macOS are now managed by this combination of DAS and CTS, which has proved itself to be superior to fixed time intervals managed by launchd.

[…]

As I noted above, one of the significant factors when scheduling background activities is the QoS set for that activity. The great majority of these have the minimum QoS, so that when they’re scheduled to run on an M1 Mac, they do so on the Efficiency (Icestorm) cores. That’s why Time Machine backups occur at slightly irregular intervals, and when they do, they occupy the Icestorms and leave the Firestorms free for the user.

Howard Oakley:

I’ve been unable to find any way of ensuring that normal commands and scripts run from Terminal are constrained to the Efficiency cores, although I may well be missing a trick here.

[…]

When inside macOS, running from Swift, Objective-C or an equivalent language, there are several opportunities to set the Quality of Service when running command tools.
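
One such opportunity is Foundation's `Process` class, which has a `qualityOfService` property that can be set before launch. The following is a minimal sketch rather than the approach used in the article: the `run(_:arguments:qos:)` helper is hypothetical, and whether the child process is then confined to the Efficiency cores is an assumption to check on your own hardware.

```swift
import Foundation

/// Launches a command-line tool with the given quality of service and
/// returns whatever it wrote to standard output.
func run(_ path: String, arguments: [String], qos: QualityOfService) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: path)
    process.arguments = arguments
    process.qualityOfService = qos   // must be set before the process starts

    let pipe = Pipe()
    process.standardOutput = pipe

    try process.run()
    // Read before waiting so a large output cannot fill the pipe and stall the tool.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return String(decoding: data, as: UTF8.self)
}
```

For example, `try run("/bin/echo", arguments: ["hello"], qos: .background)` runs `echo` at the lowest QoS and returns its output.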

[…]

For those who want to try out running commands on different combinations of cores, I’ve made a new version of my exploratory utility DispatchRider, which has previously allowed you to explore running commands using NSBackgroundActivityScheduler and the DAS despatching system.

Howard Oakley:

On paper, the major difference between the M1 and M1 Pro/Max CPUs is core count: the original M1 has a total of eight, half of which are E (Efficiency, Icestorm) and half P (Performance, Firestorm) cores. The M1 Pro and Max have two more cores in total, but redistribute their type to give eight P cores and only two E cores. It would be easy to conclude that experience with the first design showed that the E cores were only lightly loaded, so fewer were needed, but delivering better performance to the user merited twice the number of P cores. While that may well be true, you also have to look at how the cores are actually used.

[…]

Taken together, these results show that process allocation to cores in the M1 Pro and Max is carefully managed according to QoS (as in the M1) and between the two groups of P cores. This management aims to keep the second group of P cores unloaded as much as possible, and within each group of P cores loads lower-numbered cores more than higher-numbered. This is very different from the even-balancing seen in symmetric cores, and in the M1.

[…]

There are also interesting implications for developers wishing to optimise performance on multiple cores. With the advent of eight P cores in the M1 Pro/Max, it’s tempting to increase the maximum number of processes which can be used outside of an app’s main process. While this may still lead to improved performance on Intel Macs with more than four cores, the core management of these new chips may limit processes to the first block of four cores. Careful testing is required, both under low overall CPU load and when other processes are already loading that first block. Interpreting the results may be tricky.
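
The kind of testing suggested there could start from something like the following sketch, which times the same batch of CPU-bound jobs at several concurrency limits. Everything here is an assumption made for illustration: `spin`, `elapsed`, and `sweep` are hypothetical names, the workload is arbitrary, and wall-clock timings of this sort are noisy enough that they need repeating under both light and heavy system load.

```swift
import Foundation

/// Arbitrary CPU-bound workload used only for this sketch.
func spin() {
    var sum = 0.0
    for i in 1...200_000 {
        sum += Double(i).squareRoot()
    }
    precondition(sum > 0)   // uses the result so the loop is not optimised away
}

/// Runs `jobs` copies of `spin` on a queue limited to `width` concurrent
/// operations and returns the elapsed wall-clock time in seconds.
func elapsed(width: Int, jobs: Int, qos: QualityOfService) -> TimeInterval {
    let queue = OperationQueue()
    queue.qualityOfService = qos
    queue.maxConcurrentOperationCount = width
    let start = Date()
    for _ in 0..<jobs {
        queue.addOperation { spin() }
    }
    queue.waitUntilAllOperationsAreFinished()
    return Date().timeIntervalSince(start)
}

/// Times the same batch at each concurrency limit in `widths`.
func sweep(widths: [Int], jobs: Int, qos: QualityOfService) -> [(width: Int, seconds: TimeInterval)] {
    return widths.map { (width: $0, seconds: elapsed(width: $0, jobs: jobs, qos: qos)) }
}
```

Comparing, say, `sweep(widths: [2, 4, 6, 8], jobs: 64, qos: .userInitiated)` on an M1 Pro against the same call on an Intel Mac is one way to see whether widening the limit beyond four actually helps.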
