Trust No One, Not Even Performance Counters
Paul Khuong (via David Smith):
I can guess why we observe this effect; it’s not like Intel is intentionally messing with us. mfence is a full pipeline flush: it slows code down because it waits for all in-flight instructions to complete their execution. Thus, while it’s flushing that slows us down, the profiling machinery will assign these cycles to any of the instructions that are being flushed. Locked instructions instead affect stores that are still queued. By forcing such stores to retire, locked instructions become responsible for the extra cycles and end up “paying” for writes that would have taken up time anyway.
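
To make the comparison concrete, here is a minimal C sketch of the two kinds of barrier the quote contrasts. The helper names (barrier_mfence, barrier_locked, store_then_barrier) are hypothetical and not from the quoted post. The inline assembly assumes GCC or Clang on x86, with a C11 fence as the fallback elsewhere. The locked variant assumes the compiler lowers a sequentially consistent atomic read-modify-write to a lock-prefixed instruction on x86, which is typical but worth confirming in the disassembly.

```c
#include <stdatomic.h>

/* Hypothetical helpers for illustration; not from the quoted post. */

/* Barrier built on mfence (x86 only; portable C11 fence elsewhere). */
static void barrier_mfence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Barrier built on a locked read-modify-write of a dummy location.
 * OR-ing with 0 leaves the value unchanged; on x86 this is assumed
 * to compile to a lock-prefixed instruction. */
static atomic_int barrier_dummy;

static void barrier_locked(void)
{
    atomic_fetch_or_explicit(&barrier_dummy, 0, memory_order_seq_cst);
}

/* Store a value, run the chosen barrier, then read the value back.
 * The store is the kind of queued write the barrier has to wait for. */
static atomic_int slot;

static int store_then_barrier(int value, void (*barrier)(void))
{
    atomic_store_explicit(&slot, value, memory_order_relaxed);
    barrier();
    return atomic_load_explicit(&slot, memory_order_relaxed);
}
```

Timing or sampling a loop around each variant is one way to look for the skew described above, keeping in mind that the counters doing the attribution are exactly what is in question.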