Tuesday, October 6, 2009

DRAM Error Rates

Robin Harris (via John Siracusa):

A two-and-a-half year study of DRAM on 10s of thousands Google servers found DIMM error rates are hundreds to thousands of times higher than thought — a mean of 3,751 correctable errors per DIMM per year.

[…]

Most DIMMs don’t include ECC because it costs more. Without ECC the system doesn’t know a memory error has occurred.

Everything is fine until the data corruption means a missed memory reference or an incorrect value or a flipped bit in a file writing to disk. What you see is a “file not found” or a “file not readable” message or, worse yet, silent data corruption - or even a system crash. And nothing that says “memory error.”
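To make the ECC point concrete, here is a toy sketch (not from the article) of how check bits let a system notice and repair a single flipped bit. It uses a Hamming(7,4) code on a 4-bit value purely for illustration; real ECC DIMMs typically use a wider single-error-correct, double-error-detect code over 64-bit words, and the function names below are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode a 4-bit value (0..15) as a 7-bit Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 unused; positions 1..7
    code[3], code[5], code[6], code[7] = d     # data bits go in non-power-of-two slots
    code[1] = code[3] ^ code[5] ^ code[7]      # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]      # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]      # parity over positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits):
    """Return (value, syndrome). A non-zero syndrome is the 1-based position
    of the single flipped bit, which is corrected before decoding."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1                    # repair the flipped bit
    d = [code[3], code[5], code[6], code[7]]
    value = sum(bit << i for i, bit in enumerate(d))
    return value, syndrome


if __name__ == "__main__":
    value = 0b1011

    # Without ECC: a flipped bit simply yields a different, plausible value.
    corrupted_plain = value ^ (1 << 2)
    print("no ECC:", value, "->", corrupted_plain, "(nothing reports an error)")

    # With ECC: the same kind of flip is located and corrected.
    stored = hamming74_encode(value)
    stored[4] ^= 1                             # flip the bit at position 5
    recovered, syndrome = hamming74_decode(stored)
    print("ECC:", recovered, "error at position", syndrome)
```

In the unprotected case the corrupted value is indistinguishable from valid data, which is why the failure shows up later as something unrelated. In the protected case the non-zero syndrome is what allows the hardware to count and report a correctable error.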
