Tuesday, October 6, 2009

DRAM Error Rates

Robin Harris (via John Siracusa):

A two-and-a-half-year study of DRAM on tens of thousands of Google servers found that DIMM error rates are hundreds to thousands of times higher than thought: a mean of 3,751 correctable errors per DIMM per year.

[…]

Most DIMMs don’t include ECC because it costs more. Without ECC the system doesn’t know a memory error has occurred.

Everything is fine until the data corruption means a missed memory reference, an incorrect value, or a flipped bit in a file being written to disk. What you see is a “file not found” or a “file not readable” message or, worse yet, silent data corruption, or even a system crash. And nothing that says “memory error.”
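
To make the ECC point concrete, here is a toy sketch of how an error-correcting code lets a system notice and repair a single flipped bit. It uses a Hamming(7,4) code on a 4-bit value purely for illustration. This is an assumption chosen for brevity, not what the quoted study or any particular DIMM uses; real ECC memory protects much wider words with more check bits. The function names are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (int 0..15) as a 7-bit codeword (list of bits).

    Codeword positions 1..7 hold: p1, p2, d0, p4, d1, d2, d3.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(code):
    """Return (nibble, error_position); error_position 0 means no error seen."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # recheck positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # recheck positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # recheck positions 4, 5, 6, 7
    syndrome = s1 | (s2 << 1) | (s4 << 2)  # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bad bit back
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    value = 0b1011

    # Without ECC: a flipped bit simply yields a different, valid-looking value.
    corrupted_plain = value ^ 0b0010
    print("no ECC :", value, "->", corrupted_plain, "(nothing reports an error)")

    # With ECC: the same kind of flip is located and corrected.
    stored = hamming74_encode(value)
    stored[4] ^= 1  # simulate a single-bit memory error at position 5
    recovered, position = hamming74_decode(stored)
    print("ECC    :", recovered, "recovered, error at position", position)
```

A single flipped bit is corrected here, and the nonzero syndrome is the kind of signal that lets a system log a correctable memory error. Without the check bits, the corrupted value is indistinguishable from good data, which is the silent failure the quote describes. This toy code cannot reliably handle two flipped bits in one codeword.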
