Data Corruption – The Silent Killer (aka Cosmic Rays are baaaad mmmkay?)

If you have worked in the IT industry for a reasonable amount of time, you have probably heard the term bit rot, referring to the gradual decay of storage media over time, or simply Data Corruption. What I never realised was what one of the primary causes is behind bit rot, and the amount of effort the storage industry goes to prevent it!

At Storage Field Day 9 we attended one of the most genuinely fascinating and enjoyable sessions I have ever seen. It was “proper science”!

Apparently one of the dominant causes of data corruption in SSDs,Β is in fact something which completely blew my mind when I heard it! Believe it or not, bit rot and data corruption is often caused by cosmic rays!

data corruption rust.png

Cosmic Rays cause Data Corruption!

These cosmic rays are actually protons and other heavy ions which originate from the Sun, or even distant stars! Next thing you know these evil buggers are coming down here, taking our bits and stealing our women! Ok, maybe not the last part, but they’re certainly interacting with other elements in our atmosphere and generating storms of neutrons (we walking flesh bags actually get hit by about 10 of them every second but as we’re not made primarily of silicon, no biggie on the data corruption front!).

These neutrons occasionally also then slam into integrated circuits, and more occasionally still, this causes a bit to flip from a 0 to a 1, or vice versa.

data corruption mind blown.jpgNow a flip of a single bit might not seem like a lot, especially with CRC and other features in modern HDDs, but the cumulative effect or a large number of these flips can lead to corrupt data. Furthermore, corruption of even a single bit of certain data types, such as the vast quantities of DNA data we plan to store in the future, could mean the difference between you being diagnosed with cancer or not!

As such, Intel have introduced a feature within their SSDs which will deliberately brick the drives if they detect too many bit flips / errors! More amusingly, they adopt “aggressive bricking“, i.e. brick the drives even when minimal data corruption is detected! A brilliantly ironic description for something which is actually trying to protect data, as this has the effect of causing your RAID or Erasure coding data protection to rebuild the drive contents on another drive, therefore ensuring that you don’t end up with corrupt data replicating etc.

Intel actually test this using a particle accelerator at Los Alamos Neutron Science Centre, by firing neutron beams at their drives and checking the data corruption rates! But don’t worry about the poor drives… it’s all over in a flash! πŸ˜‰

This is genuinely an absolutely fascinating video and well worth spending 45 minutes watching it:

Also for those of you who may notice some snickering and shaking of shoulders going on in the video, it was partly down to the crazy awesomeness of the subject, but also due to some very humorous twitter conversations going on at the same time! I finally understand the meaning of the term corpsing now, having most definitely experienced it during this session! Vinod did an awesome job of putting up with us! πŸ™‚

data corruption cosmic rays.png

Further Info

You can catch the full Intel Session at the link below, which covers other fascinating subjects such as 3D XPoint, NVMe, and SDS – They’re all well worth a watch!

Intel Storage Presents at Storage Field Day 9

Further Reading

Some of the other SFD9 delegates had their own takes on the presentation we saw. Check them out here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 9 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors products or services and I was not compensated in any way for my time at the event.

Storage, Tech Field Day , , , , , , , , , , , ,