Data Corruption – The Silent Killer (aka Cosmic Rays are baaaad mmmkay?)

If you have worked in the IT industry for a reasonable amount of time, you have probably heard the term bit rot, referring to the gradual decay of storage media over time, or simply data corruption. What I never realised was what one of the primary causes of bit rot is, or the lengths the storage industry goes to in order to prevent it!

At Storage Field Day 9 we attended one of the most genuinely fascinating and enjoyable sessions I have ever seen. It was “proper science”!

Apparently one of the dominant causes of data corruption in SSDs is something which completely blew my mind when I heard it! Believe it or not, bit rot and data corruption are often caused by cosmic rays!

Cosmic Rays cause Data Corruption!

These cosmic rays are actually protons and other heavy ions which originate from the Sun, or even distant stars! Next thing you know these evil buggers are coming down here, taking our bits and stealing our women! Ok, maybe not the last part, but they’re certainly interacting with other elements in our atmosphere and generating storms of neutrons (we walking flesh bags actually get hit by about 10 of them every second but as we’re not made primarily of silicon, no biggie on the data corruption front!).

These neutrons also occasionally slam into integrated circuits, and more occasionally still, this causes a bit to flip from a 0 to a 1, or vice versa.

Now a flip of a single bit might not seem like a lot, especially with CRC and other features in modern HDDs, but the cumulative effect of a large number of these flips can lead to corrupt data. Furthermore, corruption of even a single bit of certain data types, such as the vast quantities of DNA data we plan to store in the future, could mean the difference between you being diagnosed with cancer or not!
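To make the single bit flip a bit more concrete, here is a minimal Python sketch of my own (not something Intel showed in the session). It flips one bit in a made-up block of "DNA" data and uses a CRC32 checksum from the standard library to spot the change. The `flip_bit` helper, the sample data and the bit position are all assumptions for illustration only; real drives use their own, far more sophisticated error detection and correction in hardware and firmware.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted (hypothetical helper)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


# Made-up sample payload standing in for a block of stored data.
original = b"GATTACA" * 4

# Checksum recorded when the data was written.
stored_crc = zlib.crc32(original)

# Simulate a neutron strike: one bit changes somewhere in the block.
corrupted = flip_bit(original, 13)

# Count how many bytes differ, then compare checksums.
changed_bytes = sum(a != b for a, b in zip(original, corrupted))
crc_matches = zlib.crc32(corrupted) == stored_crc

print(f"bytes changed: {changed_bytes}")
print(f"checksum still matches: {crc_matches}")
```

A checksum like this only tells you that something changed. It cannot repair the data, which is why you still need RAID, erasure coding or some other form of redundancy to rebuild from.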

As such, Intel have introduced a feature within their SSDs which will deliberately brick the drives if they detect too many bit flips / errors! More amusingly, they adopt “aggressive bricking“, i.e. brick the drives even when minimal data corruption is detected! A brilliantly ironic description for something which is actually trying to protect data, as this has the effect of causing your RAID or Erasure coding data protection to rebuild the drive contents on another drive, therefore ensuring that you don’t end up with corrupt data replicating etc.

Intel actually test this using a particle accelerator at the Los Alamos Neutron Science Center, by firing neutron beams at their drives and checking the data corruption rates! But don’t worry about the poor drives… it’s all over in a flash! 😉

This is genuinely an absolutely fascinating video and well worth spending 45 minutes watching:

Also for those of you who may notice some snickering and shaking of shoulders going on in the video, it was partly down to the crazy awesomeness of the subject, but also due to some very humorous twitter conversations going on at the same time! I finally understand the meaning of the term corpsing now, having most definitely experienced it during this session! Vinod did an awesome job of putting up with us! 🙂

Further Info

You can catch the full Intel Session at the link below, which covers other fascinating subjects such as 3D XPoint, NVMe, and SDS – They’re all well worth a watch!

Intel Storage Presents at Storage Field Day 9

Further Reading

Some of the other SFD9 delegates had their own takes on the presentation we saw. Check them out here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 9 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services and I was not compensated in any way for my time at the event.

7 Reasons Why You Should Read The Phoenix Project

I began reading The Phoenix Project with no preconceptions, other than having been told that it is a great book, and hearing it mentioned many times on Eric Wright‘s GC On Demand podcast.

Written by Gene Kim, Kevin Behr, and George Spafford, it is told as a first-person narrative from the perspective of Bill, a middleware team manager who is promoted into a senior IT management role for a business in jeopardy. Through his experiences and a guiding hand from another key character, together we work through the problems facing the business, the IT department and the individuals within.

The story is told in an easy to read, informal style, and I made quick work of it over the course of just a few days. I really enjoyed it on numerous levels:

  1. I recognised every single character in the book as somebody I have worked with (or indeed currently work with!). I guarantee you will feel the same!
  2. The book was pretty well written, and the story arc itself was compelling. I was really rooting for Bill to succeed in his endeavours! (But did he? You will have to read the book to find out!)
  3. The authors obviously have a great sense of humour! Quotes such as “Show me a dev who isn’t crashing production systems, and I’ll show you one who can’t fog a mirror. Or more likely, is on vacation.” had me laughing out loud on the train in front of other passengers!
  4. The book is approachable and not elitist. You could pick it up as a cable monkey or an IT director (or maybe even a Sales person!!!), and relate to the concepts and methods described.
  5. I learned a huge amount about different methods for handling and improving processes around WIP (Work in Progress), such as the Theory of Constraints or the use of Kanban boards (I am currently testing this with my pre-sales customer workloads using Trello, but I’m told Kanbanize is also very good). Resilience Engineering (think Netflix Simian Army) and numerous other techniques are also covered, along with the overarching “Three Ways” (very Zen!).
  6. I actually picked up a few key tips which could be applied directly to my pre-sales design and requirements gathering workshops with my customer stakeholders.
  7. Finally, it didn’t feel “preachy”, which is always a risk when trying to sell an idea / concept as your main theme. I was initially concerned that the book would be ramming DevOps culture down my neck throughout. This could not be further from the truth, and the full DevOps concepts do not come into play until the story is almost complete. There are many lessons to be learned throughout the story, which could be applied to any organisation!

Here are another few choice quotes from The Phoenix Project, both humorous and insightful:

“The only thing more dangerous than a developer is a developer conspiring with Security. The two working together gives us means, motive, and opportunity.”

“How can we manage production if we don’t know what the demand, priorities, status of work in process, and resource availability are?”

“You just described ‘technical debt’ that is not being paid down. It comes from taking shortcuts, which may make sense in the short-term. But like financial debt, the compounding interest costs grow over time. If an organization doesn’t pay down its technical debt, every calorie in the organization can be spent just paying interest, in the form of unplanned work.”

“On the other hand, if a resource is ninety percent busy, the wait time is ‘ninety percent divided by ten percent’, or nine hours. In other words, our task would wait in queue nine times longer than if the resource were fifty percent idle.”
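That last quote is really just a bit of arithmetic, so here is a quick Python sketch of my own to show how fast things go downhill as a resource gets busier. It uses only the simplified rule from the quote (wait time = percent busy divided by percent idle). The `wait_time` function name and the list of utilisation figures are my own assumptions for illustration, and this is a rule of thumb rather than a full queueing model.

```python
def wait_time(busy_pct: int) -> float:
    """Relative wait time using the rule from the quote: % busy / % idle."""
    if not 0 <= busy_pct < 100:
        raise ValueError("busy_pct must be between 0 and 99")
    idle_pct = 100 - busy_pct
    return busy_pct / idle_pct


# Baseline from the quote: a resource that is fifty percent idle.
baseline = wait_time(50)

# Arbitrary utilisation levels chosen to show the curve steepening.
for pct in (50, 80, 90, 95, 99):
    units = wait_time(pct)
    print(f"{pct}% busy -> {units:.0f} units of wait ({units / baseline:.0f}x baseline)")
```

At 50% busy the wait is one unit, at 90% it is nine (matching the quote), and at 99% it balloons to ninety-nine. That is the whole argument for not running your key people at full utilisation.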

In case you hadn’t felt like I was positive enough about The Phoenix Project yet, I would say that this book should be provided as mandatory training to every person working in every IT department today, from the guys plugging in cables to the CIO!

If you do read and enjoy the book, I highly recommend also reading The Goal by Eliyahu M. Goldratt. I was a little surprised, to say the least, that this appears to be a very similar story, following a similar arc and featuring some almost identical characters to The Phoenix Project. That said, I am halfway through it at the moment and still thoroughly enjoying it, though I am not too worried about missing the movie version!

The Goal delves even deeper into the Theory of Constraints and explains some of the tools we can use to mitigate, bypass or remove constraints in a system. All of these tools and methods can be applied as easily to IT as they can to production lines, which (without stating the bleeding obvious) is exactly the point of The Phoenix Project!

Anyway, if you want to do yourself a favour, both in terms of your career development and in terms of enjoying a really compelling story and a thoroughly decent book, you could do a lot worse than spending £5 on the Kindle Edition of The Phoenix Project!

Where To Get Them

For anything technical, I like to buy ebooks these days for both portability and the fact that I won’t be chopping down trees needlessly. Both of the above titles are available very inexpensively on Kindle:

And Finally…

Sincerest apologies for one of the most click bait-y blog titles I’ve ever posted! Even worse than this one. Honestly, I feel ashamed!

I’ll get my coat…
