Tag Archive for storage

Data Corruption – The Silent Killer (aka Cosmic Rays are baaaad mmmkay?)

Minion Assassin

If you have worked in the IT industry for a reasonable amount of time, you have probably heard the term bit rot, referring to the gradual decay of storage media over time, or simply Data Corruption. What I never realised was what one of the primary causes is behind bit rot, and the amount of effort the storage industry goes to prevent it!

At Storage Field Day 9 we attended one of the most genuinely fascinating and enjoyable sessions I have ever seen. It was “proper science”!

Apparently one of the dominant causes of data corruption in SSDs, is in fact something which completely blew my mind when I heard it! Believe it or not, bit rot and data corruption is often caused by cosmic rays!

data corruption rust.png

Cosmic Rays cause Data Corruption!

These cosmic rays are actually protons and other heavy ions which originate from the Sun, or even distant stars! Next thing you know these evil buggers are coming down here, taking our bits and stealing our women! Ok, maybe not the last part, but they’re certainly interacting with other elements in our atmosphere and generating storms of neutrons (we walking flesh bags actually get hit by about 10 of them every second but as we’re not made primarily of silicon, no biggie on the data corruption front!).

These neutrons occasionally also then slam into integrated circuits, and more occasionally still, this causes a bit to flip from a 0 to a 1, or vice versa.

data corruption mind blown.jpgNow a flip of a single bit might not seem like a lot, especially with CRC and other features in modern HDDs, but the cumulative effect or a large number of these flips can lead to corrupt data. Furthermore, corruption of even a single bit of certain data types, such as the vast quantities of DNA data we plan to store in the future, could mean the difference between you being diagnosed with cancer or not!

As such, Intel have introduced a feature within their SSDs which will deliberately brick the drives if they detect too many bit flips / errors! More amusingly, they adopt “aggressive bricking“, i.e. brick the drives even when minimal data corruption is detected! A brilliantly ironic description for something which is actually trying to protect data, as this has the effect of causing your RAID or Erasure coding data protection to rebuild the drive contents on another drive, therefore ensuring that you don’t end up with corrupt data replicating etc.

Intel actually test this using a particle accelerator at Los Alamos Neutron Science Centre, by firing neutron beams at their drives and checking the data corruption rates! But don’t worry about the poor drives… it’s all over in a flash! 😉

This is genuinely an absolutely fascinating video and well worth spending 45 minutes watching it:

Also for those of you who may notice some snickering and shaking of shoulders going on in the video, it was partly down to the crazy awesomeness of the subject, but also due to some very humorous twitter conversations going on at the same time! I finally understand the meaning of the term corpsing now, having most definitely experienced it during this session! Vinod did an awesome job of putting up with us! 🙂

data corruption cosmic rays.png

Further Info

You can catch the full Intel Session at the link below, which covers other fascinating subjects such as 3D XPoint, NVMe, and SDS – They’re all well worth a watch!

Intel Storage Presents at Storage Field Day 9

Further Reading

Some of the other SFD9 delegates had their own takes on the presentation we saw. Check them out here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 9 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors products or services and I was not compensated in any way for my time at the event.

The Tekhead Top 10 Tech Podcast Perfect Playlist Picks

I reckon that’s about enough alliteration click bait for now! If you are either a regular commuter or someone who travels a lot for work, tech podcasts are one of the best ways of staying up to date with the industry and do training without having to necessarily spend hours every week reading blog posts or El Reg.

More fool me of course, I actually do both!

As someone who is fortunate enough to work from home most of the time though, the majority of my tech podcast listening happens either when I go out for a daily walk at lunch time, when I’m traveling to client meetings, or when I’m binging on podcasts whilst getting DIY stuff done at home (it passes the time so much quicker!).

Either way, I consume a fairly large chunk of tech podcasts, and as the last time I posted a list was over two years ago, I thought I would update and re-post it here.

Tech Podcasts - Tech Podcast Meme

To be honest, not a huge amount has changed in terms of sources, though there are one or two new shows which were of such awesome quality, I had to somehow find time to squeeze them in! It’s also worth noting that I still continue to find BeyondPod as one of the most reliable and easy to use podcast consumption apps for Android!

All of these podcasts are great, but the following table is in the priority order I listen to them – you may read that as you will! 🙂

Recommended Tech Podcasts

Podcast

Published

 

Notes

Speaking in Tech

Weekly – Wednesdays

Irreverent, funny, NSFW, but thoroughly informative and entertaining consumer and enterprise tech news and interviews! Greg, Eddie and Sarah are like the Fox News of tech podcasting, only more sensational! 😉
In Tech We Trust

Weekly – Mondays

Unofficially competing hard with Speaking in Tech to be the best news focussed tech podcast out there! Unmissable!

GreyBeards on Storage

Monthly Satisfying my inner storage nerd on a monthly basis, Howard and Ray talk to existing and startup storage vendors about their products, as well as industry trends. I always learn something new listening to this show!

vSoup

Irregular Though sporadic of late, vSoup feels like a comfortable chair you sit in and chat about IT with your mates. Insightful at times, and always entertaining!

vNews

Irregular Run by two fellow London VMUG members, I’d say this could almost be described as a quintessentially British view of tech industry news. Well worth listening to!

Eigencast

Irregular but often Well known industry veteran Justin Warren, talks with companies and other industry figures about the business of IT. A very unique perspective, which those of an engineering mindset sometimes have a tendency to downplay!

VUPaaS

On hiatus? Although this seems to have fallen by the wayside, I am hoping this user and use-case focused show will make a return soon!

Around the Storage Block

Irregular

I haven’t missed an episode of Calvin’s HP vendor show in years, and though I did recently feel the need to stop one half-way through due to FUD overload, it still continues to be a great source of information.

Packet Pushers

Multiple per week

I’m no network guy, but if I have the time, I always try to catch a few episodes and learn something new. I’m never disappointed!

Professional VMware Brownbag

Weekly

I generally dip in and out of the amazingly awesome vBrownbag tech podcast content depending on what I am studying at the time. I have a massive amount of respect for the effort these guys put into providing free training for the community! Just brilliant!

Words Mean Things, Apparently – Deduplication Myths Explored

A rose by any other name would smell as sweet?

We might all agree that this is most definitely the case, but in the technology industry we have a problem, and it was highlighted across a number of the sessions we attended at Storage Field Day 9 this week.

Specifically, the use of certain terms to describe technology features, when the specific implementations are very different, and have potentially very different outcomes. This is becoming more and more of a problem across the industry as similar features are being “RFP checkboxed” as the same, when in reality they are not.

For example most of the vendors we saw support deduplication in one form or another, and in many cases there was a significant use of the word “inline”.

What do we mean by “inline deduplication”, and what impact to performance can this have?

One of the other delegates at SFD9, W Curtis Preston, had very strong opinions on this, which I am generally inclined to agree with!

UPDATE 08/04/2016: Curtis has recently published an article detailing his thoughts here.

If a write hits the system and is deduplicated prior to being written to its final non-volatile media, be it flash or disk, then it can generally be considered as inline.

Dedupe-Inline

Inline Deduplication

If deduplication is running in hardware (for example as 3PAR do in their Gen4+ ASIC), the deduplication process has minimal overhead on the system, and by not needing to send all writes to the back end storage it can actually improve performance overall, even under sustained high throughput where it can actually improve it by reducing back end writes.

Most non-inline deduplication would typically be referred to as “post-process”, and as a general rule are either run on a schedule or as a lower priority 24/7 system maintenance task. It can also run immediately after the write has gone to disk. This is still post-process, not inline.

It’s worth noting that any of these post-process methods can potentially have an impact on back-end capacity management, as dumping large quantities of data onto a system can temporarily spike capacity utilisation until the dedupe process has time to work its magic and increase storage efficiency. Not ideal if your storage capacity is approaching critical.

depu

In addition, the block has been written to an NVRAM device which should protect it from power loss etc, but the problem we have is that cache is an expensive and finite resource. As such, by throwing a sustained number of IOs at the system, you end up potentially filling up that cache/NVRAM faster than the IOs can be flushed and deduplicated, which is exacerbated by the fact that post-process dedupe generates yet more IOPS on the back end storage (by as much as 2-3x compared to the original write!). The cumulative effect causes IO to back up in the system like a dodgy toilet, thereby increasing latency and reducing your maximum capable IOPS from the system.

Worse still, in some vendor implementations, when system performance is maxed out deduplication in the IO path is dropped altogether, and inbound data is dumped out to disk as fast as possible. Then is then post-processed later, but this could obviously leave you in a bit of a hole again if you are at high capacity utilisation.

Dedupe-post

Post-Process Deduplication

None of this is likely to kick in for the vast majority of customers as they will probably have workloads generating tens of thousands of IOPS, or maybe low hundreds of thousands on aggregate. As such, for most modern systems and mixed workloads, this is unlikely to be a huge problem. However, when you have a use case which is pushing your array or HCI solution to its maximum capability, this can potentially have a significant impact on performance as described above.

[HCI – yet another misappropriated computing acronym, but I’ll let that one slide for now and move on!]

VMware VSAN Deduplication

In the case of one of one of the vendors we saw, VMware, they joked that because of the fact that they initially write to the caching flash tier prior to deduplication, they spent more time arguing over whether it was valid to call this inline than it took them to actually develop the feature! In their case, they have been open enough not to call it “inline” but instead “nearline”.

In part this is because they are always written to a flash device prior to dedupe, but also because not all of the writes to their caching tier actually get sent to the capacity tier. In fact some may live out their entire existence in an non-deduplicated state in flash cache.

dedupe.png

I applaud VMware for their attempt to avoid jumping on the inline bandwagon, though it would have perhaps been better to use a term which doesn’t already mean something completely different in the context of storage! 🙂

You can catch the full VMware session at the link below – it’s well worth a watch!
VMware Storage Presents at Storage Field Day 9

Further Reading

Some of the other SFD9 delegates and VMware staffers had their own takes on the presentation we saw. Check them out here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 9 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors products or services and I was not compensated in any way for my time at the event.

Storage Field Day 9 – Behind the Curtain

Tech Field Day cheese

Tech Field Day is an awesome experience for all of the delegates! We get to spend an entire week unabashedly geeking out, as well as hanging out with the founders, senior folk and engineers at some of the most innovative companies in the world!

For those people who always wondered what goes on behind the scenes in the Tech Field Day experience, I took a few pano shots at the Storage Field Day 9 event this week.

Here they are, along with most of my favourite tweets and photos of the week… it was a blast!

Panos

Pre-Event Meeting

Pre-Event Meeting & Plexistor

NetApp & SolidFire

NetApp & SolidFire

Violin Memory

Violin Memory

Intel

Intel

Cohesity

Cohesity

VMware

VMware

The rest of the event…

Until next time… 🙂

%d bloggers like this: