Why are storage snapshots so painful?

Have you ever wondered why most solutions don’t take snapshots more often than about every 5-15 minutes, and many a lot less often than that?

It’s pretty simple, to be honest… The biggest problem with taking snapshots is having to quiesce the data stream in order to complete the operation. At the LUN level, this usually involves some form of locking mechanism which pauses all IO while any metadata updates or data redirections are made, after which the IO is resumed.
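To see why that hurts, here’s a minimal sketch in Python of a naive lock-based snapshot (purely illustrative, my own toy model rather than any vendor’s actual code). Every write contends on the same lock the snapshot takes, so while the snapshot is copying metadata, all IO stalls:

```python
import threading
import time

class NaiveLun:
    """Toy LUN where snapshots quiesce IO with a global lock."""

    def __init__(self):
        self.blocks = {}      # block number -> data
        self.snapshots = []   # saved metadata copies
        self.lock = threading.Lock()

    def write(self, block, data):
        # Every write has to wait if a snapshot currently holds the lock.
        with self.lock:
            self.blocks[block] = data

    def snapshot(self):
        # IO is paused for the whole duration of the metadata copy; the
        # bigger the LUN and the deeper the snap tree, the longer the stall.
        with self.lock:
            time.sleep(0.1)   # stand-in for metadata / copy-on-write work
            self.snapshots.append(dict(self.blocks))
```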

For small machines and LUNs with minimal IO load, this is generally such a quick operation that it has virtually no effect on the application user, and is pretty much transparent. For busy applications, however, data can be changing at such a massive rate that disrupting that IO stream, even for a few seconds, can have a significant impact on performance and user experience. In addition, the larger the snap tree grows, the more performance tends to be degraded by the management of large numbers of snapshots, copy-on-write activity and, of course, lots of locking.

This problem is then multiplied several times over when you need consistency across multiple machines, for example point-in-time consistency for an entire application stack (Web / App / DB, etc.).

So what do we typically do? We reduce the frequency at which we take these snaps in order to minimise the impact, whilst still having to meet the (usually near-zero, because all data is critical, right?) RPO set by the business.

At SFD8, we had a very well-received presentation from INFINIDAT, a storage startup based in Israel and founded by industry legend Moshe Yanai (the guy who brought you the EMC Symmetrix / VMAX, and subsequently XIV). Moshe’s “third generation” enterprise-class storage system comes with one particular feature in which I was really interested: snapshots! Yes, I know it sounds like a boring “checkbox in an RFP” feature, but when I found out how it worked I was really impressed.

For every single write stripe that goes to disk, a checksum and a timestamp (taken from a high-precision clock) are written. This forms the base on which the snapshot system, something they call InfiniSnap™, is built.

If you have a microsecond-accurate clock and a timestamp on every write, then to take a snapshot you simply pick a date and time! Anything written before that point in time is included in the snap, and anything written on or after it is not. This means no locking or pausing of IO during a snap, making the entire process a near-zero-time, zero-impact operation! A volume therefore has indistinguishable performance with or without snapshots. Wow!
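To make the idea concrete, here’s a minimal sketch in Python of how a timestamp-based snapshot might work. The class and method names are mine for illustration; INFINIDAT haven’t published their implementation, so treat this as a toy model of the concept rather than how InfiniSnap actually works:

```python
import time
from collections import defaultdict

class TimestampedLun:
    """Toy LUN where every write carries a high-precision timestamp,
    so a snapshot is just a chosen point in time (no locking needed)."""

    def __init__(self):
        # block number -> append-only list of (timestamp, data) versions
        self.versions = defaultdict(list)

    def write(self, block, data):
        # Writes are never paused; each one simply records when it happened.
        self.versions[block].append((time.monotonic_ns(), data))

    def snapshot(self):
        # "Taking" a snapshot is nothing more than noting the current time.
        return time.monotonic_ns()

    def read(self, block, as_of=None):
        # A live read returns the newest version; a snapshot read returns
        # the newest version written *before* the snapshot's timestamp.
        history = self.versions[block]
        if as_of is None:
            return history[-1][1]
        for ts, data in reversed(history):
            if ts < as_of:
                return data
        raise KeyError(f"block {block} did not exist at that point in time")
```

Note that snapshot() does no work at all beyond reading the clock, which is why, in this model at least, the number of snapshots held has no bearing on write performance.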


It sounds so simple it shouldn’t work, but according to INFINIDAT they can easily support up to 100,000 snaps per system, and even that isn’t a real limit; they picked the number simply because it was a double-digit percentage bigger than the next closest array on the market, and they will happily support more if you ask (they just need to test it first). In addition, each snap group will support up to 25 snaps per second, and they guarantee an RPO as low as 4 seconds based on snapshots alone. You can then use point-in-time replication to create an asynchronous copy on another array if needed. Now that’s granular! 🙂

The one caveat I would add is that this doesn’t yet appear to fix ye olde faithful crash-consistent vs application-consistent issue, but it’s a great start. Going back to the application stack “consistency group” concept, in theory you generally only need to VSS the database VM, which makes it much easier and simpler to take a consistent snap across an app stack with minimal overhead. As we move more towards applications using NoSQL databases etc., this will also become less of an issue.

The above was just one of the cool features covered in their presentation, on which the general consensus was very positive indeed! A couple of weeks ago I was also able to spend a little time with one of INFINIDAT’s customers, who just so happened to be attending the same UKVMUG event. Their impressions of the quality of the array build (with a claimed 99.99999% availability!), the management interface, general performance during initial testing, the compelling pricing and, of course, the very funky Matrix-like chassis were all very positive too.

If you want to see the INFINIDAT presentation from SFD8, make sure you have your thinking hat on and a large jug of coffee! Their very passionate CTO, Brian Carmody, is a compelling speaker who was more than happy to get stuck into the detail of how the technology works; I definitely came away a little smarter for having been part of the audience! He also goes into some fascinating detail about genome sequencing, the concept of cost per genome and its likely massive impact on the storage industry and our lives in general. The video is worth a watch for this section alone…

Further Reading
Some of the other SFD8 delegates have their own takes on the presentation we saw. Check them out here:

Dan Frith – INFINIDAT – What exactly is a “Moshe v3.0”?
Enrico Signoretti’s blog Juku.it – Infinidat: awesome tech, great execution
Enrico Signoretti writing on El Reg – Has the next generation of monolithic storage arrived?
Ray Lucchesi – Mobile devices as a cache for cloud data
Vipin V.K. – Infinibox – Enterprise storage solution from Infinidat
GreyBeards on Storage Podcast – Interview with Brian Carmody

Disclaimer/Disclosure: My flights, accommodation, meals, etc. at Storage Field Day 8 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services, and I was not compensated in any way for my time at the event.

Storage, Tech Field Day