
StorageOS – An array based on containers? It’s like storage for millennials!

Last week I managed to catch up with the guys from StorageOS, a new container-based storage company, headquartered in London. I found out about them at a London Storage Beers event a few weeks ago, and my first question was, what the hell is container-based storage, and how does it work?!

They started from the premise (yes, that’s actually the correct use of the word premise!) that if you want to build a storage system FOR containers, what better way to do it than to build it FROM containers. StorageOS therefore offer what they describe as “full enterprise storage array functionality, delivered by software, on a pay-as-you-go basis”. They also plan to offer a free-forever Developer tier, which includes everything except the HA functionality you would obviously need for production usage!

StorageOS Announcement

The good news is that today (Monday 20th June 2016) StorageOS are announcing the release of their beta at DockerCon, so you can now download and test out their new storage platform.

The StorageOS Stack

You can deploy this StorageOS software anywhere from bare metal to containers:

StorageOS – it’s software, so it runs anywhere!

Appliances for some of the larger clouds are in the works, but will not be available on day zero.

They can then consume any back-end storage, from SSDs, HDDs and virtual drives, to EBS volumes, object stores, etc. The capacity from all of these devices is pooled, deduped, encrypted, and made available across all nodes, and you then carve out volumes from the pool to present to systems like Docker through their own native Docker driver, or (slightly oddly) via iSCSI / FC! They even have VAAI support in development!
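To make that pooling model concrete, here’s a minimal Python sketch of the idea: heterogeneous devices aggregated into one pool, with thin volumes carved out of it. All the names and numbers below are my own invention for illustration, not the StorageOS API:

```python
# Toy capacity pool -- an illustration of the concept, NOT the StorageOS API.
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str          # e.g. "local-ssd", "ebs-gp2", "virtual-drive-0"
    capacity_gb: int

@dataclass
class CapacityPool:
    devices: list[Device] = field(default_factory=list)
    allocated_gb: int = 0

    def add_device(self, device: Device) -> None:
        self.devices.append(device)

    @property
    def total_gb(self) -> int:
        return sum(d.capacity_gb for d in self.devices)

    def carve_volume(self, name: str, size_gb: int) -> str:
        # Only the logical allocation is tracked here; in a real system,
        # dedupe and encryption would happen per-block underneath.
        if self.allocated_gb + size_gb > self.total_gb:
            raise ValueError("pool exhausted")
        self.allocated_gb += size_gb
        return f"volume '{name}' ({size_gb} GB) presented to the host"

pool = CapacityPool()
pool.add_device(Device("local-ssd", 400))
pool.add_device(Device("ebs-gp2", 1000))
print(pool.carve_volume("docker-vol1", 100))
```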

Overall, I think it’s a pretty interesting product. At first look it feels a bit like a traditional array in a container package, much as if you had containerised an enterprise app and then used it as a traditional array with some container plugins, rather than being very targeted and container-specific. StorageOS do have an OS driver to let you mount their volumes directly from containers, but there are other things out there today which do that anyway (e.g. Flocker).

I would say their messaging is a little inconsistent at the moment, and adding things like FC integration early on feels a bit odd if they’re positioning themselves as a container play. They do however state clearly that they’re targeting enterprises and want to make the on-boarding process as simple and frictionless as possible. I do worry that this “all things to all people” approach could be a wee bit risky at this early stage, and that being more laser-focused in the short to medium term would allow them to differentiate more.

StorageOS Cloud

The founders were very specific when they stated that they were building a clustered array with synchronous remote replicas, not a distributed storage array. Async replication is coming, which will be critical to maintaining performance in a hybrid cloud or multi-cloud setup. I really like the fact that you can stretch the same hybrid storage environment between your on-premises and cloud infrastructure using a single storage solution. The same solution can also be used to span multiple public clouds, providing a resilient storage solution between, say, AWS and Azure, all of which is deduped and encrypted of course! This could be very interesting indeed, as customers look to protect their workloads from large public cloud outages!
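The practical difference between the two (a generic sketch of replication semantics, nothing StorageOS-specific) is when the write gets acknowledged: synchronous replication waits for the remote copy, asynchronous does not:

```python
# Generic sync-vs-async replication semantics -- not StorageOS code.
import time

def write_local(block: bytes) -> None:
    pass  # stand-in for a local persistent write

def replicate_to_remote(block: bytes) -> None:
    time.sleep(0.05)  # stand-in for WAN round-trip latency

def sync_write(block: bytes) -> None:
    # Acknowledged only once the remote replica is safe: zero data loss,
    # but every single write pays the inter-site latency.
    write_local(block)
    replicate_to_remote(block)

def async_write(block: bytes, pending: list[bytes]) -> None:
    # Acknowledged immediately; replication happens in the background,
    # which is why async matters for hybrid/multi-cloud performance.
    write_local(block)
    pending.append(block)
```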

Finally, the StorageOS software is built (as you would expect these days) with APIs at the heart of everything. Even the modern GUI is really just based on API calls to the back end.

The Tekhead Take

Anyway, enough gabbing… It’s still early days, but the storage experience of the founders is certainly solid! Who better than ex-storage admins to provide a product that works well for storage admins?! I’d say there’s a good chance of this becoming a pretty cool product in the future, so definitely one to watch!

You can find a link to their website and beta sign up here:
http://storageos.com/index.php/product/

StorageOS hipster-approved storage

Words Mean Things, Apparently – Deduplication Myths Explored

A rose by any other name would smell as sweet?

We might all agree that this is most definitely the case, but in the technology industry we have a problem, and it was highlighted across a number of the sessions we attended at Storage Field Day 9 this week.

Specifically, the use of certain terms to describe technology features, when the specific implementations are very different, and have potentially very different outcomes. This is becoming more and more of a problem across the industry as similar features are being “RFP checkboxed” as the same, when in reality they are not.

For example, most of the vendors we saw support deduplication in one form or another, and in many cases there was a significant use of the word “inline”.

What do we mean by “inline deduplication”, and what impact to performance can this have?

One of the other delegates at SFD9, W Curtis Preston, had very strong opinions on this, which I am generally inclined to agree with!

UPDATE 08/04/2016: Curtis has recently published an article detailing his thoughts here.

If a write hits the system and is deduplicated prior to being written to its final non-volatile media, be it flash or disk, then it can generally be considered as inline.

Inline Deduplication

If deduplication is running in hardware (for example as 3PAR do in their Gen4+ ASIC), the deduplication process has minimal overhead on the system, and by not needing to send all writes to the back-end storage it can actually improve overall performance, even under sustained high throughput, simply by reducing the number of back-end writes.
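As a toy illustration (not any particular vendor’s implementation), inline dedupe boils down to fingerprinting each block before it touches the back end, so duplicates never generate a back-end write at all:

```python
# Toy inline deduplication: hash each incoming block *before* it is
# written; duplicate blocks never generate a back-end write at all.
import hashlib

store: dict[str, bytes] = {}   # fingerprint -> block (stand-in for back-end media)
refs: list[str] = []           # logical volume: ordered list of fingerprints

def inline_write(block: bytes) -> None:
    fp = hashlib.sha256(block).hexdigest()
    if fp not in store:
        store[fp] = block      # only unique blocks hit the back end
    refs.append(fp)            # duplicates just add a reference

for block in [b"A" * 4096, b"B" * 4096, b"A" * 4096]:
    inline_write(block)

print(len(refs), "logical blocks,", len(store), "physical writes")  # 3 logical, 2 physical
```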

Most non-inline deduplication would typically be referred to as “post-process”, and as a general rule is either run on a schedule or as a lower-priority 24/7 system maintenance task. It can also run immediately after the write has gone to disk, but that is still post-process, not inline.

It’s worth noting that any of these post-process methods can potentially have an impact on back-end capacity management, as dumping large quantities of data onto a system can temporarily spike capacity utilisation until the dedupe process has time to work its magic and increase storage efficiency. Not ideal if your storage capacity is approaching critical.
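By contrast, a post-process scheme (again a toy sketch, not any specific product) lands every write on the back end first and reclaims duplicates later, which is exactly why capacity can spike before the background pass catches up:

```python
# Toy post-process deduplication: every write hits the back end first,
# and a background pass later collapses duplicates.
import hashlib

disk: list[bytes] = []                     # back end: every write lands here first

def write(block: bytes) -> None:
    disk.append(block)                     # full-size write, no dedupe in the IO path

def post_process() -> list[tuple[str, bytes]]:
    # Re-reads everything, hashes it, and keeps one copy per fingerprint --
    # note this generates extra back-end IO *after* the original writes.
    unique: dict[str, bytes] = {}
    for block in disk:
        unique.setdefault(hashlib.sha256(block).hexdigest(), block)
    return list(unique.items())

for block in [b"A" * 4096, b"B" * 4096, b"A" * 4096]:
    write(block)

print("before dedupe:", len(disk), "blocks on disk")           # 3 -- capacity spike
print("after dedupe: ", len(post_process()), "unique blocks")  # 2
```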


In addition, the block has been written to an NVRAM device which should protect it from power loss etc., but the problem is that cache is an expensive and finite resource. Throw a sustained stream of IOs at the system and you can potentially fill that cache/NVRAM faster than the IOs can be flushed and deduplicated, which is exacerbated by the fact that post-process dedupe generates yet more IOPS on the back-end storage (as much as 2-3x compared to the original write!). The cumulative effect causes IO to back up in the system like a dodgy toilet, increasing latency and reducing the maximum IOPS the system can deliver.
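To put rough numbers on that amplification (purely illustrative, using the 2-3x figure above):

```python
# Back-of-envelope back-end IO amplification for post-process dedupe,
# using the 2-3x figure mentioned above (illustrative only).
inbound_iops = 100_000
for amplification in (2, 3):
    backend_iops = inbound_iops * amplification
    print(f"{inbound_iops:,} inbound IOPS -> ~{backend_iops:,} back-end IOPS "
          f"at {amplification}x amplification")
```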

Worse still, in some vendor implementations, when system performance is maxed out, deduplication in the IO path is dropped altogether and inbound data is dumped out to disk as fast as possible. This is then post-processed later, but it could obviously leave you in a bit of a hole again if you are running at high capacity utilisation.

Post-Process Deduplication

None of this is likely to kick in for the vast majority of customers, whose workloads will probably only generate tens of thousands of IOPS, or maybe low hundreds of thousands in aggregate. As such, for most modern systems and mixed workloads, this is unlikely to be a huge problem. However, when you have a use case which is pushing your array or HCI solution to its maximum capability, it can have a significant impact on performance, as described above.

[HCI – yet another misappropriated computing acronym, but I’ll let that one slide for now and move on!]

VMware VSAN Deduplication

In the case of one of the vendors we saw, VMware, they joked that because they initially write to the caching flash tier prior to deduplication, they spent more time arguing over whether it was valid to call the feature “inline” than it took to actually develop it! To their credit, they have been open enough not to call it “inline” but instead “nearline”.

In part this is because writes always land on a flash device prior to dedupe, but also because not all of the writes to the caching tier actually get sent to the capacity tier. In fact, some may live out their entire existence in a non-deduplicated state in flash cache.
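A rough sketch of that “nearline” flow (my own simplification, definitely not VMware’s code): writes are acknowledged from the cache tier, and dedupe only happens if and when a block is destaged to the capacity tier:

```python
# Toy "nearline" dedupe: acknowledge writes from a flash cache tier, and
# dedupe only on destage to the capacity tier. Some blocks may never be
# destaged at all. A simplification, not VMware's implementation.
import hashlib

cache: list[bytes] = []             # flash cache tier: raw, non-deduplicated
capacity: dict[str, bytes] = {}     # capacity tier: deduplicated

def write(block: bytes) -> None:
    cache.append(block)             # write acknowledged here -- no dedupe yet

def destage(n: int) -> None:
    # Move the n oldest cached blocks down, deduplicating as we go.
    for _ in range(min(n, len(cache))):
        block = cache.pop(0)
        capacity.setdefault(hashlib.sha256(block).hexdigest(), block)

for block in [b"A" * 4096, b"A" * 4096, b"B" * 4096]:
    write(block)

destage(2)   # only two blocks destaged; the third lives on in cache, un-deduped
print(len(cache), "block still in cache,", len(capacity), "unique blocks in capacity tier")
```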


I applaud VMware for their attempt to avoid jumping on the inline bandwagon, though it would have perhaps been better to use a term which doesn’t already mean something completely different in the context of storage! 🙂

You can catch the full VMware session at the link below – it’s well worth a watch!
VMware Storage Presents at Storage Field Day 9

Further Reading

Some of the other SFD9 delegates and VMware staffers had their own takes on the presentation we saw. Check them out here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 9 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors products or services and I was not compensated in any way for my time at the event.

Storage Field Day 9 – Behind the Curtain

Tech Field Day cheese

Tech Field Day is an awesome experience for all of the delegates! We get to spend an entire week unabashedly geeking out, as well as hanging out with the founders, senior folk and engineers at some of the most innovative companies in the world!

For those people who always wondered what goes on behind the scenes in the Tech Field Day experience, I took a few pano shots at the Storage Field Day 9 event this week.

Here they are, along with most of my favourite tweets and photos of the week… it was a blast!

Panos

Pre-Event Meeting & Plexistor

NetApp & SolidFire

Violin Memory

Intel

Cohesity

VMware

The rest of the event…

Until next time… 🙂
