NetApp – Is this the dawn of a new day?

Many people in the storage industry believed that NetApp made a pretty big mistake by underestimating the power of flash and its impact on the storage market. What really impressed me is that at Storage Field Day 9, Dave Hitz stood up and openly agreed!

He then went on to explain how they had recognised this and made a strategic decision to purchase one of the hottest and most innovative flash storage companies in the world, SolidFire. This has clearly been done with the intention of using SolidFire as Polyfilla for the hole in their product portfolio, but I would suggest that it is as much about SolidFire becoming a catalyst for modernising and reforming the organisation.

As with almost any company which has been around for a significant period of time and grown to a significant size (currently standing at around 12,500 employees), NetApp has become rather a behemoth, with all of the usual process-driven issues which beset companies of their scale. Much like an oil tanker, they don’t so much measure their turning circle in metres, as they do in miles.

With the exception of a few key figures and some public battles with a certain 3-letter competitor, their marketing has also historically been relatively conservative and their customers the same. As a current and historical NetApp customer and ex-NetApp admin myself, by no means am I denigrating the amazing job they have done over the years, or indeed the quality of the products they have produced! However, of late I have generally considered them to be mostly in the camp of “nobody ever got fired for buying IBM”.

In stark contrast, they have just spent a significant chunk of change on a company that is the polar opposite. SolidFire have not only brilliant engineers and impressive technology, but they also furnished their tech marketing team with some of the most well known and talented figures in the industry. These guys have been backed up by a strong, but relatively small sales organisation, who were not afraid to qualify out of shaky opportunities quickly, allowing them to concentrate their limited resources on chasing business where their unique solution had the best chance of winning. Through this very clear strategy, they have been able to grow revenues significantly year on year, ultimately leading to their very attractive $870m exit.

Having experienced a number of M&As myself, both as the acquiring company and the acquired, I can see some parallels to my own experiences. Needless to say, the teams from both sides of this new venture are in for a pretty bumpy ride over the coming months! NetApp must make the transformation into a cutting edge infrastructure company with a strong social presence, and prove themselves to be more agile to changing market requirements. This will not be easy for some individuals in the legacy organisation, who are perhaps more comfortable with the status quo. The guys coming in from SolidFire are going to feel rather like they’re nailing jelly to a tree at times, especially when they run into many of the old processes and old guard attitudes at their new employer.

What gives me hope that the eventual outcome could be a very positive one is that NetApp senior management have already identified and accepted these challenges, and have put a number of policies in place to mitigate them. For example, as I understand it, the staff at SolidFire have been given a remit to ask some “hard questions” whenever they come across blockers to achieving success for the organisation, and those questions are robust in nature to say the least! That said, some are as simple as asking the question “Why?”. With executive sponsorship behind this endeavour ensuring that responses like “because that’s how we’ve always done it” will not be acceptable, I am confident that it will enable the SolidFire guys and gals to work with their new colleagues to effect positive change within the organisation.

I think this is reflected in Jeramiah Dooley’s recent post here, which echoes so many elements of this post I almost considered not hitting publish! 😮

If the eventual outcome of this is to make NetApp stronger and more viable in the long term, then all the better it will be for those who stick around to enjoy it! This, of course, will benefit the industry as a whole by maintaining a strong and broad set of storage companies to keep competition fierce and prices low for customers. Win-win!

It is certainly going to be an interesting couple of years, and I for one am looking forward to seeing the results!

You can find the session videos from all the guys at NetApp here; I would say they are well worth the time to watch:
NetApp Presents at Storage Field Day 9

Further Reading
Some of the other SFD9 delegates had their own takes on the presentation we saw. Check them out here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 9 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services and I was not compensated in any way for my time at the event.

Storage, Tech Field Day

Words Mean Things, Apparently – Deduplication Myths Explored

A rose by any other name would smell as sweet?

We might all agree that this is most definitely the case, but in the technology industry we have a problem, and it was highlighted across a number of the sessions we attended at Storage Field Day 9 this week.

Specifically, the same terms are being used to describe technology features whose implementations are very different, and which have potentially very different outcomes. This is becoming more and more of a problem across the industry as similar features are being “RFP checkboxed” as the same, when in reality they are not.

For example, most of the vendors we saw support deduplication in one form or another, and in many cases there was a significant use of the word “inline”.

What do we mean by “inline deduplication”, and what impact on performance can this have?

One of the other delegates at SFD9, W Curtis Preston, had very strong opinions on this, which I am generally inclined to agree with!

UPDATE 08/04/2016: Curtis has recently published an article detailing his thoughts here.

If a write hits the system and is deduplicated prior to being written to its final non-volatile media, be it flash or disk, then it can generally be considered as inline.

Inline Deduplication
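
To make that definition a little more concrete, here is a minimal Python sketch of the idea. It is purely illustrative and not any vendor's implementation: the class name, the use of SHA-256 fingerprints and the dictionary standing in for the "media" are all my own assumptions, and it ignores reference counting, hash collisions and garbage collection entirely. The key point is that the duplicate check happens before anything is written to the final media.

```python
import hashlib


class InlineDedupeStore:
    """Toy block store (hypothetical) that dedupes before data reaches the media."""

    def __init__(self):
        self.media = {}        # fingerprint -> block data (stands in for flash/disk)
        self.volume = {}       # logical block address -> fingerprint
        self.media_writes = 0  # blocks physically written to the media

    def write(self, lba, data):
        # Fingerprint the block while it is still in the write path.
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint not in self.media:
            # Only previously unseen data is written to the back end.
            self.media[fingerprint] = data
            self.media_writes += 1
        # Duplicates just update the logical-to-physical mapping.
        self.volume[lba] = fingerprint

    def read(self, lba):
        return self.media[self.volume[lba]]


store = InlineDedupeStore()
for lba in range(4):
    store.write(lba, b"A" * 4096)   # four identical blocks
store.write(4, b"B" * 4096)         # one different block
print(store.media_writes)           # physical writes, versus five logical writes
```

In this toy example five logical writes result in only two physical writes, which is exactly why a cheap enough inline implementation can reduce load on the back end.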

If deduplication is running in hardware (for example as 3PAR do in their Gen4+ ASIC), the deduplication process has minimal overhead on the system, and because not every write needs to be sent to the back end storage, it can actually improve performance overall, even under sustained high throughput.

Most non-inline deduplication would typically be referred to as “post-process”, and as a general rule it is either run on a schedule or as a lower priority 24/7 system maintenance task. It can also run immediately after the write has gone to disk. This is still post-process, not inline.

It’s worth noting that any of these post-process methods can potentially have an impact on back-end capacity management, as dumping large quantities of data onto a system can temporarily spike capacity utilisation until the dedupe process has time to work its magic and increase storage efficiency. Not ideal if your storage capacity is approaching critical.
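
The capacity spike is easy to see in a quick sketch. This is a hypothetical Python model, not a real array: the function name and the block counts are made up for illustration, and blocks are simply compared by SHA-256 fingerprint.

```python
import hashlib


def post_process_ingest(blocks):
    """Land every block on the media first, then dedupe in a later pass.

    Returns (peak_blocks_used, blocks_used_after_dedupe).
    """
    # Step 1: everything is written out as-is, duplicates included.
    media = list(blocks)
    peak_blocks_used = len(media)
    # Step 2: the later dedupe pass keeps one copy per unique fingerprint.
    unique = {hashlib.sha256(block).digest() for block in media}
    return peak_blocks_used, len(unique)


# Hypothetical data set: 90 identical blocks plus 10 distinct ones.
blocks = [b"A"] * 90 + [bytes([i]) for i in range(10)]
peak, after = post_process_ingest(blocks)
print(peak, after)
```

Until the second step has run, the system has to find room for every block that was sent to it, even though most of that space is handed back afterwards.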

In addition, the block has been written to an NVRAM device which should protect it from power loss etc, but the problem we have is that cache is an expensive and finite resource. As such, by throwing a sustained number of IOs at the system, you end up potentially filling up that cache/NVRAM faster than the IOs can be flushed and deduplicated, which is exacerbated by the fact that post-process dedupe generates yet more IOPS on the back end storage (by as much as 2-3x compared to the original write!). The cumulative effect causes IO to back up in the system like a dodgy toilet, thereby increasing latency and reducing your maximum capable IOPS from the system.
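
As a rough back-of-the-envelope model of that backlog, consider the Python sketch below. Every number in it is an assumption I have picked for illustration (the 3x amplification is simply the top of the range mentioned above), and real arrays are far more sophisticated than a single counter, but it shows how the cache fills once the ingest rate exceeds what the back end can retire.

```python
def simulate_cache(ingest_iops, backend_iops, amplification, cache_blocks, seconds):
    """Return the cache backlog (in blocks) at the end of each second.

    amplification is the number of back end IOs needed to retire one host
    write (assumed constant here, which is a simplification).
    """
    drain_iops = backend_iops / amplification  # host writes retired per second
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog = max(0.0, backlog + ingest_iops - drain_iops)
        # Once the cache is full it cannot absorb any more; new writes must wait.
        backlog = min(backlog, cache_blocks)
        history.append(backlog)
    return history


# Hypothetical array: 100k back end IOPS, 200k blocks of cache, 50k host writes/s.
heavy = simulate_cache(50_000, 100_000, 3, 200_000, 20)
light = simulate_cache(50_000, 100_000, 1, 200_000, 20)
print(heavy[-1], light[-1])
```

With no amplification the same workload never builds a backlog at all; with 3x amplification the cache is pinned at full within the run, and it is at that point that hosts start to see latency climb.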

Worse still, in some vendor implementations, when system performance is maxed out, deduplication in the IO path is dropped altogether, and inbound data is dumped out to disk as fast as possible. This is then post-processed later, but this could obviously leave you in a bit of a hole again if you are at high capacity utilisation.

Post-Process Deduplication

None of this is likely to kick in for the vast majority of customers as they will probably have workloads generating tens of thousands of IOPS, or maybe low hundreds of thousands on aggregate. As such, for most modern systems and mixed workloads, this is unlikely to be a huge problem. However, when you have a use case which is pushing your array or HCI solution to its maximum capability, this can potentially have a significant impact on performance as described above.

[HCI – yet another misappropriated computing acronym, but I’ll let that one slide for now and move on!]

VMware VSAN Deduplication

In the case of one of the vendors we saw, VMware, they joked that because they initially write to the caching flash tier prior to deduplication, they spent more time arguing over whether it was valid to call this inline than it took them to actually develop the feature! In their case, they have been open enough not to call it “inline” but instead “nearline”.

In part this is because writes always land on a flash device prior to dedupe, but also because not all of the writes to their caching tier actually get sent to the capacity tier. In fact some may live out their entire existence in a non-deduplicated state in flash cache.
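
A toy Python model of that behaviour might look like the sketch below. To be clear, this is my own simplification of the description above and not how VSAN is actually implemented; the class, its method names and the destage trigger are all hypothetical.

```python
import hashlib


class NearlineStore:
    """Toy two-tier store: raw writes in a cache tier, dedupe on destage."""

    def __init__(self):
        self.cache = {}     # lba -> raw data, never deduplicated
        self.capacity = {}  # fingerprint -> data, deduplicated
        self.index = {}     # lba -> fingerprint for destaged blocks

    def write(self, lba, data):
        # Writes land in the cache tier; an overwrite replaces the cached copy.
        self.cache[lba] = data

    def discard(self, lba):
        # Data deleted while still cached never reaches the capacity tier.
        self.cache.pop(lba, None)

    def destage(self):
        # Deduplication only happens as data moves down to the capacity tier.
        for lba, data in self.cache.items():
            fingerprint = hashlib.sha256(data).digest()
            self.capacity.setdefault(fingerprint, data)
            self.index[lba] = fingerprint
        self.cache.clear()


store = NearlineStore()
store.write(0, b"v1")
store.write(0, b"v2")       # overwrites v1 while it is still in cache
store.write(1, b"v2")       # duplicate of block 0
store.write(2, b"scratch")
store.discard(2)            # short-lived data, gone before destage
store.destage()
print(len(store.capacity))  # unique blocks on the capacity tier
```

Here `b"v1"` and `b"scratch"` spend their whole lives in the cache tier without ever being deduplicated, while the two copies of `b"v2"` collapse into a single block on destage.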

I applaud VMware for their attempt to avoid jumping on the inline bandwagon, though it would have perhaps been better to use a term which doesn’t already mean something completely different in the context of storage! 🙂

You can catch the full VMware session at the link below – it’s well worth a watch!
VMware Storage Presents at Storage Field Day 9

Further Reading

Some of the other SFD9 delegates and VMware staffers had their own takes on the presentation we saw. Check them out here:

Storage, Tech Field Day, VMware

Storage Field Day 9 – Behind the Curtain

Tech Field Day is an awesome experience for all of the delegates! We get to spend an entire week unabashedly geeking out, as well as hanging out with the founders, senior folk and engineers at some of the most innovative companies in the world!

For those people who always wondered what goes on behind the scenes in the Tech Field Day experience, I took a few pano shots at the Storage Field Day 9 event this week.

Here they are, along with most of my favourite tweets and photos of the week… it was a blast!

Panos

Pre-Event Meeting

Pre-Event Meeting & Plexistor

NetApp & SolidFire

Violin Memory

Intel

Cohesity

VMware

The rest of the event…

Until next time… 🙂

Tech Field Day

You had me at Tiered Non-Volatile Memory!

Memory isn’t cheap! Despite the falling costs and increasing sizes of DRAM DIMMs, it’s still damned expensive compared to most non-volatile media on a price per GB basis. What’s more frustrating is that often you buy all of this expensive RAM, assign it to your applications, and find later through detailed monitoring that only a relatively small percentage is actually being actively used.

For many years, we have had technologies such as paging, which allow you to maximise the use of your physical RAM, by writing out the least used pages to disk, freeing up RAM for services with current memory demand. The problem with paging is that it is sometimes unreliable, and when you do actually need to get that page back, it can be multiple orders of magnitude slower returning it from disk.
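
For anyone who hasn't looked at this for a while, the basic mechanism can be sketched in a few lines of Python. This is a deliberately naive model of my own: real operating systems only approximate least-recently-used ordering, and the class and names here are hypothetical.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy model of paging: a fixed amount of RAM backed by a swap area."""

    def __init__(self, ram_pages):
        self.ram_pages = ram_pages
        self.ram = OrderedDict()  # page -> data, least recently used first
        self.swap = {}            # pages written out to disk
        self.page_faults = 0      # times a page had to come back from disk

    def touch(self, page, data=None):
        if page in self.ram:
            self.ram.move_to_end(page)       # mark as most recently used
        else:
            value = None
            if page in self.swap:
                self.page_faults += 1        # the slow path: fetch from disk
                value = self.swap.pop(page)
            if len(self.ram) >= self.ram_pages:
                # RAM is full, so write out the least recently used page.
                victim, victim_data = self.ram.popitem(last=False)
                self.swap[victim] = victim_data
            self.ram[page] = value
        if data is not None:
            self.ram[page] = data
        return self.ram[page]


mem = PagedMemory(ram_pages=2)
mem.touch("a", 1)
mem.touch("b", 2)
mem.touch("c", 3)           # RAM is full, so "a" is paged out
print(mem.touch("a"))       # page fault: "a" comes back, "b" is paged out
```

Every one of those page faults is a trip to disk, which is where the orders-of-magnitude slowdown comes from.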

Worse still, if you are running a workload such as virtual machines and the underlying host becomes memory constrained, a hypervisor may often not have sufficient visibility of the underlying memory utilisation, and as such will simply swap out random memory pages to a swap file. This can obviously have significant impact on virtual machine performance.

More and more applications are being built to run in memory these days, from Redis to Varnish, Hortonworks to MongoDB. Even Microsoft got on the bandwagon with SQL Server 2014 In-Memory OLTP.

One of the companies we saw at Storage Field Day, Plexistor, told us that they can offer both tiered POSIX storage and tiered non-volatile memory through a single software stack.

The POSIX option could effectively be thought of a bit like a non-volatile, tiered RAM disk. Pretty cool, but not massively unique as RAM disks have been around for years.

The element which really interested me was the latter option; effectively a tiered memory driver which can present RAM to the OS, but in reality tier it between NVDIMMs, SSD and HDDs depending on how hot / cold the pages are! They will also be able to take advantage of newer bit addressable technologies such as 3D XPoint as they come on the market, making it even more awesome!

Plexistor Architecture
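
To illustrate the hot / cold tiering idea, here is a small Python sketch. It is only a thought experiment and certainly not Plexistor's algorithm: the function, the use of a simple access count as the "heat" metric and the tier sizes are all assumptions of mine.

```python
def place_pages(access_counts, tier_capacities):
    """Assign the hottest pages to the fastest tier.

    access_counts:   {page_id: recent access count}
    tier_capacities: list of (tier_name, pages_it_can_hold), fastest first
    """
    if sum(capacity for _, capacity in tier_capacities) < len(access_counts):
        raise ValueError("not enough capacity across all tiers")
    # Rank pages from hottest to coldest.
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    placement = {}
    start = 0
    for name, capacity in tier_capacities:
        # Fill each tier in turn with the hottest pages still unplaced.
        for page in ranked[start:start + capacity]:
            placement[page] = name
        start += capacity
    return placement


counts = {"a": 90, "b": 5, "c": 40, "d": 0, "e": 12}
tiers = [("NVDIMM", 1), ("SSD", 2), ("HDD", 10)]
print(place_pages(counts, tiers))
```

In this example the hottest page ends up on the NVDIMM tier, the next two on SSD and the rest on HDD. A real driver would have to do this continuously and cheaply as access patterns change, rather than as a one-off sort.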

All of this is done through the simple addition of their NVM file system (i.e. device driver) on top of the pmem and bio drivers. This is compatible with most versions of Linux running reasonably up-to-date kernel versions.

It’s primarily designed to work with some of the Linux based memory intensive apps mentioned above, but will also work with more traditional workloads as well, such as MySQL and the KVM hypervisor.

Plexistor define their product as “Software Defined Memory” aka SDM. An interesting term which is jumping on the SDX bandwagon, but I kind of get where they’re going with it…

Software Defined Memory!

One thing to note with Plexistor is that they actually have two flavours of this product; one which is based on the use of NVRAM to provide a persistent store, and one which is non-persistent, but can be run on cloud infrastructures, such as AWS. If you need data persistence for the latter, you will have to do it at the application layer, or risk losing data.

If you want to find out a bit more about them, you can find their Storage Field Day presentation here:
Plexistor Presents at Storage Field Day 9

Musings…
As a standalone product, I have a sneaking suspicion that Plexistor may not have the longevity and scope which they might gain if they were procured by a large vendor and integrated into existing products. Sharon Azulai has already sold one startup in relatively early stages (Tonian, which they sold to Primary Data), so I suspect he would not be averse to this concept.

Although the code has been written specifically for the Linux kernel, they have already indicated that it would be possible to develop the same driver for VMware! As such, I think it would be a really interesting idea for VMware to consider acquiring them and integrating the technology into ESXi. It’s generally recognised as a universal truth that you run out of memory before CPU on most vSphere solutions. Moreover, when looking in the vSphere console we often see that although a significant amount of memory is allocated to VMs, often only a small amount is actually active RAM.

The use of Plexistor technology with vSphere would enable VMware both to provide an almost infinite pool of RAM per host for customers, and to significantly improve upon the current vswp process by ensuring hot memory blocks always stay in RAM and cold blocks are tiered out to flash.

The homelab nerd in me also imagines an Intel NUC with 160GB+ of addressable RAM per node! 🙂

Of course, the current licensing models for retail customers favour the “run out of RAM first” approach as it sells more per-CPU licenses. However, I think in the long term VMware will likely move to a subscription based model, probably similar to that used by service providers (i.e. based on RAM). If this ends up being the approach, then VMware could offer a product which saves their customers further hardware costs whilst maintaining their ESXi revenues. Win-Win!

Further Reading
One of the other SFD9 delegates had their own take on the presentation we saw. Check it out here:

Storage, Tech Field Day, VMware