
VulcanCast Follow Up – A few thoughts on 60TB SSDs

So last week I was kindly invited to share a ride in Marc Farley's car (not as dodgy as it sounds, I promise!).

The premise was to discuss the recent announcements around Seagate’s 60TB SSD, Samsung’s 30TB SSD, their potential use cases, and how on earth we can protect the quantities of data which will end up on these monster drives?!

Performance

As we dug into a little in the VulcanCast, many use cases will present themselves for drives of this type, but the biggest challenge is that the IOPS density of the drives is not actually very high. On a 60TB drive with 150,000 read IOPS (my guess, though not confirmed, is ~100,000 or fewer write IOPS), the average IOPS per GB is actually only a little higher than that of 15K SAS drives. When you start adding deduplication and compression into the mix, if you are able to achieve around 90-150TB of effective capacity per drive, you could easily be looking at IOPS/GB performance approaching that of smaller 10K SAS devices!
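To put some rough numbers on that, here is a quick back-of-envelope calculation. The SSD figures are the ones quoted above; the spindle IOPS and capacities are my own ballpark assumptions rather than vendor specs.

```python
# Back-of-envelope IOPS density comparison.
# SSD figures are from the quoted specs above; spindle figures are rough assumptions.
def iops_per_gb(iops, capacity_gb):
    return iops / capacity_gb

print(iops_per_gb(150_000, 60_000))    # 60TB SSD, raw capacity:     ~2.5 IOPS/GB
print(iops_per_gb(150_000, 120_000))   # same drive at ~2:1 dedupe:  ~1.25 IOPS/GB
print(iops_per_gb(150_000, 150_000))   # at the 150TB effective end: ~1.0 IOPS/GB

print(iops_per_gb(180, 300))           # assumed 300GB 15K SAS:      ~0.6 IOPS/GB
print(iops_per_gb(140, 600))           # assumed 600GB 10K SAS:      ~0.23 IOPS/GB
```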

The biggest benefit of course is that you achieve this performance in a minuscule footprint by comparison to any current spindle type. Power draw is orders of magnitude lower than 10/15K, and (by my estimates) at least 4x lower than NL-SAS / SATA at peak, and way more at idle. As such, a chunk of the additional cost of using flash for secondary tier workloads could be soaked up by your space and power savings, especially in high-density environments.

In addition, the consistency of the latency will open up some interesting additional options…

SAS bus speeds could also end up being a challenge. Modern storage arrays often utilise 12Gb/s SAS to interconnect the shelves and disks, which gives you multiple SAS channels over which to transfer data. With over half a PB of usable storage in just a dozen drives (potentially 1PB with compression and dedupe), that's a lot of storage to stick on a single channel! In the long term, faster connectivity methods such as NVMe will help, but in the short term we may even have to see some interesting scenarios with one controller (and channel) for every few drives, just to ensure we don't saturate bandwidth too easily.
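As a rough sanity check on the bandwidth concern: the per-drive throughput below is purely an assumed figure for illustration, not a published spec, but it shows how quickly a shelf of these drives could swamp a single SAS connection.

```python
# How quickly could a shelf of 60TB SSDs saturate a SAS connection?
# The per-drive throughput is an assumption for illustration only.
drive_gbps = 8                       # assume ~1 GB/s sustained per SSD, i.e. ~8 Gbit/s
drives_per_shelf = 12

sas_lane_gbps = 12                   # one 12Gb/s SAS lane
wide_port_gbps = 4 * sas_lane_gbps   # a typical 4-lane wide port, ~48 Gbit/s

aggregate_gbps = drive_gbps * drives_per_shelf
print(aggregate_gbps)                    # ~96 Gbit/s from one shelf
print(aggregate_gbps / wide_port_gbps)   # ~2x oversubscription on one wide port
print(wide_port_gbps // drive_gbps)      # only ~6 drives needed to fill the port
```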

Use Cases

For me, the biggest use cases for this type of drive are going to be secondary storage workloads which require low(ish) latency, a reasonable number of predominantly read IOPS, and consistent performance even when the workload is a little bursty. For example:

  • Unstructured data stores, such as file / NAS services where you may access data infrequently, possibly tiered with some faster flash for cache and big write bursts.
  • Media storage for photo and video sites (e.g. Facebook, but there are plenty of smaller ones such as Flickr, Photobox, Funky Pigeon, Snapfish, etc.) – indeed the same types of organisations we discussed at the Storage Field Day roundtable session on high performance object storage. Obviously one big disadvantage here would be the inability to dedupe / compress very much, as you typically can't expect high ratios for media content, which then has the effect of pushing up the cost per usable GB.
  • Edge cache nodes for large media streaming services such as Netflix, where maximising capacity and performance in a small footprint to go into other providers' data centres is pretty important, whilst being able to provide consistent performance for many random read requests.

For very large storage use cases, I could easily see these drives replacing 10K drives, and if the price can be brought down sufficiently, starting to edge into competing with NL-SAS / SATA for highly dedupable (is that a word?) data types in a few years.

Data Protection

Here's where things start to get a little tricky… we are now talking about protecting data in such massive quantities that the failure of just two drives within a short period has the potential to cause the loss of many hundreds of terabytes of data. At the same time, adding additional drives for protection (at tens of thousands of dollars each) comes with a pretty hefty price tag!

Unless you are buying a significant number of drives, the cost of your "N+1", RAID, erasure coding, etc. is going to be so exorbitant, you may as well buy a larger number of small drives so you don't waste all of that extra capacity. As such, I can't see many people using these drives in quantities of less than 12-24 per device (or perhaps per RAIN set in a hyper-converged platform), which means even with a conservatively guesstimated cost of $30k per drive, you're looking at the best part of $360-720k for your disks alone!

Let's imagine, then, the scenario where you have a single failed drive, and 60TB of your data is now hanging in the balance. Would you want to replace that drive in a RAID set and, based on the write rates suggested so far, wait 18-24 hours for it to resync? I would be pretty nervous to do that myself…
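Working backwards from that 18-24 hour figure (the sustained rebuild write rates below are my own assumptions, not confirmed drive specs):

```python
# Rough single-drive resync time: capacity / sustained write rate.
# The write rates are assumptions, not confirmed drive specs.
capacity_gb = 60_000
for write_rate_gb_s in (0.7, 0.9):
    hours = capacity_gb / write_rate_gb_s / 3600
    print(f"{write_rate_gb_s} GB/s sustained -> {hours:.0f} hours to resync")
# 0.7 GB/s -> ~24 hours, 0.9 GB/s -> ~19 hours
```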

In addition, we need to consider the rate of change of the data. Let's say our datastore consists of 12x60TB drives. We probably have about 550TB or more of usable capacity. Even with a rate of change of just 5%, we need to be capable of backing up over 27TB from that single datastore per night just to keep up with the incrementals! If we were to use a traditional backup solution against something like this, achieving that in a typical 10-hour backup window would require a consistent 6Gbps of throughput, never mind any full backups!
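The incremental backup maths looks something like this:

```python
# Nightly incremental backup throughput for a 12 x 60TB datastore.
usable_tb = 550                  # ~720TB raw, less protection/formatting overhead
daily_change_rate = 0.05         # 5% rate of change
backup_window_hours = 10

changed_tb = usable_tb * daily_change_rate                               # ~27.5 TB/night
required_gbps = changed_tb * 1e12 * 8 / (backup_window_hours * 3600) / 1e9
print(f"{changed_tb:.1f} TB per night -> {required_gbps:.1f} Gbit/s sustained")
# ~27.5 TB per night -> ~6.1 Gbit/s sustained, before any full backups
```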

Ok, let's say we can achieve these kinds of backup rates comfortably. Fine. Now, what happens if we have a failure of a shelf, parity group or pool of disks? We've probably just lost 250+TB of data (excluding compression or dedupe) which we now need to restore from backup. Unless you are comfortable with an RTO measured in days to weeks, you might find that the restore time for this, even over a 10Gbps network, is not going to meet your business requirements!!!
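Even at a perfectly sustained line rate, which real restore streams rarely achieve, the arithmetic is not pretty:

```python
# Time to restore ~250TB over a 10Gbit/s link at a perfectly sustained line rate.
restore_tb = 250
link_gbps = 10
seconds = restore_tb * 1e12 * 8 / (link_gbps * 1e9)
print(f"{seconds / 86400:.1f} days")   # ~2.3 days, before protocol overheads and restarts
```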

This leaves us with a conundrum: how do we increase the durability of the data against disk failures, and how do we minimise the rebuild time in the event of a media failure, whilst still keeping costs reasonably low?

Today, the best option seems to me to be the use of erasure coding. In the event of the loss of a drive, the data is automatically rebuilt and redistributed across many or all of the remaining drives within the storage device. Even with, say, 12-24 drives in a "small" system, this would mean data being rebuilt back up to full protection in 30-60 minutes, instead of 18-24 hours! That said, this assumes the connectivity on the array bus / backplane is capable of handling the kind of bandwidth generated by the rebuilds, and that this doesn't have a massive adverse impact on the array processors!
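The reason the rebuild shrinks so dramatically is that the lost data is re-protected across all of the surviving drives rather than funnelled onto a single replacement. A rough sketch, again using an assumed per-drive write rate:

```python
# Distributed (erasure-coded) rebuild: each surviving drive rewrites only a slice
# of the lost 60TB. The per-drive write rate is an assumption, as above.
lost_gb = 60_000
surviving_drives = 23            # e.g. a 24-drive set with one failure
write_rate_gb_s = 0.8            # assumed sustained write rate per SSD

per_drive_gb = lost_gb / surviving_drives        # ~2.6 TB per drive
minutes = per_drive_gb / write_rate_gb_s / 60
print(f"~{minutes:.0f} minutes")                 # ~54 minutes vs 18-24 hours
```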

The use of "instant restore" technologies, where you can mount data directly from the backup media to get up and running asap and then move the data back transparently in the background, also seems to me to be a reasonable mitigation. In order to maintain a decent level of performance, this will likely also drive greater use of flash in the data protection storage tiers as well as in production.

The Tekhead Take

Whatever happens, the massive quantities of data we are beginning to see, and the drives we plan to store them on, are going to lead us to new (as yet, not even invented) forms of data protection. We simply can't keep up with the rates of growth without them!

VulcanCast

Catch the video here:

The video and full transcript are also available here:
Huge SSDs will force changes to data protection strategies – with @alexgalbraith

Software Defined Storage Virtualisation – How useful is that then?

Ignoring the buzzword bingo post title, storage virtualisation is not a new thing (and for my American cousins, yes, it should be spelt with an s! 🙂 ).

NetApp, for example, have been doing a V-Series controller for many years which could virtualise pretty much any storage you stick in the back of it. It would then present it as NFS and layer on all of the standard ONTAP features.

The big advantage was that you could use features which might otherwise be missing from your primary or secondary storage tiers, as well as being able to mix and match different tiers of storage from the same platform.

In a previous role, we had an annual process to take a full backup of a 65TB Oracle database and restore it at another site over a rather slow link, using an ageing VTL that could just about cope with incrementals and not much more on a day-to-day basis. End to end, this process took a month!

Then one year we came up with a plan to use virtualised NFS storage to do compressed RMAN backups, replicate the data using SnapMirror, and restore on the other side. It took us 3 days; an order of magnitude improvement!

That was 4 years ago, when the quantity of data globally was about 4x less than it is now; the problem of data inertia is only going to get worse as the world's storage consumption doubles roughly every two years!

What businesses need is the flexibility to use a heterogeneous pool of storage of different tiers and vendors in different locations, moving their data around as required to meet their current IT strategy, without having to change paths to the data or take downtime (especially for non-virtualised workloads which don't have the benefit of Storage vMotion etc.). These tiers need to provide the consistent performance defined by individual application requirements.

It’s for this reason that I was really interested in the presentation from Primary Data at Storage Field Day 8. They were founded just two years ago, came out of stealth at VMworld 2015, and plan to go GA with their first product in less than a month’s time. They also have some big technical guns in the form of their Chief Scientist, the inimitable Steve Wozniak!

One of the limitations of the system I used in the past was that it was ultimately a physical appliance, with all the usual drawbacks thereof. Primary Data are providing the power to abstract data services based on software only, presented in the most appropriate format for the workload at hand (e.g. for vSphere, Windows, Linux etc), so issues with data gravity and inertia are effectively mitigated. I immediately see three big benefits:

  • Not only can we decouple the physical location of the data from its logical representation and therefore move that data at will, we can also very quickly take advantage of emerging storage technologies such as VVOLs.
    Some companies who shall remain nameless (and happen to have just been bought by a four letter competitor) won’t have support for VVOLs for up to another 12 months on some of their products, but with the “shim” layer of storage virtualisation from Primary Data, we could do it today on virtually any storage platform whether it is VVOL compliant or not. Now that is cool!
  • By virtualising the data plane and effectively using the underlying storage as object storage / chains of blocks, they enable additional data services which may either not be included with the current storage, or may be an expensive add-on licence. A perfect example of this is sync and async replication between heterogeneous devices.
    Perhaps then you could spend the bulk of your budget on fast and expensive storage in your primary DC from vendor A, then replicate to your DR site asynchronously onto cheaper storage from vendor B, or even a hyper-converged storage environment using all local server media. The possibilities are broad to say the least!
  • The inclusion of policy-based Quality of Service from day one. In Primary Data parlance, they call them SLOs – Service Level Objectives for applications, with specific IOPS, latency, etc. (there's a rough illustrative sketch after this list).
    QoS does not even exist as a concept on many recent storage devices, much to the chagrin of many service providers, so being able to retrofit it would protect the ROI on existing spend whilst keeping the platform services up to date.
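To make that idea a touch more concrete, here's a purely illustrative sketch of how a policy-based service level objective might be modelled. This is my own hypothetical structure, not Primary Data's actual API or schema:

```python
# Purely illustrative model of a policy-based SLO - not Primary Data's actual API.
from dataclasses import dataclass

@dataclass
class ServiceLevelObjective:
    name: str
    min_iops: int          # minimum IOPS the application should receive
    max_latency_ms: float  # target upper bound on latency
    protection: str        # e.g. "sync-replica", "async-replica", "none"

# Different objectives could then steer data onto different back-end tiers.
oltp = ServiceLevelObjective("oltp-db", min_iops=50_000, max_latency_ms=1.0, protection="sync-replica")
archive = ServiceLevelObjective("archive", min_iops=500, max_latency_ms=20.0, protection="async-replica")
```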

There are however still a few elements which to me are not yet perfect. Access to SMB requires a filter driver in Windows in front of the SMB client, so the client thinks it’s talking to an SMB server but it’s actually going via the control plane to route the data to the physical block chains. A bit of a pain to retrofit to any large legacy environment.

vSphere appears to be a first class tenant in the Primary Data solution, with VASA and NFS-VAAI supported out of the “virtual” box, however it would be nice to have Primary Data as a VASA Client too, so it could read and then surface all capabilities from the underlying storage straight through to the vSphere hosts.

You will still have to do some basic administration on your storage back end to present it through to Primary Data before you can start carving it up in their "Single Pane of Glass". If they were to create array plugins which would allow you to remotely manage many common arrays, this would really make that SPoG shine! (Yes, I have a feverish, unwavering objection to saying that acronym!)

I will certainly be keeping an eye on Primary Data as they come to market. Their initial offering would have solved a number of issues for me in previous roles if it had been available a few years earlier, and I can definitely see opportunities where it would work well in my current infrastructure. I guess it's now up to the market to decide whether they see the benefits too!

Further Reading
Some of the other SFD8 delegates have their own takes on the presentation we saw. Check them out here:

Ray Lucchesi – Primary data’s path to better data storage presented at SFD8

Dan Frith – Primary Data – Because we all want our storage to do well

Disclaimer/Disclosure: My flights, accommodation, meals, etc. at Storage Field Day 8 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors' products or services, and I was not compensated in any way for my time at the event.
