Archive for Storage

Tech Field Day 12 (TFD12) – Preview

Tech Field Day 12 (TFD12)

For those people who haven’t heard of Tech Field Day, it’s an awesome event run by the inimitable Stephen Foskett. The event enables tech vendors and real engineers / architects / bloggers (aka delegates) to sit down and have a conversation about their latest products, along with technology and industry trends.

Ever been reading up on a vendor’s website about their technology and had some questions they didn’t answer? One of the roles of the TFD delegates is to ask the questions which help viewers to understand the technology. If you tune in live, you can also post questions via twitter and the delegates, who will happily ask them on your behalf!

As a delegate it’s an awesome experience as you get to spend several days visiting some of the biggest and newest companies in the industry, nerding out with like-minded individuals, and learning as much from the other delegates as you do from the vendors!

So with this in mind, I am very pleased to say that I will be joining the TFD crew for the third time in San Jose, for Tech Field Day 12, from the 15th-16th of November!

Tech Field Day 12 (TFD12) Vendors

As you can see from the list of vendors, there are some truly awesome sessions coming up! Having previously visited Intel and Cohesity, as well as written about StorageOS, it will be great to catch up with them and find out about their latest innovations. DellEMC are going through some massive changes at the moment, so their session should be fascinating. Finally, I haven’t had the pleasure of visiting rubrik, DriveScale or Igneous to date, so should be very interesting indeed!

That said, if there was one vendor I am probably most looking forward to visiting at Tech Field Day 12, it’s Docker! Container adoption is totally changing the way that developers architect and deploy software, and I speak to customers regularly who are now beginning to implement them in anger. It will definitely be interesting to find out about their latest developments.

If you want to tune in live to the sessions, see the following link:
Tech Field Day 12

If for any reason you can’t make it live, have no fear! All of the videos are posted on YouTube and Vimeo within a day or so of the event.

Finally, if you can’t wait for November, pass the time by catching some of the fun and highlights from the last event I attended:

Storage Field Day 9 – Behind the Curtain

VulcanCast Follow Up – A few thoughts on 60TB SSDs

So last week I was kindly invited to share a ride in Marc Farley‘s car (not as dodgy as it sounds, I promise!).

The premise was to discuss the recent announcements around Seagate’s 60TB SSD, Samsung’s 30TB SSD, their potential use cases, and how on earth we can protect the quantities of data which will end up on these monster drives?!

Performance

As we dug into a little in the VulcanCast, many use cases will present themselves for drives of this type, but the biggest challenge is that the IOPS density of the drives not actually very high. On a 60TB drive with 150,000 read IOPS (and my guess but not confirmed is ~100,000 or fewer write IOPS), the average IOPS per GB is actually only a little higher than that of SAS 15K drives. When you start adding deduplication and compression into the mix, if you are able to achieve around 90-150TB per drive, you could easily be looking at IOPS/GB performance approaching smaller 10K SAS devices!

Seagate 60TB SSD Vulcancast flash is fastThe biggest benefit of course if that you achieve this performance in a minuscule footprint by comparison to any current spindle type. Power draw is orders of magnitude lower than 10/15K, and at least (by my estimates) at least 4x lower than using NL-SAS / SATA at peak, and way more at idle. As such, a chunk of the additional cost of using flash for secondary tier workloads, could be soaked up by your space and power savings, especially in high-density environments.

In addition, the consistency of the latency will open up some interesting additional options…

SAS bus speeds could also end up being a challenge. Modern storage arrays often utilise 12GB SAS to interconnect the shelves and disks, which gives you multiple SAS channels over which to transfer data. With over half a PB of usable storage in just a dozen drives, which could be 1PB with compression and dedupe, and that’s a lot of storage to stick on a single channel! In the long term, faster connectivity methods such as NVMe will help, but in the short-term we may even have to see some interesting scenarios with one controller (and channel) for every few drives, just to ensure we don’t saturate bandwidth too easily.

Seagate 60TB SSD Vulcancast Big DataUse Cases

For me, the biggest use cases for this type of drive are going to be secondary storage workloads which require low(ish) latency, a reasonable number of predominantly Read IOPS, and consistent performance even when a little bit bursty. For example:

  • Unstructured data stores, such as file / NAS services where you may access data infrequently, possibly tiered with some faster flash for cache and big write bursts.
  • Media storage for photo and video sites (e.g. facebook, but there are plenty of smaller ones such as Flickr, Photobox, Funky Pigeon, Snapfish, etc. Indeed the same types of organisations we discussed at the Storage Field Day roundtable session on high performance object storage. Obviously one big disadvantage here, would be the inability to dedupe / compress very much as you typically can’t expect high ratios for media content, which then has the effect of pushing up the cost per usable GB.
  • Edge cache nodes for large media streaming services such as NetFlix where maximising capacity and performance in a small footprint to go in other providers data centres is pretty important,whilst being able to provide a consistent performance for many random read requests.

For very large storage use cases, I could easily see these drives replacing 10K and if the price can be brought down sufficiently, for highly dedupable (is that a word?) data types, starting to edge into competing with NL SAS / SATA in a few years.

Data Protection

Here’s where things start to get a little tricky… we are now talking about protecting data at such massive quantities, failure of just two drives within a short period, has the potential to cause the loss of many hundreds of terabytes of data. At the same time, adding additional drives for protection (at tens of thousands of dollars each) comes with a pretty hefty price tag!

Seagate 60TB SSD Vulcancast data protectionUnless you are buying a significant number of drives, the cost of your “N+1”, RAID, erasure coding, etc is going to be so exorbitant, you may as well buy a larger number of small drives so you don’t waste all of that extra capacity. As such, I can’t see many people using these drives in quantities of less than 12-24 per device (or perhaps per RAIN set in a hyper-converged platform), which means even with a conservatively guestimated cost of $30k per drive, you’re looking at the best part of $350-$700k for your disks alone!

Let’s imagine then, the scenario where you have a single failed drive, and 60TB of your data is now hanging in the balance. Would you want to replace that drive in a RAID set, and based on the write rates suggested so far, wait 18-24 hours for it to resync? I would be pretty nervous to do that myself…

In addition, we need to consider the rate of change of the data. Let’s say our datastore consists of 12x60TB drives. We probably have about 550TB or more of usable capacity. Even with a rate of change of just 5%, we need to be capable of backing up 27TB from that single datastore per night just to keep up with the incrementals! If we were to use a traditional backup solution against something like this, to achieve this in a typical 10-hour backup window will generate a consistent 6Gbps, never mind any full backups!

Ok, let’s say we can achieve these kinds of backup rates comfortably. Fine. Now, what happens if we had failure of a shelf, parity group or pool of disks? We’ve probably just lost 250+TB of data (excluding compression or dedupe) which we now need to restore from backup. Unless you are comfortable with an RTO measured in days to weeks, you might find that the restore time for this, even over a 10Gbps network, is not going to meet your business requirements!!!

This leaves us with a conundrum of wondering how we increase the durability of the data against disk failures, and how do we minimise the rebuild time in the event of data media failure, whilst still keeping costs reasonably low.

Seagate 60TB SSD VulcancastToday, the best option seems to me to be the use of Erasure Coding. In the event of the loss of a drive, the data is then automatically rebuilt and redistributed across many or all of the remaining drives within the storage device. Even with say 12-24 drives in a “small” system, this would mean data being rebuilt back up to full protection in 30-60 minutes, instead of 18-24 hours! That said, this assumes the connectivity on the array bus / backplane is capable of handling the kinds of bandwidth generated by the rebuilds, and that this doesn’t have a massive adverse impact on the array processors!

The use of “instant restore” technologies, where you can mount data direct from the backup media to get up and running asap, then move the data transparently in the background also seems to me to be a reasonable mitigation. In order to maintain a decent level of performance, this will likely also drive the use of flash more in the data protection storage tiers as well as production.

The Tekhead Take

Whatever happens, the massive quantities of data we are beginning to see, and the drives we plan to store them on are going to need to lead us to new (as yet, not even invented) forms of data protection. We simply can’t keep up with the rates of growth without them!

VulcanCast

Catch the video here:

The video and full transcript are also available here:
Huge SSDs will force changes to data protection strategies – with @alexgalbraith

StorageOS – An array based on containers? It’s like storage for millenials!

Last week I managed to catch up with the guys from StorageOS, a new container-based storage company, headquartered in London. I found out about them at a London Storage Beers event a few weeks ago, and my first question was, what the hell is container-based storage, and how does it work?!

They started from the premise (yes that’s actually the correct use of the word premise!), that if you want to build a storage system FOR containers, what better way to do it than to build it FROM containers. StorageOS therefore offer what they describe as “full enterprise storage array functionality, delivered by software, on a pay-as-you-go basis”. They also plan to offer a free-forever Developer tier, which includes everything except HA functionality which you would obviously need for production usage!

StorageOS Announcement

So the good news is, today (Monday 20th June 2016) StorageOS are announcing the release of their Beta at DockerCon, so you can now download and test out their new storage platform.

The StorageOS Stack

The StorageOS Stack

 

You can deploy this StorageOS software anywhere from bare metal to containers:

StorageOS - It's software, so it runs anywhere!

It’s software, so it runs anywhere!

Appliances for some of the larger clouds are in the works, but will not be available on day zero.

They can then consume any back-end storage, from SSD, HDDs and virtual drives, to EBS volumes, object stores, etc. You then pool all of capacity from all devices into a capacity pool, which is deduped, encrypted, and available across all nodes, and carve out volumes to present to systems like Docker through their own native Docker driver, or (slightly oddly) iSCSI / FC!!! They even have VAAI support in development!

Overall, I think it’s a pretty interesting product. At first look it feels a bit like a traditional array in a container package, much like if you containerised an enterprise app, then just utilised as a traditional array with some container plugins, instead of being very targeted and container-specific. StorageOS do have an OS driver to let you mount their volumes direct from containers, but there are other things out there today which do that anyway (e.g. Flocker).

I would say their messaging is a little inconsistent at the moment, and adding things like FC integration early on feels a bit odd if they’re positioning themselves as a container play. They do however state clearly that they’re targeting enterprises and want to make the on-boarding process as simple and friction-less as possible. I do worry that this “all things to all people” approach could be a wee bit risky at this early stage, and being more laser focused in the short to medium term would allow them to differentiate more.

StorageOS Cloud

The founders were very specific when they stated that they were building a clustered array with synchronous remote replicas, not a distributed storage array. Async replication is coming, which will be critical to maintaining performance in a hybrid cloud or multi-cloud setup. I really like the fact that you can stretch the same hybrid storage environment between your on-premises and cloud infrastructure using a single storage solution. This same solution can actually be used to span multiple public clouds as well, providing a resilient storage solution between say AWS and Azure, all of which is deduped and encrypted of course! This could be very interesting indeed, as customers look to protect their workloads from large public outages!

Finally, the StorageOS software is built (as you would expect these days) with APIs at the heart of everything. Even the modern GUI is really just based on API calls to the back end.

The Tekhead Take

Anyway, enough gabbing… It’s still early days, but the storage experience of the founders is certainly solid! Who better than ex-storage admins to provide a product that works well for storage admins?! I’d say there’s a good chance of this becoming a pretty cool product in the future, so definitely one to watch!

You can find a link to their website and beta sign up here:
http://storageos.com/index.php/product/

StorageOS hipster-approved storage

Data Corruption – The Silent Killer (aka Cosmic Rays are baaaad mmmkay?)

Minion Assassin

If you have worked in the IT industry for a reasonable amount of time, you have probably heard the term bit rot, referring to the gradual decay of storage media over time, or simply Data Corruption. What I never realised was what one of the primary causes is behind bit rot, and the amount of effort the storage industry goes to prevent it!

At Storage Field Day 9 we attended one of the most genuinely fascinating and enjoyable sessions I have ever seen. It was “proper science”!

Apparently one of the dominant causes of data corruption in SSDs, is in fact something which completely blew my mind when I heard it! Believe it or not, bit rot and data corruption is often caused by cosmic rays!

data corruption rust.png

Cosmic Rays cause Data Corruption!

These cosmic rays are actually protons and other heavy ions which originate from the Sun, or even distant stars! Next thing you know these evil buggers are coming down here, taking our bits and stealing our women! Ok, maybe not the last part, but they’re certainly interacting with other elements in our atmosphere and generating storms of neutrons (we walking flesh bags actually get hit by about 10 of them every second but as we’re not made primarily of silicon, no biggie on the data corruption front!).

These neutrons occasionally also then slam into integrated circuits, and more occasionally still, this causes a bit to flip from a 0 to a 1, or vice versa.

data corruption mind blown.jpgNow a flip of a single bit might not seem like a lot, especially with CRC and other features in modern HDDs, but the cumulative effect or a large number of these flips can lead to corrupt data. Furthermore, corruption of even a single bit of certain data types, such as the vast quantities of DNA data we plan to store in the future, could mean the difference between you being diagnosed with cancer or not!

As such, Intel have introduced a feature within their SSDs which will deliberately brick the drives if they detect too many bit flips / errors! More amusingly, they adopt “aggressive bricking“, i.e. brick the drives even when minimal data corruption is detected! A brilliantly ironic description for something which is actually trying to protect data, as this has the effect of causing your RAID or Erasure coding data protection to rebuild the drive contents on another drive, therefore ensuring that you don’t end up with corrupt data replicating etc.

Intel actually test this using a particle accelerator at Los Alamos Neutron Science Centre, by firing neutron beams at their drives and checking the data corruption rates! But don’t worry about the poor drives… it’s all over in a flash! 😉

This is genuinely an absolutely fascinating video and well worth spending 45 minutes watching it:

Also for those of you who may notice some snickering and shaking of shoulders going on in the video, it was partly down to the crazy awesomeness of the subject, but also due to some very humorous twitter conversations going on at the same time! I finally understand the meaning of the term corpsing now, having most definitely experienced it during this session! Vinod did an awesome job of putting up with us! 🙂

data corruption cosmic rays.png

Further Info

You can catch the full Intel Session at the link below, which covers other fascinating subjects such as 3D XPoint, NVMe, and SDS – They’re all well worth a watch!

Intel Storage Presents at Storage Field Day 9

Further Reading

Some of the other SFD9 delegates had their own takes on the presentation we saw. Check them out here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 9 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors products or services and I was not compensated in any way for my time at the event.

%d bloggers like this: