Archive for Tech Field Day

Tech Field Day 12 (TFD12) – Preview

For those people who haven’t heard of Tech Field Day, it’s an awesome event run by the inimitable Stephen Foskett. The event enables tech vendors and real engineers / architects / bloggers (aka delegates) to sit down and have a conversation about their latest products, along with technology and industry trends.

Ever been reading up on a vendor’s website about their technology and had some questions they didn’t answer? One of the roles of the TFD delegates is to ask the questions which help viewers to understand the technology. If you tune in live, you can also post questions via Twitter to the delegates, who will happily ask them on your behalf!

As a delegate it’s an awesome experience as you get to spend several days visiting some of the biggest and newest companies in the industry, nerding out with like-minded individuals, and learning as much from the other delegates as you do from the vendors!

So with this in mind, I am very pleased to say that I will be joining the TFD crew for the third time in San Jose, for Tech Field Day 12, from the 15th-16th of November!

Tech Field Day 12 (TFD12) Vendors

As you can see from the list of vendors, there are some truly awesome sessions coming up! Having previously visited Intel and Cohesity, as well as written about StorageOS, it will be great to catch up with them and find out about their latest innovations. DellEMC are going through some massive changes at the moment, so their session should be fascinating. Finally, I haven’t yet had the pleasure of visiting Rubrik, DriveScale or Igneous, so those sessions should be very interesting indeed!

That said, if there was one vendor I am probably most looking forward to visiting at Tech Field Day 12, it’s Docker! Container adoption is totally changing the way that developers architect and deploy software, and I speak to customers regularly who are now beginning to implement them in anger. It will definitely be interesting to find out about their latest developments.

If you want to tune in live to the sessions, see the following link:
Tech Field Day 12

If for any reason you can’t make it live, have no fear! All of the videos are posted on YouTube and Vimeo within a day or so of the event.

Finally, if you can’t wait for November, pass the time by catching some of the fun and highlights from the last event I attended:

Storage Field Day 9 – Behind the Curtain

VulcanCast Follow Up – A few thoughts on 60TB SSDs

So last week I was kindly invited to share a ride in Marc Farley‘s car (not as dodgy as it sounds, I promise!).

The premise was to discuss the recent announcements around Seagate’s 60TB SSD, Samsung’s 30TB SSD, their potential use cases, and how on earth we can protect the quantities of data which will end up on these monster drives?!

Performance

As we dug into a little in the VulcanCast, many use cases will present themselves for drives of this type, but the biggest challenge is that the IOPS density of the drives is not actually very high. On a 60TB drive with 150,000 read IOPS (and my unconfirmed guess is ~100,000 or fewer write IOPS), the average IOPS per GB is actually only a little higher than that of 15K SAS drives. When you start adding deduplication and compression into the mix, if you are able to achieve around 90-150TB of effective capacity per drive, you could easily be looking at IOPS/GB performance approaching smaller 10K SAS devices!
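
For anyone who wants to play with the numbers, here’s a quick back-of-envelope sketch; the 60TB SSD figures come from the discussion above, while the comparison spindles use my own assumed figures, so plug in your own:

```python
# Quick IOPS-per-GB comparison. The 60TB SSD figures come from the
# discussion above; the comparison drives use assumed, typical figures
# purely for illustration.
def iops_per_gb(capacity_tb: float, read_iops: float) -> float:
    return read_iops / (capacity_tb * 1000)   # capacity converted to GB

drives = {
    "60TB SSD, 150k read IOPS":                       (60,  150_000),
    "60TB SSD at ~150TB effective (dedupe/compress)": (150, 150_000),
    "15K SAS 600GB, ~200 IOPS (assumed)":             (0.6, 200),
    "10K SAS 1.2TB, ~150 IOPS (assumed)":             (1.2, 150),
}

for name, (cap_tb, iops) in drives.items():
    print(f"{name:50s} {iops_per_gb(cap_tb, iops):6.2f} IOPS/GB")
```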

The biggest benefit, of course, is that you achieve this performance in a minuscule footprint compared to any current spindle type. Power draw is orders of magnitude lower than 10/15K drives, and (by my estimates) at least 4x lower than NL-SAS / SATA at peak, and far lower again at idle. As such, a chunk of the additional cost of using flash for secondary-tier workloads could be soaked up by your space and power savings, especially in high-density environments.

In addition, the consistency of the latency will open up some interesting additional options…

SAS bus speeds could also end up being a challenge. Modern storage arrays often utilise 12Gb SAS to interconnect the shelves and disks, which gives you multiple SAS channels over which to transfer data. With over half a petabyte of usable storage in just a dozen drives (potentially 1PB with compression and dedupe), that’s a lot of storage to stick on a single channel! In the long term, faster connectivity methods such as NVMe will help, but in the short term we may even see some interesting scenarios with one controller (and channel) for every few drives, just to ensure we don’t saturate the bandwidth too easily.
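
A rough sketch of the maths (per-drive throughput and protocol overhead are assumed figures for illustration, not vendor specs):

```python
# How quickly a dozen 60TB SSDs could saturate a single SAS connection.
# Per-drive throughput and protocol overhead are assumed figures, not specs.
drives            = 12
per_drive_gbps    = 8      # assuming roughly 1 GB/s sequential per SSD
sas_lane_gbps     = 12     # 12Gb SAS per lane
lanes_per_port    = 4      # a typical wide port
protocol_overhead = 0.8    # rough allowance for encoding/protocol

aggregate_gbps = drives * per_drive_gbps
usable_gbps    = sas_lane_gbps * lanes_per_port * protocol_overhead

print(f"Aggregate drive throughput : {aggregate_gbps} Gbps")
print(f"Usable wide-port bandwidth : {usable_gbps:.0f} Gbps")
print(f"Oversubscription           : {aggregate_gbps / usable_gbps:.1f}x")
```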

Use Cases

For me, the biggest use cases for this type of drive are going to be secondary storage workloads which require low(ish) latency, a reasonable number of predominantly Read IOPS, and consistent performance even when a little bit bursty. For example:

  • Unstructured data stores, such as file / NAS services where you may access data infrequently, possibly tiered with some faster flash for cache and big write bursts.
  • Media storage for photo and video sites (e.g. Facebook, but there are plenty of smaller ones such as Flickr, Photobox, Funky Pigeon, Snapfish, etc.) – indeed the same types of organisations we discussed at the Storage Field Day roundtable session on high performance object storage. One big disadvantage here would be the inability to dedupe / compress very much, as you typically can’t expect high ratios for media content, which then pushes up the cost per usable GB.
  • Edge cache nodes for large media streaming services such as Netflix, where maximising capacity and performance in a small footprint to go in other providers’ data centres is pretty important, whilst being able to provide consistent performance for many random read requests.

For very large storage use cases, I could easily see these drives replacing 10K drives and, if the price can be brought down sufficiently, starting to edge into competing with NL-SAS / SATA for highly dedupable (is that a word?) data types within a few years.

Data Protection

Here’s where things start to get a little tricky… we are now talking about protecting data in such massive quantities that the failure of just two drives within a short period has the potential to cause the loss of many hundreds of terabytes of data. At the same time, adding additional drives for protection (at tens of thousands of dollars each) comes with a pretty hefty price tag!

Unless you are buying a significant number of drives, the cost of your “N+1”, RAID, erasure coding, etc. is going to be so exorbitant that you may as well buy a larger number of small drives so you don’t waste all of that extra capacity. As such, I can’t see many people using these drives in quantities of less than 12-24 per device (or perhaps per RAIN set in a hyper-converged platform), which means even with a conservatively guesstimated cost of $30k per drive, you’re looking at the best part of $350-$700k for your disks alone!

Let’s imagine then, the scenario where you have a single failed drive, and 60TB of your data is now hanging in the balance. Would you want to replace that drive in a RAID set, and based on the write rates suggested so far, wait 18-24 hours for it to resync? I would be pretty nervous to do that myself…
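
For a sense of scale, here’s a rough sketch of what that resync window looks like; the sustained write rates are my own assumptions based on the figures discussed above:

```python
# What a traditional 1:1 resync of a failed 60TB drive implies.
# Sustained write rates are assumptions based on the figures discussed above.
failed_drive_tb = 60

for write_mb_s in (700, 800, 900):    # assumed sustained rebuild write rates
    hours = failed_drive_tb * 1_000_000 / write_mb_s / 3600
    print(f"At {write_mb_s} MB/s sustained: ~{hours:.0f} hours to resync")
```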

In addition, we need to consider the rate of change of the data. Let’s say our datastore consists of 12x 60TB drives, giving us perhaps 550TB or more of usable capacity. Even with a rate of change of just 5%, we need to be capable of backing up 27TB from that single datastore per night just to keep up with the incrementals! Pushing that through a traditional backup solution within a typical 10-hour backup window means sustaining a consistent 6Gbps, never mind any full backups!
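
The arithmetic behind that, for anyone who wants to adjust the assumptions:

```python
# The incremental backup arithmetic from the paragraph above.
usable_tb    = 550     # usable capacity of the 12 x 60TB datastore (estimate)
change_rate  = 0.05    # 5% daily rate of change
window_hours = 10      # typical nightly backup window

incremental_tb = usable_tb * change_rate
required_gbps  = incremental_tb * 1e12 * 8 / (window_hours * 3600) / 1e9

print(f"Nightly incremental : {incremental_tb:.1f} TB")
print(f"Sustained throughput: {required_gbps:.1f} Gbps over the {window_hours} hour window")
```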

Ok, let’s say we can achieve these kinds of backup rates comfortably. Fine. Now, what happens if we have a failure of a shelf, parity group or pool of disks? We’ve probably just lost 250+TB of data (excluding compression or dedupe) which we now need to restore from backup. Unless you are comfortable with an RTO measured in days to weeks, you might find that the restore time for this, even over a 10Gbps network, is not going to meet your business requirements!
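
Again as a rough sketch (and assuming you can actually sustain a high utilisation on that 10Gbps link, which is being generous):

```python
# Best-case restore time for ~250TB over a 10Gbps link.
restore_tb  = 250
link_gbps   = 10
utilisation = 0.8     # assumed sustained network utilisation

seconds = restore_tb * 1e12 * 8 / (link_gbps * 1e9 * utilisation)
print(f"~{seconds / 86400:.1f} days to pull back {restore_tb}TB")
```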

This leaves us with the conundrum of how to increase the durability of the data against disk failures, and how to minimise the rebuild time in the event of a media failure, whilst still keeping costs reasonably low.

Today, the best option seems to me to be the use of erasure coding. In the event of the loss of a drive, the data is automatically rebuilt and redistributed across many or all of the remaining drives within the storage device. Even with, say, 12-24 drives in a “small” system, this would mean data being rebuilt back up to full protection in 30-60 minutes, instead of 18-24 hours! That said, this assumes the connectivity on the array bus / backplane is capable of handling the kind of bandwidth generated by the rebuilds, and that this doesn’t have a massive adverse impact on the array processors!
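
As a very rough illustration of why the distributed rebuild is so much quicker (the per-drive write rate is an assumed figure, and the real answer depends heavily on how full the failed drive was and how much backplane and CPU headroom you have):

```python
# Why a distributed erasure-coded rebuild is so much faster than a 1:1 resync.
# The per-drive write rate is an assumed figure for illustration only.
failed_drive_tb = 60
per_drive_mb_s  = 800     # assumed sustained write rate per drive

def rebuild_hours(target_drives: int) -> float:
    # Lost data is re-protected in parallel across all surviving drives.
    return failed_drive_tb * 1_000_000 / (per_drive_mb_s * target_drives) / 3600

print(f"Classic RAID resync to one spare : ~{rebuild_hours(1):.0f} hours")
print(f"Rebuild spread across 11 drives  : ~{rebuild_hours(11) * 60:.0f} minutes")
print(f"Rebuild spread across 23 drives  : ~{rebuild_hours(23) * 60:.0f} minutes")
```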

The use of “instant restore” technologies, where you can mount data directly from the backup media to get up and running ASAP, then move the data transparently in the background, also seems to me to be a reasonable mitigation. In order to maintain a decent level of performance, this will likely also drive the use of flash in the data protection storage tiers as well as in production.

The Tekhead Take

Whatever happens, the massive quantities of data we are beginning to see, and the drives we plan to store them on, are going to push us towards new (as yet uninvented) forms of data protection. We simply can’t keep up with the rates of growth without them!

VulcanCast

Catch the video here:

The video and full transcript are also available here:
Huge SSDs will force changes to data protection strategies – with @alexgalbraith

Data Corruption – The Silent Killer (aka Cosmic Rays are baaaad mmmkay?)

Minion Assassin

If you have worked in the IT industry for a reasonable amount of time, you have probably heard the term bit rot, referring to the gradual decay of storage media over time, or simply data corruption. What I never realised was one of the primary causes behind bit rot, or the amount of effort the storage industry goes to in order to prevent it!

At Storage Field Day 9 we attended one of the most genuinely fascinating and enjoyable sessions I have ever seen. It was “proper science”!

Apparently one of the dominant causes of data corruption in SSDs is in fact something which completely blew my mind when I heard it! Believe it or not, bit rot and data corruption are often caused by cosmic rays!

Cosmic Rays cause Data Corruption!

These cosmic rays are actually protons and other heavy ions which originate from the Sun, or even distant stars! Next thing you know these evil buggers are coming down here, taking our bits and stealing our women! Ok, maybe not the last part, but they’re certainly interacting with other elements in our atmosphere and generating storms of neutrons (we walking flesh bags actually get hit by about 10 of them every second but as we’re not made primarily of silicon, no biggie on the data corruption front!).

These neutrons occasionally also then slam into integrated circuits, and more occasionally still, this causes a bit to flip from a 0 to a 1, or vice versa.

Now a flip of a single bit might not seem like a lot, especially with CRC and other features in modern drives, but the cumulative effect of a large number of these flips can lead to corrupt data. Furthermore, corruption of even a single bit of certain data types, such as the vast quantities of DNA data we plan to store in the future, could mean the difference between you being diagnosed with cancer or not!
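
To illustrate the point, here’s a toy example in Python showing a single flipped bit silently changing stored data, and a checksum catching it. This is purely illustrative, not a reflection of how any particular drive firmware works:

```python
import zlib

# A single flipped bit silently changes stored data; a checksum at least makes
# the corruption detectable. Purely illustrative toy example.
original = b"GATTACAGATTACAGATTACA"      # stand-in for a block of stored data
crc      = zlib.crc32(original)

corrupted = bytearray(original)
corrupted[3] ^= 0b0000_0001              # a cosmic-ray-style single bit flip

print(original.decode())                 # GATTACAGATTACAGATTACA
print(corrupted.decode())                # GATUACA... one letter quietly changed
print("CRC still matches?", zlib.crc32(corrupted) == crc)   # False
```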

As such, Intel have introduced a feature within their SSDs which will deliberately brick the drive if it detects too many bit flips / errors! More amusingly, they adopt “aggressive bricking”, i.e. bricking the drive even when minimal data corruption is detected! It’s a brilliantly ironic description for something which is actually trying to protect data: bricking the drive forces your RAID or erasure coding data protection to rebuild its contents on another drive, ensuring that you don’t end up with corrupt data replicating, etc.

Intel actually test this using a particle accelerator at the Los Alamos Neutron Science Center, firing neutron beams at their drives and checking the data corruption rates! But don’t worry about the poor drives… it’s all over in a flash! 😉

This is genuinely an absolutely fascinating video and well worth spending 45 minutes watching it:

Also for those of you who may notice some snickering and shaking of shoulders going on in the video, it was partly down to the crazy awesomeness of the subject, but also due to some very humorous twitter conversations going on at the same time! I finally understand the meaning of the term corpsing now, having most definitely experienced it during this session! Vinod did an awesome job of putting up with us! 🙂

Further Info

You can catch the full Intel Session at the link below, which covers other fascinating subjects such as 3D XPoint, NVMe, and SDS – They’re all well worth a watch!

Intel Storage Presents at Storage Field Day 9

Further Reading

Some of the other SFD9 delegates had their own takes on the presentation we saw. Check them out here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 9 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services and I was not compensated in any way for my time at the event.

Cohesity Announces Cloud Integration Services

With the release of v2.0 of their OASIS platform, as presented at Storage Field Day 9 recently, Cohesity’s development team have continued to churn out new features and data services at a significant rate. It seems that they are now accelerating towards the cloud (or should that be The Cloud?) with a raft of cloud integration features announced today!

There are three key new features included as part of this, called CloudArchive, CloudTier and CloudReplicate respectively, all of which pretty much do exactly what it says on the tin!

CloudArchive is a feature which allows you to archive datasets to the cloud (duh!), specifically onto Google Nearline, Azure, and Amazon S3. This would be most useful for things like long term retention of backups without taking up space on your primary platform.

CloudTier extends on-premises storage, allowing you to use cloud storage as a cold tier, moving your least used blocks out. If you are like me, you like to understand how these things work down deep in the guts! Mohit Aron, Founder & CEO of Cohesity, kindly provided Tekhead.it with this easy to understand explanation on their file and tiering system:

NFS/SMB files are mapped to objects in our system – which we call blobs. Each blob consists though of small pieces – which we call chunks. Chunks are variable sized – approximately ranging from 8K-16K. The variable size is due to deduplication – we do variable length deduplication.

The storage of the chunks [is] done by a completely different component. We group chunks together into what we call a chunkfile – which is approximately 8MB in size. When we store a chunkfile on-prem, it is a file on Linux. But when we put it in the cloud, it becomes an S3 object.

Chunkfiles are the units of tiering – we’ll move around chunkfiles based on their hotness.

So there you have it folks; chunkfile hotness is the key to Cohesity’s very cool new tiering technology! I love it!

With the chunkfiles set at around 8MB, this seems like a sensible size for moving large quantities of data back and forth to the cloud with minimal overhead. With a reasonable internet connection in place, it should still be possible to recall a “cool” chunkfile without too much additional latency, even if your application does require it in a hurry.
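
To make that model a little more concrete, here is a toy sketch in Python of that kind of layout. It is purely my own illustration of the concepts Mohit describes; the class names, sizes and hotness threshold are my assumptions, not Cohesity’s actual implementation.

```python
from dataclasses import dataclass, field
from typing import List

CHUNKFILE_TARGET_BYTES = 8 * 1024 * 1024      # ~8MB chunkfiles, as described above

@dataclass
class Chunk:
    """A variable-length piece of a blob (~8-16KB), deduplicated by fingerprint."""
    length: int
    fingerprint: str

@dataclass
class Chunkfile:
    """The unit of tiering: a local file on-prem, an S3 object in the cloud."""
    chunks: List[Chunk] = field(default_factory=list)
    heat: float = 0.0          # access "hotness" driving tier placement
    tier: str = "on-prem"

    def is_full(self) -> bool:
        return sum(c.length for c in self.chunks) >= CHUNKFILE_TARGET_BYTES

def tier_cold_chunkfiles(chunkfiles: List[Chunkfile], heat_threshold: float = 0.1):
    """Demote the coldest chunkfiles to the cloud tier (an S3 upload in reality)."""
    for cf in chunkfiles:
        if cf.tier == "on-prem" and cf.heat < heat_threshold:
            cf.tier = "cloud"
    return chunkfiles
```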

You can find out more information about these two services in a new video they have just published to their YouTube channel.

The final feature, and the one of most interest to me, is called CloudReplicate, though this is not yet ready for release and I am keen to find out more as information becomes available. With CloudReplicate, Cohesity have made the bold decision to allow customers to run a software-only edition of their solution in the cloud of their choice, with native replication from their on-premises appliances, paving the way to true hybrid cloud, or even simply providing a very clean DR strategy.

This solution is based on their native on-premises replication technology, and as such will support multiple replication topologies, e.g. 1-to-many, many-to-1, many-to-many, etc, providing numerous simple or complex DR and replication strategies to meet multiple use cases.

It could be argued that the new solution potentially provides their customers with an easy onramp to the cloud in a few years… I would say that anyone making an investment in Cohesity today is likely to continue to use their products for some time, and between now and then Cohesity will have the time to significantly grow their customer base and market share, even if it means enabling a few customers to move away from on-prem down the line.

I have to say that once again Cohesity have impressed with their vision and speedy development efforts. If they can back this with increased sales to match, their future certainly looks rosy!

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 9 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services and I was not compensated in any way for my time at the event.
