I Like Big Files and I Cannot Lie

You other vendors, can’t deny,
When an array walks in with an itty bitty waste [-ed capacity],
And many spindles in your face
You get sprung, want to pull up tough,
‘Cause you notice that storage was stuffed!

Ok… I’ll stop now! I’m just a bit sad and always wanted an excuse to use that as a post opener! 🙂

There is a certain, quite specific type of customer whose main requirements revolve around the storage of large data sets consisting of thousands to millions of huge files. Think media / TV / movie companies, video surveillance or even PACS imaging and genomic sequencing. Ultimately we’re talking petabyte-scale capacities – more than your average enterprise needs to worry about!

How you approach storage of this type of data is worlds apart from your average solution!

The Challenges of “Chunky” Data

Typical challenges involve having multiple silos of your data across multiple locations, with different performance and workload characteristics. Then you have different storage protocols for different applications or phases in their data processing and delivery. Each of those silos then requires different skills to manage, and different capacity management regimes.


On top of that, for the same reason as we moved away from parity groups in arrays to wide striping, these silos then have IO and networking hotspots, wasted capacity (sometimes referred to as trapped white space) and wasted performance, which cannot be shared across multiple systems.
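To put some rough numbers on the trapped white space problem, here is a minimal Python sketch. The silo names and capacity figures are entirely made up for illustration (they are not from any real deployment or vendor); it simply compares the largest data set you could land when free space is locked inside separate silos with what you could land if the same free space sat in one shared pool.

```python
# Hypothetical silos: (name, usable capacity in TB, used capacity in TB).
# All names and figures are invented for illustration only.
silos = [
    ("ingest", 500, 480),
    ("edit", 800, 300),
    ("archive", 2000, 1900),
]


def free_per_silo(silos):
    """Return the free capacity (TB) left inside each individual silo."""
    return {name: capacity - used for name, capacity, used in silos}


def largest_dataset_siloed(silos):
    """Largest single data set (TB) that fits if it must live in one silo."""
    return max(free_per_silo(silos).values())


def largest_dataset_pooled(silos):
    """Largest single data set (TB) that fits if all capacity is one pool."""
    return sum(free_per_silo(silos).values())


siloed = largest_dataset_siloed(silos)
pooled = largest_dataset_pooled(silos)
print(f"Siloed: {siloed} TB, pooled: {pooled} TB")
```

With these invented figures, 120 TB of free space on the ingest and archive silos is of no use to a 600 TB project, even though the estate as a whole has room for it.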

Finally (and arguably most importantly), how do you ensure the integrity, resilience, and durability of this data, as by its very nature, it typically requires long-term retention?

Ideal Solution

What you really need is a single storage system which can not only scale to multi-petabyte capacities with multiple protocols, but is also reasonably easy to manage, even with a high capacity-to-admin ratio (that is, a small team looking after a lot of storage).

You then need to ensure that data can also be protected against accidental, or malicious file modification or deletion.

Finally, you need the system to be able to replicate additional copies to remote sites, as backing up petabytes of data is simply unrealistic! Similarly, you may want multiple replicas or additional pools outside of your central repository which all replicate back to the mothership, for example for ROBO or multi-site solutions where editing large files needs to be done locally.

As my good friend Josh De Jong said recently:

Of course, the biggest drawback of using this approach is that you have one giant failure domain. If something somehow manages to proverbially poison your “data lake”, that’s a hell of a lot of data to lose in one go!

DellEMC Isilon

During our recent Tech Field Day 12 session at DellEMC, I was really interested to see how the DellEMC Isilon scale-out NAS system was capable of meeting many of these requirements, especially as this is a product which can trace its heritage all the way back to 2001! In fact, their average customer on Isilon is around 1PB in size, and their largest customer is using 144PB! Scalability, check!

The Isilon team also confirmed that around 70% of their 8,000+ customers trust the solution enough not to use any external backup product, relying instead on SnapshotIQ, SyncIQ and in some cases SmartLock to protect their data. That’s a pretty significant number!

One thing I am not so keen on with the Isilon (and to be fair, many other “traditional” / old guard storage vendor offerings) is the complexity and breadth of the licensing; almost all of the interesting features require their own license. If the main benefit of the data lake is simplicity, then I would far rather have a single price with perhaps one or two uplift options for licenses, than an a la carte menu.

In addition, the limit of 50 security domains provides some flexibility for service providers, but then limits the size of your “data lake” to 50 customers. It would be great to see this limit increased in future.

The Tekhead Take

Organisations looking to retain data in these quantities need to weigh up the relative risks of using a single system for all storage, versus the cost and complexity of multiple silos. Ultimately it is down to each individual organisation to work out what most closely matches their requirements, but for the convenience of a single large repository of all of your data, the DellEMC Isilon still remains a really interesting proposition.

Further Info

You can catch the full Isilon session at the link below:
Dell EMC Presents at Tech Field Day 12

Further Reading

Some of the other TFD delegates had their own takes on the presentation we saw. Check them out here:

Disclaimer: My flights, accommodation, meals, etc. at Tech Field Day 12 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services.

If you use Flash Storage, do NOT miss this Storage Unpacked Podcast!


Just a very quick post today…

I don’t often write blog posts either responding to, or recommending things, as I usually just fire them out on Twitter or LinkedIn.

That said, I enjoyed this particular podcast episode so much, I felt I needed to share it with my [two] subscribers via a post!

 

Storage Unpacked

I have been listening with great interest to Chris Evans and Martin Glassborow’s new Storage Unpacked podcast for the past couple of months. Whether you are a storage aficionado or a generalist, I highly recommend it.

The particular episode I am referring to is Talking NAND Flash with Jim Handy, specifically Part One, released 6th Jan this year.

As someone who follows the storage industry with interest, this episode was a great insight as to the history and decisions which have led us to where we are with flash storage today, as well as some fascinating facts and figures.

I don’t want to preempt or spoil the episode, but for example did you know:

  • Nanometer-scale production actually means working at the scale of millionths of a millimeter
  • Fabrication plants cost around $8 billion each to build, and the machines involved in creating the chips cost about $100 million each!
  • It takes 3 months to create a single chip from the raw materials!
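If the “millionths of a millimeter” figure is hard to picture, the arithmetic is easy to check. The snippet below is just a quick Python sketch of the unit conversion; the 15 nm feature size is a hypothetical example value of my own, not a figure quoted in the episode.

```python
# 1 millimeter = 1,000,000 nanometers (a standard SI conversion).
NM_PER_MM = 1_000_000


def nm_to_mm(nanometers):
    """Convert a length in nanometers to millimeters."""
    return nanometers / NM_PER_MM


def features_per_mm(feature_size_nm):
    """How many whole features of the given size fit side by side in 1 mm."""
    return NM_PER_MM // feature_size_nm


# Hypothetical example: a 15 nm feature size (an assumed value for illustration).
example_nm = 15
print(f"{example_nm} nm is {nm_to_mm(example_nm)} mm")
print(f"{features_per_mm(example_nm)} such features fit across 1 mm")
```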

These and other interesting things can be found in the episode below. I highly recommend you check it out (and don’t forget to subscribe to their podcast)!

Talking NAND Flash with Jim Handy – Part One

Further Info

I listen to 20+ podcasts on a regular basis (the one and only good thing about commuting to the office every day!). I need to do an updated article on them, but in the meantime, here is a list of some of my recommendations:

The Tekhead Top 10 Tech Podcast Perfect Playlist Picks

Juxtaposition Time! Join us at London VMUG to talk AWS! – January 19th 2017

Don’t panic, you’re not imagining things! You did indeed read that title right! If everything goes to plan, Chris Porter and I will be taking the January 2017 London VMUG to a whole new place, with a session on AWS!

Yes, that’s an AWS session at a VMUG! 😮

Why?

For those people who have been living in a bunker on the Isle of Arran for the past few years, AWS has been taking the IT industry by storm. So much so that, at VMworld 2016, VMware announced their new product “VMware Cloud on AWS”!

Whatever the reasons that VMware have decided to do this (and I’m not going to go into my opinions of that right now), it leaves VMware admins in a position where even if they aren’t already doing some AWS today, the likelihood of them doing so in the near future has just jumped by an order of magnitude!

Meanwhile in a parallel universe...


What’s the session about then?

The session is planned to be a quick intro to the key features of AWS, some tips on how to learn more and get certified, as well as some of Chris’s and my experiences of working with and designing for AWS (which is rather different to doing things in VMware, for sure!).

Hopefully it should be a pretty interesting session, especially if you haven’t had much exposure to AWS yet!

What else can you see at the event?

As always, there will be many awesome speakers at the London VMUG event. Ricky El-Quasem is even doing two by himself!

There will also be a load of other sessions, so check out the agenda below:

Wrapping up the event, there will also be the customary vBeers at the Old Bank of England (194 Fleet St, London EC4A 2LT), so make sure you hang around afterwards and join us for what is often the best part of the day!

Lastly, thanks very much to the LonVMUG sponsors, Rubrik, iLand and Stormagic, without whom it would not be possible to hold these events!

I’m in! How do I register?

You can register for the event at the London VMUG workspace here:

LonVMUG January 2017 Registration

The location is techUK, 10 St Bride St, London, EC4A, which is pretty easy to get to via your preferred public transport methods, though coming in via Waterloo I generally find the bus to be fastest…

If you do see me on the day (I’m 6’7” so you can’t miss me), please do come and say hi! 🙂

Top 10 Tekhead Posts of 2016

I’m pleased to say that I upped my game somewhat over the past year, managing to churn out 62 posts in 2016, more than double the 28 posts I produced in 2015!

There were a few other interesting trends over the past year. The balance between VMware and other subjects has definitely shifted for me; for example, I wrote well over a dozen posts on AWS.

I guess this is probably representative of both my recent role change, as well as the shift in my customers from being 90%+ VMware houses, to a broad mix of different cloud platforms, both public (AWS / Azure) and private (VMware / OpenStack).

This trend is only going to accelerate, and I suggest subscribing to Scott Lowe’s Full Stack Journey podcast for great information on how to avoid being left behind as our industry morphs over the coming years!


It’s worth noting that this trend is also mirrored in the top 5 articles alone, which include popular newer technologies such as Docker and AWS. That said, it’s great to see the Intel NUC Nanolab series is still as popular as ever, and people are obviously still keeping their vSphere skills and certs up to date, based on the VCP delta study guide popularity.

You may also have noticed that I have been a little quieter of late. The main reasons for this have been starting my new role earlier this year, studying for exams, plus a number of other projects I’ve been involved in (such as the Open TechCast podcast). Hopefully I can find a little more balance between them all in 2017, though I already have a couple of podcasts, a VMUG presentation, and a possible exam lined up for January, so I’m not really helping myself on that front!


So, enough jibber jabbing! Here follows the top 10 most popular posts of the past 12 months.

Tekhead Top 10 Posts of 2016
  1. My Synology DSM Blue LED issue was actually just a failed drive!
  2. Installing Docker on Ubuntu Quick Fix
  3. NanoLab – Running VMware vSphere on Intel NUC – Part 1
  4. Fix for VMware Remote Console unrecoverable error: (vmrc)
  5. AWS Certified Solutions Architect Associate Exam Study Guide & Resources
  6. VCP6-DCV Delta Exam (2V0-621D) Study Guide and Exam Experience
  7. NetApp – Is this the dawn of a new day?
  8. NanoLab – Part 10 – Your NUCs are nice and cool, but what about your stick?
  9. Index of Tekhead.it Blog Posts on Amazon AWS
  10. Quick Fix for “The task was canceled by a user” when deploying OVA in vCenter 6

Something Mike Preston and I discussed on our recent Open TechCast podcast episode was how frustrating it can be as a blogger that an opinion piece which took ages to write and edit will often get a small number of views, whilst a quick tip which took a couple of minutes to jot down might get thousands or even tens of thousands over time!

Happily, my top 10 this year includes both types, so my time wasn’t completely wasted! 🙂

Anyway, that’s enough from me for now; all the best for 2017, folks!
