Scale-Out Doesn’t Just Mean Applications

A couple of months ago I wrote a post entitled Scale-Out. Distributed. Whatever the Name, it’s the Future of Computing.

Taking the concept a step further, I recently started thinking about other elements in IT which are moving in that direction; not just applications and storage, but underlying infrastructure and management elements too.

Then it dawned on me that this really is not a new thing… we’ve been taking this approach for years! Technologies like VMware vSphere have enabled us to become trusting, almost presumptuous, that we can add resources as we need them; increasing the shared pool transparently and enabling us to continue to service requirements, whilst eliminating downtime. (You can even use them to scale up on-the-fly if you really have to!)

The current breed of infrastructure engineers and startups have grown up in this era and the great thing is that this has now become part of their DNA! Typically, no longer are solutions designed from scratch to be scale-up in nature; hitting some artificial limit in capacity or having to scale specific elements of a solution to avoid nasty bottlenecks.

Instead, infrastructure is being designed to scale-out natively; distributed architectures, balancing workloads and metadata evenly across platforms. This has the added benefit, of course, of making them more resilient to failure of individual components.

Backup isn’t Sexy, but it’s Necessary

One great example of this new architecture paradigm (drink!) is Rubrik, a startup in the backup space whom we met at Tech Field Day 12. Their home-grown distributed file system, distributed metadata, built-in off-site replication and global namespace provide a massively scalable and resilient backup system.

All of the roles from a traditional backup solution (such as backup proxies/media servers/metadata servers, etc) are now rolled into a single, scale-out platform. As I seem to find myself saying more and more often these days, KISS personified!

With shrinking IT teams, I commonly find that companies are willing to trade budget for time savings. A simple, policy-driven management interface, plus off-site replication done over the wire, saves a lot of operational time!

As an added bonus, it can even replicate out to S3, Blob and NFS targets, to give even more options for off-site replication. Of course, a big fat pipe to the internet will cost you more each month; though you’re probably investing in that anyway, to meet your employees’ peak lunchtime demand for Facebook and YouTube! 🙂

Much like any complex machine, under the hood, Rubrik is pretty impressive. There is a masterless cluster management solution, multi-tier flash and disk for performance, and a clever redirect-on-write snapshot chain algorithm, which minimises capacity utilisation whilst providing very granular restores.
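For the curious, here is a toy sketch of the general redirect-on-write idea. To be clear, this is not Rubrik's implementation; the `RowVolume` class and its method names are hypothetical, made up purely for illustration. New writes always land in a fresh location, each snapshot freezes only the blocks changed since the previous one, and a read walks the chain from newest to oldest.

```python
class RowVolume:
    """Toy redirect-on-write volume: writes never overwrite existing blocks."""

    def __init__(self):
        self.store = []      # append-only block store (physical locations)
        self.live = {}       # logical block -> physical index, changes since last snapshot
        self.snapshots = []  # chain of frozen deltas, oldest first

    def write(self, block, data):
        # Redirect the write to a fresh physical location.
        self.store.append(data)
        self.live[block] = len(self.store) - 1

    def snapshot(self):
        # Freeze only the blocks changed since the previous snapshot.
        self.snapshots.append(self.live)
        self.live = {}
        return len(self.snapshots) - 1  # snapshot id

    def read(self, block, snap=None):
        # Walk the chain newest-to-oldest until a delta maps the block.
        if snap is None:
            chain = [self.live] + self.snapshots[::-1]
        else:
            chain = self.snapshots[snap::-1]
        for delta in chain:
            if block in delta:
                return self.store[delta[block]]
        return None


vol = RowVolume()
vol.write(0, "a")
vol.write(1, "b")
s0 = vol.snapshot()
vol.write(0, "c")      # lands in a new location; "a" is still intact for s0
s1 = vol.snapshot()
```

Each snapshot only costs the blocks that changed, which is where the capacity saving comes from, and any point in the chain can still be read back in full.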

The key thing here, though, is we don’t really care; we are a consumer society who just wants things to work, as we have more exciting things than backup to worry about!

TL;DR

We have enough complexity in IT these days without having to worry about backup. I would say that the simple to manage, scale-out solution from Rubrik is certainly worth considering as part of any PoC or RFP! 🙂

Further Info

You can catch the full Rubrik session at the link below:
Rubrik Presents at Tech Field Day 12

Further Reading

Some of the other TFD delegates had their own takes on the presentation we saw. Check them out here:

Disclaimer: My flights, accommodation, meals, etc at Tech Field Day 12 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services.

Now that’s what I call… Tech Predictions 2017

At this time of year, it is customary to look back at the past 12 months and make some random or not-so-random guesses as to what will happen over the coming 12. As such, what could be more fitting for my final post of 2016?!

Here are a few of my personal best, worst, and easy guess candidates for 2017…

Tekhead Predictable Tech Predictions 2017

Easy Guesses

Come on Alex, even Penfold could have predicted these!

  • AWS will continue to dominate the cloud market, though the rate at which they deploy new features will start to slow (over 1000 a year is pretty unsustainable!). Their revenues will continue to grow at gangbuster rates; however, their market share will be slightly eroded as people experiment more with their competitors too.
  • Microsoft Azure will grow massively (not quite 100% but not far off it). Their main growth will probably be in hosting enterprises and typical line of business applications as people move their legacy junk into the cloud. The recent announcement of the Single Instance VM SLA of 99.9% will definitely accelerate this as customers will feel less inclined to refactor their applications for cloud.
  • Distributed everything!
  • Docker will start to become more mainstream production and less Dev/Test.
  • Google will kill off at least one popular service with multiple millions of users.
  • The homelab market will reduce as people do more and more of their studying in the cloud.
  • Podcasting will become the new blogging (if it hasn’t already!)
  • DellEMC will continue to hack off bits of its anatomy to pay back that cheeky little $67Bn debt.
  • I continue to use memes as a crutch to make my otherwise lifeless articles marginally more interesting!

Best Guesses

It’s on the cards… maybe?

  • Google will continue to be ignored by most enterprises for Cloud IaaS. They will gain some reasonable growth in the web application space after another mass marketing activity to developers, ISVs and hosters.
  • Oracle grows Cloud revenues 50% or more but market share remains small. Their growth is mainly driven by IaaS revenue as customers begin to move their workloads to be closer to their data in the Oracle PaaS and SaaS services.
  • There will be no major storage company IPO in 2017, i.e. over $200m.
  • Many storage startups will run out of funding and die on the vine (depressing I know!). Their IP will be snapped up by the old guard storage companies in the ensuing fire sales…
  • 3D XPoint will begin to creep into storage arrays by the end of the year, fuelling another storage VC funding bubble for at least another 12 months for any company who claims to have an innovative way to use it.
  • A major cloud provider suffers a global outage.

Worst Guesses

These probably won’t happen, but if any of them do, I’ll claim smugly that I knew they were always going to!

  • Pure Storage will make an acquisition of a storage startup to create their third product line, perhaps a secondary storage company (i.e. not just all flash) along the lines of Cohesity.
  • Cisco will buy a storage company. They will be more successful at integrating it than they were with Whiptail! (Which wouldn’t be difficult… 😮 )
  • Spanning a single application over multiple clouds becomes a real possibility, as one or more startups come out of stealth to provide innovative ways to span clouds. Nobody buys into it, except maybe for DR.
  • Tekhead.it becomes the most read blog in the world in 2017
  • Cats take over the planet and dogs are forced to form a rebel alliance which is ultimately victorious when a chihuahua takes out the entire cat leadership in one go, with a stolen reaper drone.
  • Jonah Hill wins Strictly Come Dancing, narrowly defeating Frankie Boyle and Charlie Brooker in the final.

And finally…

Here’s wishing you all an awesome, fun and prosperous 2017!

Scale-Out. Distributed. Whatever the Name, it’s the Future of Computing

We are currently living in what is probably the fastest period of innovation the technology space has ever seen. New companies spring up every week with new ideas, some good, some bad, some just plain awesome and unexpected!

One of the most common trends I have seen in this, however, was described in a book I read recently, “The Second Machine Age” by Erik Brynjolfsson & Andrew McAfee. This trend is that the majority of new ideas are (more often than not) unique recombinations of old ones.

Take for example the iPhone. It was not the first smart phone. It was not the first mobile phone, the first touch screen, or the first device to run installable apps. However, Apple recombined an existing set of technologies into a very compelling product.

We also reached a point a while back where CPU clock speeds stopped increasing, and even CPUs are now scaling horizontally. Workloads are therefore typically being designed to scale horizontally instead of vertically, taking advantage of the increased compute resources available whilst avoiding being locked to vertically scaling clock speeds.

Finally, another trend we have seen in the industry of late is inexpensive and low power CPUs from ARM, being used in all sorts of weird and wonderful places; often providing solutions to problems we didn’t even know we had. Up until now, their place has generally been confined outside of the data centre. I am, however, aware of a number of companies now working on bringing them to the enterprise in a big way!

So, in this context of recombination, imagine then if you could provide a scale-out storage architecture where every single spindle had its own compute directly attached. Then combine many of these “nano-servers” together in a scale-out JBOD form factor on subscription pricing, all managed from a Meraki-style cloud portal… well that’s exactly what Igneous Systems have designed!

One of the coolest things about scaling out like this is that instead of a small number of large fault domains based around controllers, you actually end up with many tiny fault domains instead. The loss of any one controller or drive is basically negligible within the system and replacements can be sorted at the convenience of the administrators, rather than panicking about replacing components ASAP. Igneous claim that you can also scale fairly linearly, avoiding the traditional bottlenecks of a dual controller (or similar) system. It will be interesting to see some performance benchmarks as they become available!
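To put some rough numbers on the fault domain point, here is a back-of-the-envelope sketch. It assumes equal-sized fault domains, and the figure of 60 nano-servers is purely hypothetical (it is not an Igneous specification); `capacity_lost` is just an illustrative helper.

```python
def capacity_lost(fault_domains: int, failures: int = 1) -> float:
    """Fraction of the system taken out when `failures` equal-sized fault domains fail."""
    if fault_domains <= 0 or not 0 <= failures <= fault_domains:
        raise ValueError("need fault_domains > 0 and 0 <= failures <= fault_domains")
    return failures / fault_domains


# Hypothetical comparison: a dual-controller array vs. 60 nano-servers.
for label, domains in [("dual-controller array", 2), ("60 nano-servers", 60)]:
    print(f"{label}: one failure removes {capacity_lost(domains):.1%} of the system")
```

Losing half your controller capability is an emergency; losing under two percent of it is something you can deal with after a coffee.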

It’s still early days, so they are doing code deployments at some pretty high rates, around every 2 weeks, and to be honest I think there is a bit of work to be done around clarity of their SLAs, but in general it looks like a very interesting platform, particularly when pricing is claimed to be as low as half the price of Amazon S3.

Now as you might expect from a massively distributed solution, the entry point is not small, typically procured in 212TiB chunks, so don’t expect to use it for your SMB home drives! If however you have petabyte-scale data volumes and are looking for an on-prem(ises!) S3-compatible datastore, then it’s certainly worth looking at Igneous.

The future in the scale-out space is certainly bright; now if only I could get people to refactor their single-threaded applications!… 🙂

Further Info

You can catch the full Igneous session at the link below – it certainly was unexpected and interesting, for sure!

Igneous Systems Presents at Tech Field Day 12

Further Reading

Some of the other TFD delegates had their own takes on the presentation we saw. Check them out here:

Disclaimer: My flights, accommodation, meals, etc at Tech Field Day 12 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services.

Secondary can be just as important as Primary

There can be little doubt these days, that the future of the storage industry for primary transactional workloads is All Flash. Finito, that ship has sailed, the door is closed, the game is over, [Insert your preferred analogy here].

Now I can talk about the awesomeness of All Flash until the cows come home, but the truth is that flash is not now, and may never be, as inexpensive for bulk storage as spinning rust! I say “may” as technologies like 3D NAND are changing the economics for flash systems. Either way, I think it will still be a long time before an 8TB flash device is cheaper than 8TB of spindle. This is especially true for storing content which does not easily dedupe or compress, such as the two key types of unstructured data which are exponentially driving global storage capacities through the roof year on year; images and video.

With that in mind, what do we do with all of our secondary data? It is still critical to our businesses from a durability and often availability standpoint, but it doesn’t usually have the same performance characteristics as primary storage. Typically it’s also the data which consumes the vast majority of our capacity!

Accounting needs to hold onto at least 7 years of their data, nobody in the world ever really deletes emails these days (whether you realise or not, your sysadmin is probably archiving all of yours in case you do something naughty, tut tut!), and woe betide you if you try to delete any of the old marketing content which has been filling up your arrays for years! A number of my customers are also seeing this data growing at exponential rates, often far exceeding business forecasts.

Looking at the secondary storage market from my personal perspective, I would probably break it down into a few broad groups of requirements:

  • Lower performance “primary” data
  • Dev/test data
  • Backup and archive data

As planning for capacity is becoming harder, and business needs are changing almost by the day, I am definitely leaning more towards scale-out solutions for all three of these use cases nowadays. Upfront costs are reduced and I have the ability to pay as I grow, whilst increasing performance linearly with capacity. To me, this is a key for any secondary storage platform.

One of the vendors we visited at SFD8, Cohesity, actually targets both of these workload types with their solution, and I believe they are a prime example of where the non-AFA part of the storage industry will move in the long term.

The company came out of stealth last summer and was founded by Mohit Aron, a rather clever chap with a background in distributed file systems. Part of the team who wrote the Google File System, he went on to co-found Nutanix as well, so his CV doesn’t read too bad at all!

Their scale-out solution utilises the now ubiquitous 2u, 4-node rack appliance physical model, with 96TB of HDDs and a quite reasonable 6TB of SSD, for which you can expect to pay an all-in price of about $80-100k after discount. It can all be managed via the console, or a REST API.

Cohesity CS2000 Series

2u or not 2u? That is the question…

That stuff is all a bit blah blah blah though of course! What really interested me is that Cohesity aim to make their platform infinitely and incrementally scalable; quite a bold vision and statement indeed! They do some very clever work around distributing data across their system, whilst achieving a shared-nothing architecture with a strongly consistent (as opposed to eventually consistent), 2-phase commit file system. Performance is achieved by first caching data on the SSD tier, then de-staging this sequentially to HDD.
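If two-phase commit is new to you, the general idea can be sketched in a few lines. This is a minimal illustration of the textbook protocol, not Cohesity's file system; the `Node` class and `two_phase_write` function are hypothetical names, and a real system would also need durable logs, timeouts and coordinator recovery.

```python
class Node:
    """Toy participant that stages a write before committing it."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.staged = {}     # txid -> (key, value), prepared but not yet visible
        self.committed = {}  # key -> value, visible to readers

    def prepare(self, txid, key, value):
        # Phase 1: stage the write and vote yes (True) or no (False).
        if not self.healthy:
            return False
        self.staged[txid] = (key, value)
        return True

    def commit(self, txid):
        # Phase 2 (success path): make the staged write visible.
        key, value = self.staged.pop(txid)
        self.committed[key] = value

    def abort(self, txid):
        # Phase 2 (failure path): discard anything staged for this transaction.
        self.staged.pop(txid, None)


def two_phase_write(nodes, txid, key, value):
    """Commit on every node, or on none of them."""
    votes = [node.prepare(txid, key, value) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit(txid)
        return True
    for node in nodes:
        node.abort(txid)
    return False


a, b = Node("a"), Node("b")
first = two_phase_write([a, b], "t1", "k", "v1")   # both vote yes, so it commits
b.healthy = False
second = two_phase_write([a, b], "t2", "k", "v2")  # b votes no, so nothing changes
```

Because a write only becomes visible once every participant has voted yes, readers never see a half-applied update, which is the "strongly consistent" part of the story.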

I suspect the solution being infinitely scalable will be difficult to achieve, if only because you will almost certainly end up bottlenecking at the networking tier (cue boos and jeers from my wet string-loving colleagues). In reality most customers don’t need infinite as this just creates one massive fault domain. Perhaps a better aim would be to be able to scale massively, but cluster into large pods (perhaps by layer 2 domain) and be able to intelligently spread or replicate data across these fault domains for customers with extreme durability requirements?

Lastly they have a load of built-in data protection features in the initial release, including instant restore, and file level restore which is achieved by cracking open VMDKs for you and extracting the data you need. Mature features, such as SQL or Exchange object level integration, will come later.

Cohesity Architecture

As you might have guessed, Cohesity’s initial release appeared to be just that; an early release with a reasonable number of features on day one. Not yet the polished article, but plenty of potential! They have already begun to build on this with the second release of their OASIS software (Open Architecture for Scalable Intelligent Storage), and I am pleased to say that next week we get to go back and visit Cohesity at Storage Field Day 9 to discuss all of the new bells and whistles!

Watch this space! 🙂

To catch the presentations from Cohesity at SFD8, you can find them here:
http://techfieldday.com/companies/cohesity/

Further Reading
I would say that more than any other session at SFD8, the Cohesity session generated quite a bit of debate and interest among the guys. Check out some of their posts here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 8 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services and I was not compensated in any way for my time at the event.
