Tag Archive for Backups

Downtime sucks! Designing Highly Available Applications on a Budget


Downtime sucks.

I write this whilst sitting in an airport lounge, having been disembarked from my plane due to a technical fault. I don’t really begrudge the airline in question; it was a plumbing issue! That’s a physical failure of the aircraft and just one of those things (unless I find out later they didn’t do the appropriate preventative maintenance, of course)! Sometimes failures just happen, and I would far rather it was a plumbing issue than an engine issue!

What is not excusable, however, is if the downtime is easily preventable; for example, if you are designing a solution which has no resilience!

This is obviously more common with small and medium-sized businesses, but even large organisations can be guilty of it! I have had many conversations in the past with companies who have architected their solutions with significant single points of failure. More often than not, this is due to the cost of providing an HA stack. I fully appreciate that most IT departments are not swimming in cash, but there are many ways to work around a budgetary constraint and still provide more highly available, or at least “Disaster Resistant”, solutions, especially in the cloud!

Now obviously there is High Availability (typically within a single region or Data Centre), and Disaster Recovery (across DCs or regions). An ideal solution would achieve both, but for many organisations it can be a choice between one and the other!

Budgets are tight, what can we do?

Typically HA can be provided either at the application level (preferred) or, failing that, at the infrastructure level. Many solutions to improve availability are relatively simple and inexpensive. For example:

  • Building on a public cloud platform (and assuming that the application supports load balancing), why not test running twice as many instances with half the specification each? In most cases, unless there are significant storage quantities in each instance, the cost of scaling out this way is minimal.
    If there is a single instance, split it out into two instances, immediately halving the impact of losing a node. If there are two instances, what about splitting into four? The impact of a node loss is then only 25% of the overall throughput capacity for the application, and this can even bring down the cost of HA for applications where the +1 in N+1 is expensive!
  • Again in cloud, if there are more than two availability zones in a region (e.g. on AWS), then take advantage of them! If an application can handle 2 AZs, then the latency of adding a third shouldn’t make much, if any, difference, and costs will only increase slightly with a small amount of extra inter-AZ bandwidth or per-AZ services (e.g. NAT gateways).
    Again, in this scenario the loss of an AZ will only take out 33% of the application servers, not 50%, so it is possible to reduce the number of servers which are effectively there for failover only.
  • If you can’t afford to run an application as multi-AZ or multi-node, consider putting it in an auto-scaling group or scale set with a minimum and maximum of 1 server. That way, if an outage occurs or, in the case of AWS, an entire AZ goes down, an instance will automatically be regenerated in an alternative AZ (see the sketch below).
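
As a minimal sketch of that last idea, assuming AWS and boto3 (the group name, launch template ID and subnet IDs are all hypothetical placeholders):

```python
# A "self-healing singleton": an auto-scaling group pinned to exactly one
# instance, but spread across subnets in multiple AZs. If the instance (or
# its whole AZ) fails, AWS recreates it in a surviving AZ automatically.
import boto3

autoscaling = boto3.client("autoscaling", region_name="eu-west-1")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="spof-app-asg",  # hypothetical name
    LaunchTemplate={"LaunchTemplateId": "lt-0123456789abcdef0"},  # placeholder
    MinSize=1,
    MaxSize=1,
    DesiredCapacity=1,
    # One subnet per AZ (placeholders); this is what lets the replacement
    # instance land in a different AZ to the one that failed.
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222,subnet-cccc3333",
    HealthCheckType="EC2",
    HealthCheckGracePeriod=300,
)
```

Note this only protects the compute; any state the instance holds locally still needs to live somewhere resilient (a replicated database, S3, etc.) for the regenerated instance to pick up.
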
What if my app doesn’t like load balancers?

If you have an application which cannot be load balanced, you probably shouldn’t be thinking about running it in the cloud (not if you have any serious availability requirements, anyway!). It amazes me how many business-critical applications and services are still running on single servers all over the world!

  • If your organisation is dead set on using cloud for a SPoF app, then making it as ephemeral as possible can help. Start by splitting the DBs from the apps, as these can almost always be made HA by some means (e.g. master/slave replication, mirroring, log shipping, etc). Failover nodes also often don’t attract a license fee from many vendors (e.g. MS SQL), so always check your license documentation to see what you can achieve on the cheap.
  • Automate! If you can deploy application server(s) from a script, then even if the worst happens, the application can be redeployed very quickly, and in a consistent fashion (see the sketch just after this list).
    The trend at the moment is moving towards a more agile deployment process and automated CI/CD pipelines. This enables companies to recover from an outage by rebuilding their environments and redeploying code rapidly (as long as they have a replica of the data or a highly available datastore!).
  • If it’s not possible to script or image the code deployment, then taking regular backups (and snapshots where possible) of application servers, and testing them often, is an option! If you don’t want to go through the inflexible, unreliable and painful nightmare of doing system state restores, then take image-based backups (supported by the vast majority of backup vendors nowadays). Perhaps even sync application data to a warm standby server which can be brought online reasonably swiftly, or use an inexpensive DR service such as Azure Site Recovery to provide an avenue of last resort!
  • If cloud isn’t the best place to locate your application, then provide HA at the infrastructure layer by utilising the HA features of your favourite hypervisor!
    For example, VMware vSphere will have an instance back up and running within a minute or two of the failure of a host using the vSphere HA feature (which comes with every edition except Essentials!). On the assumption (and risk) that the power cycle does not corrupt the OS, applications or data, you minimise your exposure to hardware outages.
  • If the budget is not enough to buy shared storage and all VMs are running on local storage in the hypervisor hosts (I have seen this more than you might imagine!), then consider using something like vSphere Replication or Hyper-V Replicas to copy at least one of each critical VM role to another host, and if there are multiple instances, then spread them around the hosts.
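
Going back to the automation bullet above, here is a hedged sketch of what “redeploy from a script” can look like, assuming AWS and boto3; the AMI ID, repository URL and service name are all hypothetical placeholders:

```python
# Rebuild an application server from nothing: launch a fresh instance whose
# user-data script installs and starts the app. Run it again after a failure
# and you get an identical server back, quickly and consistently.
import boto3

USER_DATA = """#!/bin/bash
yum install -y git python3
git clone https://example.com/your-app.git /opt/app   # placeholder repo
/opt/app/install.sh                                   # placeholder installer
systemctl enable --now your-app                       # placeholder unit
"""

ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.small",
    MinCount=1,
    MaxCount=1,
    UserData=USER_DATA,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "app-redeploy"}],
    }],
)
```

The same idea scales up into the CI/CD pipelines mentioned above; a script like this is just the smallest possible version of “infrastructure as code”.
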

Finally, make sure that, whatever happens, there is some form of DR, even if it is no more than a holding page or application notification plus a replica or off-site backup of critical data! Customers and users would rather see something telling them that you’re working to resolve the problem than get a spinning wheel and a timeout! If you can provide something with limited functionality or performance, it’s better than nothing!

TL;DR: High Availability on a Budget

There are a million and one ways to provide more highly available applications; these are just a few. The point is that providing highly available applications is not as expensive as you might initially think.

With a bit of elbow grease, a bit of scripting and regular testing, even on the smallest budgets you can cobble together more highly available solutions for even the crummiest applications! 🙂

Now go forth and HA!

Secondary can be just as important as Primary

There can be little doubt these days, that the future of the storage industry for primary transactional workloads is All Flash. Finito, that ship has sailed, the door is closed, the game is over, [Insert your preferred analogy here].

Now I can talk about the awesomeness of All Flash until the cows come home, but the truth is that flash is not now, and may never be, as inexpensive for bulk storage as spinning rust! I say “may” as technologies like 3D NAND are changing the economics of flash systems. Either way, I think it will still be a long time before an 8TB flash device is cheaper than 8TB of spindle. This is especially true for storing content which does not easily dedupe or compress, such as the two key types of unstructured data which are exponentially driving global storage capacities through the roof year on year: images and video.

With that in mind, what do we do with all of our secondary data? It is still critical to our businesses from a durability and often availability standpoint, but it doesn’t usually have the same performance characteristics as primary storage. Typically it’s also the data which consumes the vast majority of our capacity!


Accounting needs to hold onto at least 7 years of their data, nobody in the world ever really deletes emails these days (whether you realise it or not, your sysadmin is probably archiving all of yours in case you do something naughty, tut tut!), and woe betide you if you try to delete any of the old marketing content which has been filling up your arrays for years! A number of my customers are also seeing this data grow at exponential rates, often far exceeding business forecasts.

Looking at the secondary storage market from my personal perspective, I would probably break it down into a few broad groups of requirements:

  • Lower performance “primary” data
  • Dev/test data
  • Backup and archive data

As capacity planning becomes harder and business needs change almost by the day, I am definitely leaning more towards scale-out solutions for all three of these use cases nowadays. Upfront costs are reduced and I have the ability to pay as I grow, whilst increasing performance linearly with capacity. To me, this is key for any secondary storage platform.

One of the vendors we visited at SFD8, Cohesity, actually targets several of these workload types with their solution, and I believe they are a prime example of where the non-AFA part of the storage industry will move in the long term.

The company came out of stealth last summer and was founded by Mohit Aron, a rather clever chap with a background in distributed file systems. Part of the team who wrote the Google File System, he went on to co-found Nutanix as well, so his CV doesn’t read too bad at all!

Their scale-out solution utilises the now ubiquitous 2U, 4-node rack appliance physical model, with 96TB of HDDs and a quite reasonable 6TB of SSD, for which you can expect to pay an all-in price of about $80-100k after discount. It can all be managed via the console or a REST API.

Cohesity CS2000 Series – 2u or not 2u? That is the question…

That stuff is all a bit blah blah blah though, of course! What really interested me is that Cohesity aim to make their platform infinitely and incrementally scalable; quite a bold vision and statement indeed! They do some very clever work around distributing data across their system, whilst achieving a shared-nothing architecture with a strongly consistent (as opposed to eventually consistent) two-phase commit file system. Performance is achieved by first caching data on the SSD tier, then de-staging it sequentially to HDD.
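
For anyone who hasn’t bumped into two-phase commit before, here’s a toy sketch of the general idea (emphatically not Cohesity’s code, just an illustration of why it gives strong rather than eventual consistency): a write is only acknowledged once every node holding a replica has voted to commit it.

```python
# Toy two-phase commit across replicas: prepare everywhere, then commit
# everywhere, so a reader can never see a write on one node but not another.
class Node:
    def __init__(self, name):
        self.name = name
        self.staged = {}  # writes prepared but not yet committed
        self.data = {}    # committed state

    def prepare(self, key, value):
        # Phase 1: stage the write and vote "yes" if it can be made durable.
        self.staged[key] = value
        return True

    def commit(self, key):
        # Phase 2: promote the staged write to committed state.
        self.data[key] = self.staged.pop(key)

    def abort(self, key):
        self.staged.pop(key, None)

def replicated_write(nodes, key, value):
    # The write succeeds only if every replica votes yes in phase 1.
    if all(node.prepare(key, value) for node in nodes):
        for node in nodes:
            node.commit(key)
        return True  # acknowledged only once committed everywhere
    for node in nodes:
        node.abort(key)  # any "no" vote rolls the write back everywhere
    return False

nodes = [Node("n1"), Node("n2"), Node("n3")]
assert replicated_write(nodes, "stripe-42", b"payload")
assert all(n.data["stripe-42"] == b"payload" for n in nodes)
```

The price of that guarantee is an extra round trip on every write, which is presumably part of why the SSD caching tier matters for keeping write latency sensible.
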

I suspect the solution being infinitely scalable will be difficult to achieve, if only because you will almost certainly end up bottlenecking at the networking tier (cue boos and jeers from my wet string-loving colleagues). In reality most customers don’t need infinite as this just creates one massive fault domain. Perhaps a better aim would be to be able to scale massively, but cluster into large pods (perhaps by layer 2 domain) and be able to intelligently spread or replicate data across these fault domains for customers with extreme durability requirements?

Lastly, they have a load of built-in data protection features in the initial release, including instant restore, and file-level restore which is achieved by cracking open VMDKs for you and extracting the data you need. More mature features, such as SQL or Exchange object-level integration, will come later.

Cohesity Architecture

As you might have guessed, Cohesity’s initial release appeared to be just that: an early release with a reasonable number of features on day one. Not yet the polished article, but plenty of potential! They have already begun to build on this with the second release of their OASIS software (Open Architecture for Scalable Intelligent Storage), and I am pleased to say that next week we get to go back and visit Cohesity at Storage Field Day 9 to discuss all of the new bells and whistles!

Watch this space! 🙂

To catch the presentations from Cohesity at SFD8, you can find them here:

Further Reading
I would say that, more than any other session at SFD8, the Cohesity session generated debate and interest among the delegates. Check out some of their posts here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 8 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors products or services and I was not compensated in any way for my time at the event.

Why are storage snapshots so painful?

Have you ever wondered why we don’t use snapshots more often than about every 5-15 minutes in most solutions, and in many others, a lot less often than that?

It’s pretty simple to be honest… The biggest problem with taking snapshots is quiescing the data stream to complete the activity. At a LUN level, this usually involves some form of locking mechanism to pause all IO while any metadata updates or data redirections are made, after which the IO is resumed.

For small machines and LUNs with minimal IO load, this is generally such a quick operation that it has virtually no effect on the application user and is pretty much transparent. For busy applications, however, data can be changing at such a massive rate that disrupting that IO stream, even for a few seconds, can have a significant impact on performance and user experience. In addition, the larger the number of snapshots in the snap tree, the more performance is degraded by the overhead of managing them all: copy-on-write activity and, of course, lots of locking.

This problem is then multiplied several times over when you want to get consistency across multiple machines, for example when you want to get point-in-time consistency for an entire application stack (Web / App / DB, etc).

So what do we typically do? We reduce the regularity at which we take these snaps in order to minimise the impact, whilst still having to meet the (usually near zero because all data is critical, right?) RPO set by the business.

At SFD8, we had a very well received presentation from INFINIDAT, a storage startup based in Israel and founded by industry legend Moshe Yanai (the guy who brought you EMC Symmetrix / VMAX, and subsequently XIV). Moshe’s “third generation” enterprise-class storage system comes with one particular feature in which I was really interested: snapshots! Yes, I know it sounds like a boring “checkbox in an RFP” feature, but when I found out how it worked I was really impressed.

For every single write stripe which goes to disk, a checksum and a timestamp (from a high precision clock) are written. This forms the base on which the snapshot system is built (something they call InfiniSnap™).

If you have a microsecond-accurate clock and timestamps on every write, then in order to achieve a snapshot you simply have to pick a date and time! Anything written before that time belongs to the snapshot; anything written on or after it belongs only to the live volume. This means no locking or pausing of IO during a snap, making the entire process a near-zero-time, zero-impact operation! A volume therefore performs indistinguishably with or without snapshots. Wow!
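
To make that concrete, here’s a deliberately simplified sketch of the concept as I understood it from the presentation (an illustration of the idea only, not INFINIDAT’s implementation):

```python
# Every write is stamped with a high-precision clock value; a "snapshot" is
# then nothing more than a chosen cut-off time. No IO has to be paused.
import time

class TimestampedVolume:
    def __init__(self):
        self.writes = []  # append-only log of (timestamp, block, data)

    def write(self, block, data):
        self.writes.append((time.monotonic_ns(), block, data))

    def read(self, block, as_of=None):
        # Replay only writes that happened strictly before the cut-off;
        # as_of=None reads the live volume.
        state = None
        for ts, blk, data in self.writes:
            if blk == block and (as_of is None or ts < as_of):
                state = data
        return state

vol = TimestampedVolume()
vol.write(0, b"v1")
snap = time.monotonic_ns()  # "taking a snapshot" = picking a time: zero impact
vol.write(0, b"v2")

assert vol.read(0, as_of=snap) == b"v1"  # the snapshot view is frozen
assert vol.read(0) == b"v2"              # the live view moves on
```

A real array obviously keeps indexed metadata rather than replaying a log, but the point stands: if the timestamp is already on every write stripe, creating a snapshot writes nothing and locks nothing.
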


It sounds so simple it shouldn’t work, but according to INFINIDAT they can easily support up to 100,000 snaps per system, and even this isn’t a real limit; they picked the number because it was a double-figure percentage bigger than the next closest array on the market! They will also happily support more than this if you ask; they said they just need to test it first. In addition, each snap group will support up to 25 snaps per second, and they guarantee an RPO as low as 4 seconds based on snapshots alone. You can then use point-in-time replication to create an asynchronous copy on another array if needed. Now that’s granular! 🙂

The one caveat I would add is that this does not yet appear to fix the old faithful crash-consistent vs application-consistent issue, but it’s a great start. Going back to the application stack “consistency group” concept, in theory you generally only need to VSS the database VM, making it much easier and simpler to take a consistent snap across an app stack with minimal overhead. As we move more towards applications using NoSQL databases etc., this will also become less of an issue.

The above was just one of the cool features they covered in their presentation, from which the general consensus was very positive indeed! A couple of weeks ago I was also able to spend a little time with one of INFINIDAT’s customers who just so happened to be attending the same UKVMUG event. Their impressions in terms of the quality of the array build (with a claimed 99.99999% availability!), the management interface, general performance during initial testing, the compelling pricing, and of course, their very funky matrix-like chassis, were all very positive too.

If you want to see the INFINIDAT presentation from SFD8, make sure you have your thinking hat on and a large jug of coffee! Their very passionate CTO, Brian Carmody, was a compelling speaker and was more than happy to get stuck into the detail of how the technology works. I definitely felt that I came away a little smarter for having been a part of the audience! He also goes into some fascinating detail about genome sequencing, the concept of cost per genome and its likely massive impact on the storage industry and our lives in general! The video is worth a watch for this section alone…

Further Reading
Some of the other SFD8 delegates have their own takes on the presentation we saw. Check them out here:

Dan Frith – INFINIDAT – What exactly is a “Moshe v3.0”?
Enrico Signoretti (Juku.it) – Infinidat: awesome tech, great execution
Enrico Signoretti (El Reg) – Has the next generation of monolithic storage arrived?
Ray Lucchesi – Mobile devices as a cache for cloud data
Vipin V.K. – Infinibox – Enterprise storage solution from Infinidat
GreyBeards on Storage Podcast – Interview with Brian Carmody

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 8 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors products or services and I was not compensated in any way for my time at the event.
