Tag Archive for Azure

Startup Spotlight: Re-skill, Pivot or Get Squashed

The subject matter of this post is a startup of sorts and was triggered by a conversation I had with an industry veteran a few months back. By veteran of course, I mean an old bugger! 😉

It is an entity which begins its journey sourcing a target market in the tech industry and spends day and night pursuing that market to the best of its ability.

It brings in resources to help meet the key requirements of the target market; some of those resources are costly, and others not so much.

Occasionally it hits a bump in the road with funding and potentially needs to find other sources of investment, and may go through several rounds of funding over the course of a number of years. Eventually it gets to a point where the product is of a decent quality and market value.

Then it does a market analysis and discovers that the market has shifted and if the entity does not pivot or indeed re-skill, it will become irrelevant within a few short years.

Eh?

I am of course talking about the career of an IT professional.

Though I may be slightly exaggerating about becoming irrelevant quite so fast, we certainly all made the choice to follow a career in one of the fastest moving industries on the planet. We have no choice but to continue to develop and maintain our knowledge, in order to keep driving our careers forward.

As a self-confessed virtual server hugger with a penchant for maintaining a pretty reasonable home lab, I enjoy understanding the detailed elements of a technology, how they interact, and acknowledging where the potential pitfalls are. The cloud, however, is largely obfuscated in this respect; to the point where many cloud companies will not even divulge the location of their data centres, never mind the equipment inside them and configuration thereof!

That said, those of you with a keen eye may have noticed a shift in my Twitter stream in the past year or so, with subjects tending towards a more public cloudy outlook… Talking to a huge range of customers in various verticals on a regular basis, it feels to me that a great many organisations are right on the tipping point between their current on-premises / dedicated managed services deployment models, and full public cloud adoption (or at the very least hybrid!).

It’s hard to believe that companies like AWS have actually been living and breathing public cloud for over ten years already; that’s almost as long as my entire career! In that time they have grown from niche players selling a bit of object storage, to the Behemoth-aaS they are today. To a greater or lesser extent (and for better or worse!), they are now the yardstick against which many cloud and non-cloud services are measured. This is also particularly the case when it comes to cost, much to the chagrin of many across the industry!

To me, this feels like the optimum time for engineers and architects across our industry (most definitely including myself) to fully embrace public and hybrid cloud design patterns. My development has pivoted predominantly towards technologies which are either native to, or which support public cloud solutions. Between family commitments, work, etc, we have precious little time to spend in personal development, so we need to spend it where we think we will get the most ROI!

So what have I been doing?

Instead of messing about with my vSphere lab of an evening, I have spent recent months working towards certified status in AWS, Azure, and soon, GCP. This has really been an eye opener for me around the possibilities of designs which can be achieved on the current public cloud platforms; never mind the huge quantity of features these players are likely to release in the coming 12 months, or the many more after that.

Don’t get me wrong, of course, everything is not perfect in the land of milk and honey! I have learned as much in these past months about workloads and solutions which are NOT appropriate for the public cloud, as I have about solutions which are! Indeed, I have recently produced a series of posts covering some of the more interesting AWS gotchas, and some potential workarounds for them. I will be following up with something similar for Azure in the coming months.

Taking AWS as an example, something which strikes me is that many of the services are not 100% perfect and don’t have every feature and nerd knob under the sun available. Most seem to have been designed to meet the 80/20 rule and are generally good enough to meet the majority of design requirements more than adequately. If you want to meet a corner use case or a very specific requirement, then maybe you need to go beyond native public cloud tooling.

Perhaps the same could be said about the mythical Full Stack Engineer?

Anyhow, that’s enough rambling from me… By no means does this kind of pivot imply that everything we as infrastructure folks have learned to date has been wasted. Indeed I personally have no intention to drop “on premises” skills and stop designing managed dedicated solutions. For the foreseeable future there will likely be a huge number of appropriate use cases, but in many, if not most cases I am being engaged to look at new solutions with a publicly cloudy mindset!

Indeed, as Ed put it this time last year:

Downtime sucks! Designing Highly Available Applications on a Budget

Downtime sucks.

I write this whilst sitting in an airport lounge, having been disembarked from my plane due to a technical fault. I don’t really begrudge the airline in question; it was a plumbing issue! This is a physical failure of the aircraft in question and just one of those things (unless I find out later they didn’t do the appropriate preventative maintenance of course)! Sometimes failures just happen and I would far rather it was just a plumbing issue, not an engine issue!

What is not excusable, however, is if the downtime is easily preventable; for example, if you are designing a solution which has no resilience!

This is obviously more common with small and medium sized businesses, but even large organisations can be guilty of it! I have had many conversations in the past with companies who have architected their solutions with significant single points of failure. More often than not, this is due to the cost of providing an HA stack. I fully appreciate that most IT departments are not swimming in cash, but there are many ways to work around a budgetary constraint and still provide more highly available, or at least “Disaster Resistant” solutions, especially in the cloud!

Now obviously there is High Availability (typically within a single region or Data Centre), and Disaster Recovery (across DCs or regions). An ideal solution would achieve both, but for many organisations it can be a choice between one and the other!

Budgets are tight, what can we do?

Typically HA can be provided at either the application level (preferred), or if not, then at the infrastructure level. Many solutions to improve availability are relatively simple and inexpensive. For example:

  • Building on a public cloud platform (and assuming that the application supports load balancing), why not test running twice as many instances with half the specification each? In most cases, unless there are significant storage quantities in each instance, the cost of scaling out this way is minimal.
    If there is a single instance, split it out into two instances, immediately halving the impact of losing any one node. If there are two instances, what about splitting into 4? The impact of a node loss is then only 25% of the overall throughput capacity for the application, and this can even bring down the cost of HA for applications where the +1 in N+1 is expensive!
  • Again in cloud, if there are more than two availability zones in a region (e.g. on AWS), then take advantage of them! If an application can handle 2 AZs, then the latency of adding a third shouldn’t make much, if any difference, and costs will only increase slightly with a small amount of extra inter-AZ bandwidth or per-AZ services (e.g. NAT gateways).
    Again, in this scenario the loss of an AZ will only take out 33% of the application servers, not 50%, so it is possible to reduce the number of servers which are effectively there for failover only.
  • If you can’t afford to run an application as multi-AZ or multi-node, consider putting it in an auto-scaling group or scale-set with a minimum and maximum of 1 server. That way if an outage occurs or, in the case of AWS, an entire AZ goes down, an instance will automatically be regenerated in an alternative AZ.
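To put some rough numbers on the scale-out points above, here is a minimal Python sketch. It assumes load is spread evenly across identical nodes (or AZs) and that you size for N+1; the function names and the example load figures are hypothetical, purely for illustration, and are not taken from any cloud provider's tooling.

```python
import math


def capacity_lost_fraction(units: int) -> float:
    """Fraction of total capacity lost when one of `units` equal nodes (or AZs) fails."""
    if units < 1:
        raise ValueError("need at least one node or AZ")
    return 1 / units


def nodes_required(peak_load: float, node_capacity: float, spare_nodes: int = 1) -> int:
    """Nodes needed to carry peak_load, plus `spare_nodes` for failover (N+1 by default)."""
    return math.ceil(peak_load / node_capacity) + spare_nodes


def failover_overhead(peak_load: float, node_capacity: float, spare_nodes: int = 1) -> float:
    """Extra capacity provisioned beyond peak_load, as a fraction of peak_load."""
    provisioned = nodes_required(peak_load, node_capacity, spare_nodes) * node_capacity
    return (provisioned - peak_load) / peak_load


if __name__ == "__main__":
    # Hypothetical application needing 100 units of throughput at peak.
    for node_capacity in (100, 50, 25):
        total = nodes_required(100, node_capacity)
        print(
            f"node size {node_capacity}: {total} nodes, "
            f"overhead {failover_overhead(100, node_capacity):.0%}, "
            f"loss per node failure {capacity_lost_fraction(total):.0%}"
        )
```

The smaller the nodes, the smaller the slice of capacity that exists purely for failover, which is the "+1 in N+1" saving described above. The same `capacity_lost_fraction` arithmetic gives the 50% versus 33% difference between two and three AZs.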
What if my app doesn’t like load balancers?

If you have an application which cannot be load balanced, you probably shouldn’t be thinking about running it in the cloud (not if you have any serious availability requirements anyway!). It amazes me how many business critical applications and services are still running in single servers all over the world!

  • If your organisation is dead set on using cloud for a SPoF app, then making it as ephemeral as possible can help. Start by splitting the DBs from the apps, as these can almost always be made HA by some means (e.g. master/slave replication, mirroring, log shipping, etc). Failover nodes also often don’t attract a license fee from many vendors (e.g. MS SQL), so always check your license documentation to see what you can achieve on the cheap.
  • Automate! If you can deploy application server(s) from a script, even if the worst happens, the application can be redeployed very quickly, in a consistent fashion.
    The trend at the moment is moving towards a more agile deployment process and automated CI/CD pipelines. This enables companies to recover from an outage by rebuilding their environments and redeploying code rapidly (as long as they have a replica of the data or a highly available datastore!).
  • If it’s not possible to script or image the code deployment, then taking regular backups (and snapshots where possible) of application servers, and testing them often is an option! If you don’t want to go through the inflexible, unreliable and painful nightmare of doing system state restores, then take image-based backups (supported by the vast majority of backup vendors nowadays). You could also sync application data to a warm standby server which can be brought online reasonably swiftly, or even use an inexpensive DR service such as Azure Site Recovery, to provide an avenue of last resort!
  • If maybe cloud isn’t the best place to locate your application, then provide HA at the infrastructure layer by utilising the HA features of your favourite hypervisor!
    For example, VMware vSphere will have an instance back up and running within a minute or two of the failure of a host using the vSphere HA feature (which comes with every edition except Essentials!). On the assumption/risk that the power cycle does not corrupt OS, applications or data, you minimise exposure to hardware outages.
  • If the budget is not enough to buy shared storage and all VMs are running on local storage in the hypervisor hosts (I have seen this more than you might imagine!), then consider using something like vSphere Replication or Hyper-V Replicas to copy at least one of each critical VM role to another host, and if there are multiple instances, then spread them around the hosts.
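As a rough illustration of that last point, the sketch below plans where each VM's replica could live so that it never shares a host with its primary, and so that replicas are spread evenly. It is planning logic only: it does not call vSphere Replication or Hyper-V Replica, and the function name, VM names and host names are all hypothetical.

```python
from __future__ import annotations


def place_replicas(vm_hosts: dict[str, str], hosts: list[str]) -> dict[str, str]:
    """Pick a replica host for each VM.

    vm_hosts maps VM name -> the host its primary copy runs on.
    Each replica goes to the host currently holding the fewest replicas,
    excluding the VM's own primary host (ties broken by host name).
    """
    if len(set(hosts)) < 2:
        raise ValueError("need at least two hosts to keep a replica off the primary host")

    replica_count = {host: 0 for host in hosts}
    placement: dict[str, str] = {}

    # Sort VM names so the plan is deterministic from run to run.
    for vm, primary in sorted(vm_hosts.items()):
        candidates = [host for host in hosts if host != primary]
        target = min(candidates, key=lambda host: (replica_count[host], host))
        placement[vm] = target
        replica_count[target] += 1

    return placement


if __name__ == "__main__":
    primaries = {"web-1": "host-a", "db-1": "host-a", "app-1": "host-b"}
    for vm, replica_host in place_replicas(primaries, ["host-a", "host-b", "host-c"]).items():
        print(f"{vm}: primary on {primaries[vm]}, replica on {replica_host}")
```

A plan like this only tells you where copies should go; the replication itself, and regular test failovers, still need to be handled by whichever hypervisor feature you use.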

Finally, make sure whatever happens there is some form of DR, even if it is no more than a holding page or application notification and a replica or off-site backup of critical data! Customers and users would rather see something telling them that you’re working to resolve the problem, than getting a spinning wheel and a timeout! If you can provide something which is of limited functionality or performance, then it’s better than nothing!

TL;DR: High Availability on a Budget

There are a million and one ways to provide more highly available applications; these are just a few. The point is that providing highly available applications is not as expensive as you might initially think.

With a bit of elbow grease, a bit of scripting and regular testing, even on the smallest budgets you can cobble together more highly available solutions for even the crummiest applications! 🙂

Now go forth and HA!

What’s your definition of Cloud DR, and how far down do the turtles go?

WARNING – Opinion piece! No Cloud Holy Wars please!

DR in IT can mean many different things to different people. To a number of people I have spoken to in the past, it’s simply HA protection against the failure of a physical host (yikes!)! To [most] others, it’s typically protection against failure of a data centre. As we discovered this week, to AWS customers, a DR plan can mean needing to protect yourself against a failure impacting an entire cloud region!

But how much is your business willing to pay for peace of mind?

When I say pay, I don’t just mean monetarily, I also mean in terms of technical flexibility and agility as well.

What are you protecting against?

What if you need to ensure that in a full region outage you will still have service? In the case of AWS, a great many customers are comfortable that the Availability Zone concept provides sufficient protection for their businesses without the need for inter-region replication, and this is perfectly valid in many cases. If you can live with a potential for a few hours downtime in the unlikely event of a full region outage, then the cost and complexity of extending beyond one region may be too much.

That said, as we saw from the failure of some AWS capabilities this week, if we take DR in the cloud to its most extreme, some organisations may wish to protect their business against not only a DC or region outage, but even a globally impacting incident at a cloud provider!

This isn’t just technical protection either (for example against a software bug which hits multiple regions); what if a cloud provider goes under due to a financial issue? Even big businesses can disappear overnight (just ask anyone who used to work for Barings Bank, Enron, Lehman Brothers, or even 2e2!).

Ok, it’s true that the likelihood of your cloud provider going under is pretty teeny tiny, but just how paranoid are your board or investors?

Ultimate Cloud DR or Ultimate Paranoia?

For the ultimate in paranoia, some companies consider protecting themselves against the ultimate outage, by replicating between multiple clouds. In doing so, however, they must stick to using the lowest common denominator between clouds to avoid incompatibility, or indeed any potential for the dreaded “lock-in”.

At that point, they have then lost the ability to take advantage of one of the key benefits of going to cloud; getting rid of the “undifferentiated heavy lifting” as Simon Elisha always calls it. They then end up less agile, less flexible, and potentially spend their time on things which fail to add value to the business.

What is best for YOUR business?

These are all the kinds of considerations which the person responsible for an organisation’s IT DR strategy needs to weigh up, and it is up to each business to individually decide where they draw the line in terms of comfort level vs budget vs “lock-in” and features.

I don’t think anyone has the right answer to this problem today, but perhaps one possible solution is this:

No cloud is going to be 100% perfect for every single workload, so why not use this fact to our advantage? Within reason, it is possible to spread workloads across two or more public clouds based on whichever is best suited to those individual workloads. Adopting a multi-cloud strategy which meets business objectives and technical dependencies, without going crazy on the complexity front, is a definite possibility in this day and age!

(Ok, perhaps even replicating a few data sources between them, for the uber critical stuff, as a plan of last resort!).

The result is potentially a collection of smaller fault domains (aka blast radii!), making the business more resilient to significant outages from major cloud players, as only some parts of their infrastructure and a subset of applications are then impacted, whilst still being able to take full advantage of the differentiating features of each of the key cloud platforms.

Of course, this is not going to work for everyone, and plenty of organisations struggle to find talent to build out capability internally on one cloud, never mind maintaining the broad range of skills required to utilise many clouds, but that’s where service providers can help both in terms of expertise and support.

They simply take that level of management and consulting a little further up the stack, whilst enabling the business to get on with the more exciting and value added elements on top. Then it becomes the service provider’s issue to make sure they are fully staffed and certified on your clouds of choice.

*** Full Disclosure *** I work for a global service provider who does manage multiple public clouds, and I’m lucky enough to have a role where I get to design solutions across many types of infrastructure, so I am obviously a bit biased in this regard. That doesn’t make the approach any less valid! 🙂

The Tekhead Take

Whatever your thoughts on the approach above are, it’s key to understand what the requirements are for an individual organisation, and where their comfort levels lie.

An all-singing, all-dancing, multi-cloud, hybrid globule of agnostic cloudy goodness is probably a step too far for most organisations, but perhaps a failover physical host in another office isn’t quite enough either…

I would love to hear your thoughts! Don’t forget to comment below!

Now that’s what I call… Tech Predictions 2017

At this time of year, it is customary to look back at the past 12 months and make some random or not-so-random guesses as to what will happen over the coming 12. As such, what could be more fitting for my final post of 2016?!

Here’s a few of my personal best, worst, and easy guess candidates for 2017…

Easy Guesses

Come on Alex, even Penfold could have predicted these!

  • AWS will continue to dominate the cloud market, though the rate at which they deploy new features will start to slow (over 1000 a year is pretty unsustainable!). Their revenues will continue to grow at gangbuster rates, however their market share will be slightly eroded as people experiment more with their competitors too.
  • Microsoft Azure will grow massively (not quite 100% but not far off it). Their main growth will probably be in hosting enterprises and typical line of business applications as people move their legacy junk into the cloud. The recent announcement of the 99.9% Single Instance VM SLA will definitely accelerate this as customers will feel less inclined to refactor their applications for cloud.
  • Distributed everything!
  • Docker will start to become more mainstream production and less Dev/Test.
  • Google will kill off at least one popular service with multiple millions of users.
  • The homelab market will reduce as people do more and more of their studying in the cloud.
  • Podcasting will become the new blogging (if it hasn’t already!)
  • DellEMC will continue to hack off bits of its anatomy to pay back that cheeky little $67Bn debt.
  • I continue to use memes as a crutch to make my otherwise lifeless articles marginally more interesting!
Best Guesses

It’s on the cards… maybe?

  • Google will continue to be ignored by most enterprises for Cloud IaaS. They will gain some reasonable growth in the web application space after another mass marketing activity to developers, ISVs and hosters.
  • Oracle grows Cloud revenues 50% or more but market share remains small. Their growth is mainly driven by IaaS revenue as customers begin to move their workloads to be closer to their data in the Oracle PaaS and SaaS services.
  • There will be no major storage company IPO in 2017, i.e. over $200m.
  • Many storage startups will run out of funding and die on the vine (depressing I know!). Their IP will be snapped up by the old guard storage companies in the ensuing fire sales…
  • 3D XPoint will begin to creep into storage arrays by the end of the year, fuelling another storage VC funding bubble for at least another 12 months for any company who claims to have an innovative way to use it.
  • A major cloud provider suffers a global outage.
Worst Guesses

These probably won’t happen, but if any of them do, I’ll claim smugly that I knew they were always going to!

  • Pure Storage will make an acquisition of a storage startup to create their third product line, perhaps a secondary storage company (i.e. not just all flash) along the lines of Cohesity.
  • Cisco will buy a storage company. They will be more successful at integrating it than they were with Whiptail! (Which wouldn’t be difficult… 😮 )
  • Spanning a single application over multiple clouds becomes a real possibility, as one or more startups come out of stealth to provide innovative ways to span clouds. Nobody buys into it, except maybe for DR.
  • Tekhead.it becomes the most read blog in the world in 2017
  • Cats take over the planet and dogs are forced to form a rebel alliance which is ultimately victorious when a chihuahua takes out the entire cat leadership in one go, with a stolen reaper drone.
  • Jonah Hill wins Strictly Come Dancing, narrowly defeating Frankie Boyle and Charlie Brooker in the final.
And finally…

Here’s wishing you all an awesome, fun and prosperous 2017!
