Tag Archive for ASG

Amazon AWS Tips and Gotchas – Part 9 – Scale-Up Patching

Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below is another tip for designing and implementing solutions on Amazon AWS. This one covers Scale-Up Patching of Auto-Scaling Groups (ASGs), plus a couple of wee bonuses on Dark Launch techniques and The Trainline’s Environment Manager.

For the first post in this series with a bit of background on where it all originated from, see here:
Amazon #AWS Tips and Gotchas – Part 1

For more posts in this series, see here:
Index of AWS Tips and Gotchas

19. AWS Tips and Gotchas – Part 9 – Scale-Up Patching in ASGs

Very quick tip on Auto Scaling Groups this week, courtesy of an awesome session on DevOps I attended at the AWS User Group UK (London) last week, presented by Chris Turvil from The Trainline.

Assuming you just need to do a code release to an existing farm of servers running in an ASG, and you aren’t planning anything complex such as a DB schema update, you can use a technique called “Scale-Up Patching”. I hadn’t heard the term before, but it’s incredibly simple and very effective! There are a couple of methods you might use, depending on how you deliver your code, but the technique is the same: make your new code or image live, double the minimum size of your ASG, then halve it again. Job done!

So how does this work?

If you have looked into the detail of ASGs, you will know that (assuming your instances are spread roughly evenly over multiple AZs) when an ASG shrinks / scales down, the default termination policy kills the oldest EC2 instances first. For more detail on the exact rules, see here.

If you double the current number of instances, all of the new instances will be deployed with your new code version. This leaves you with a farm of 50% vOld and 50% vNew. When you then tell the ASG to scale back to the original size, it will obviously kill off all of the vOld instances, leaving your entire farm upgraded. If you found an issue and had to roll back, you simply rinse and repeat the same exercise! How brilliant is that?!

This process will work exactly the same regardless of whether you deploy your code via updated AMIs each time, or simply post-boot using a user-data script which pulls your source from a bucket, repo, or similar. Either way, the result is the same and infinitely repeatable!
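
For illustration, below is a minimal sketch of that flow using boto3 in Python. The ASG name is hypothetical, and a real script would want proper ELB health checking and error handling rather than a simple polling loop, but it shows the double-then-halve idea:

```python
import time

import boto3

autoscaling = boto3.client("autoscaling")

ASG_NAME = "my-web-asg"  # hypothetical ASG name


def scale_up_patch(asg_name):
    """Double the group so instances with the new code launch, then halve it
    again so the default termination policy removes the old instances first."""
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"][0]

    original_min = group["MinSize"]
    original_max = group["MaxSize"]
    original_desired = group["DesiredCapacity"]

    # 1. Double the group - the new instances boot from the new AMI / user-data.
    #    MaxSize is raised temporarily if it would block the doubled capacity.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=original_min * 2,
        MaxSize=max(original_max, original_desired * 2),
        DesiredCapacity=original_desired * 2,
    )

    # 2. Wait until the doubled capacity reports InService.
    while True:
        group = autoscaling.describe_auto_scaling_groups(
            AutoScalingGroupNames=[asg_name]
        )["AutoScalingGroups"][0]
        in_service = [i for i in group["Instances"]
                      if i["LifecycleState"] == "InService"]
        if len(in_service) >= original_desired * 2:
            break
        time.sleep(30)

    # 3. Halve the group again - the oldest instances (running the old code)
    #    are terminated first under the default termination policy.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=original_min,
        MaxSize=original_max,
        DesiredCapacity=original_desired,
    )


scale_up_patch(ASG_NAME)
```

To roll back, you would point the launch configuration (or user-data) at the previous version and run exactly the same function again.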

The one counter to this, which a colleague of mine brought up, is that you are explicitly depending on a specific AWS behaviour always functioning in the same way and not changing in the future. An alternative might be to deploy in a blue/green setup with independent ELBs and instances. You then simply fail over using Route 53, either all in one go or using weighted routing for a canary release process. Funnily enough, AWS released a white paper on exactly that subject a couple of months ago:
Blue/Green Deployments on AWS Whitepaper

They also cover the scale-up patching method in detail from page 17 of the whitepaper.
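
If you did go down the blue/green route, the Route 53 side can be as simple as a pair of weighted record sets whose weights you adjust for the canary and then for the full cutover. The sketch below is a rough illustration only; the hosted zone ID, record name and ELB DNS names are all hypothetical, and for ELBs you would more commonly use alias records rather than CNAMEs:

```python
import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z0000000EXAMPLE"   # hypothetical hosted zone
RECORD_NAME = "www.example.com."     # hypothetical record name


def set_weights(blue_weight, green_weight):
    """Shift traffic between the blue and green ELBs via weighted records."""
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={
            "Comment": f"Canary shift: blue={blue_weight}, green={green_weight}",
            "Changes": [
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": RECORD_NAME,
                        "Type": "CNAME",
                        "SetIdentifier": "blue",
                        "Weight": blue_weight,
                        "TTL": 60,
                        "ResourceRecords": [{"Value": "blue-elb.example.com"}],
                    },
                },
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": RECORD_NAME,
                        "Type": "CNAME",
                        "SetIdentifier": "green",
                        "Weight": green_weight,
                        "TTL": 60,
                        "ResourceRecords": [{"Value": "green-elb.example.com"}],
                    },
                },
            ],
        },
    )


# Start with a 10% canary on green, then cut over fully once you are happy.
set_weights(blue_weight=90, green_weight=10)
# ...monitor, then: set_weights(blue_weight=0, green_weight=100)
```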

Brucie Bonus One – Deployment Dictionary

Incidentally, you can deploy said code without it going live immediately by using so-called “Dark Launch” techniques. As the name suggests, this separates code deployment from feature launch. You pre-release your code into production, but simply don’t toggle it on for anyone (or at least not for everyone) at first. You can then either toggle it on for everyone at once or, even better, for smaller canary groups. Web-scale companies such as Netflix, Facebook and Google have been doing this for many years!

This completely avoids the panic-inducing impact of deploying a large new code release whilst simultaneously having that code go live and ramp up utilisation!
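
As a trivial illustration of the idea (all of the names below are hypothetical, and in real life the flag values would live in a configuration service or database rather than a dict), a percentage-based dark launch toggle might look something like this:

```python
import hashlib

# Percentage of users who get the new code path; dialled up over time.
FEATURE_FLAGS = {
    "new_checkout_flow": 5,
}


def is_enabled(feature, user_id):
    """Deterministically bucket a user into [0, 100) and compare to the rollout %."""
    rollout = FEATURE_FLAGS.get(feature, 0)
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < rollout


def legacy_checkout(user_id):
    return f"legacy checkout for {user_id}"


def new_checkout(user_id):
    return f"new (dark-launched) checkout for {user_id}"


def checkout(user_id):
    # The new code is already deployed to production, but stays dormant for
    # ~95% of users until the flag is turned up.
    if is_enabled("new_checkout_flow", user_id):
        return new_checkout(user_id)
    return legacy_checkout(user_id)


print(checkout("user-12345"))
```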


Combining dark launch methods with scale-up patching or blue/green deployments should lead to a few less grey hairs in the long run, that’s for sure!

For more info, see the following overview:
What is a dark launch in terms of continuous delivery of software?

Brucie Bonus Two – Environment Manager

Lastly, a bit of interesting news which also came from The Trainline: they have open-sourced their own internal deployment tool, which they call Environment Manager.

With an AngularJS front end and a Node.js back end, it’s a home-grown continuous deployment tool which includes a self-service portal, REST APIs, and a number of operational governance features. The governance elements include a feature which prevents rogue developers from deploying anything which hasn’t already been defined in the central service catalogue.

The Trainline Environment Manager Architecture

You can check out Environment Manager on GitHub:
https://trainline.github.io/environment-manager

Want More AWS Tips and Gotchas?

Find more posts in this series here:
Index of AWS Tips and Gotchas

Amazon AWS Tips and Gotchas – Part 10 – EFS (Elastic File System)

Amazon AWS Tips and Gotchas – Part 8 – AWS EC2 Reserved Instances

Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below are a handful more tips and gotchas when designing and implementing solutions on Amazon AWS, including AWS EC2 Reserved Instances.

For the first post in this series with a bit of background on where it all originated from, see here:
Amazon #AWS Tips and Gotchas – Part 1

For more posts in this series, see here:
Index of AWS Tips and Gotchas

AWS Tips and Gotchas – Part 8

Reserved Instances are a great way to save yourself some money on instances you know you will require for a significant period of time (12-36 months). One really cool fact which AWS don’t announce enough, in my opinion, is that reserved instances can actually be shared across consolidated billing accounts!

If you wanted to, you could purchase all of your reserved instances from your primary consolidated billing account; however, doing this has some potentially unexpected results:

  1. Reserved instances don’t just provide you with a better price, they also provide you with the guaranteed ability to spin up an instance of your chosen type, regardless of how busy the AZ in question actually is.
    If there is an AZ outage, other AWS customers will scramble to spin up additional instances in other AZs in the same region, either manually or via ASGs, and this has the potential to starve the compute resources for one or more instance types!
    Yes, that’s right, even AWS does not have infinite compute resources! By using reserved instances, you are still guaranteed to be able to run yours, regardless of the available capacity for on-demand instances. They are truly reserved.
    If, however, you centralise your reserved instances into your consolidated billing (CB) account, you will get the reservation pricing benefits at the top of the account tree, but you won’t get the capacity reservations, as these are account-specific.
  2. Reserved instances are specific to individual Availability Zones, so ensure you spread these evenly across your AZs to avoid wasting them (you are of course designing your apps to be resilient across AZs, right?) and to give yourself maximum reserved coverage in the unlikely event of a full AZ outage. A quick way to check your current spread is sketched just after this list.
  3. And finally… Reserved instances are a commercial tool applied after-the-fact, not against a specific instance. When using consolidated billing for reserved instances, the reservations are effectively split evenly across all accounts. If you want to report back to each business unit / account owner on their billing, including reserved instances, this can be tricky.
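
Here is the quick check referred to in point 2: a minimal boto3 sketch which counts your active reservations per Availability Zone, so an uneven spread is easy to spot. Note that regional-scope reservations have no AZ and carry no capacity reservation at all:

```python
from collections import Counter

import boto3

ec2 = boto3.client("ec2")

# Fetch all active reservations in the current account / region.
reservations = ec2.describe_reserved_instances(
    Filters=[{"Name": "state", "Values": ["active"]}]
)["ReservedInstances"]

per_az = Counter()
for ri in reservations:
    # Regional-scope RIs have no AvailabilityZone (and no capacity reservation).
    az = ri.get("AvailabilityZone", "regional (no capacity reservation)")
    per_az[az] += ri["InstanceCount"]

for az, count in sorted(per_az.items()):
    print(f"{az}: {count} reserved instance(s)")
```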

Find more posts in this series here:
Index of AWS Tips and Gotchas

Amazon AWS Tips and Gotchas – Part 9 – Scale-Up Patching

Amazon AWS Tips and Gotchas – Part 3 – S3, Tags and ASG

Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below are a handful more tips and gotchas when designing and implementing solutions on Amazon AWS, including AWS S3, Tags / Tagging as well as ASG (Auto-Scaling Groups).

For the first post in this series with a bit of background on where it all originated from, see here:
http://tekhead.it/blog/2016/02/amazon-aws-tips-and-gotchas-part-1/

For more posts in this series, see here:
Index of AWS Tips and Gotchas

AWS Tips And Gotchas – Part 3
  1. Individual S3 buckets are initially soft-limited to around 100 concurrent write requests and 300 read requests per second, and only partition further as request rates grow over time. This sounds like a lot, but when you consider that the average web page probably consists of 30-60 objects, it would not take a huge number of concurrent users hitting an application at the same time of day to start hitting these limits.

    The first recommendation here, especially for read-intensive workloads, is to cache the content from S3 using a service like CloudFront. This immediately means that, within each object’s TTL, you would only expect each object to be fetched from S3 a maximum of around 50 times (once per global edge location), assuming a global user base, and far fewer than that if all of your users are in a small number of geographic regions.
    Second, do not use sequentially named S3 objects. Assign a prefix of random characters to the start of each key name; in the background, S3 will then shard the data across partitions rather than putting it all in one (see the first sketch after this list). This is very effectively explained here:
    http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

    Third, effectively shard your data across multiple S3 buckets in some logical fashion, ensuring you are also roughly spreading the read and write requests equally between them, thereby increasing your maximum IO linearly with every additional S3 bucket. You would then potentially need some form of service to keep track of where your content lives; a common method for this is to store the S3 object locations in a DynamoDB table for resilient and fast retrieval.

    For extra-fast retrieval you could also cache these S3 locations in memory using ElastiCache (Memcached/Redis).
    If you go down this route, and assuming older data is less frequently accessed, I suggest you rebalance your data when new S3 buckets are added, otherwise you risk having hot and cold buckets, which defeats the objective of sharding them in the first place!

    Even better, just start with a decent number of S3 buckets anyway, as the buckets themselves are free; you are only charged for the content stored inside them! This, of course, adds some complexity for management and maintenance, so make sure you account for this in your designs!

    Lastly, use a CDN! That way your object access hit counts will be far lower, and your users will get improved performance from having content delivered from local PoPs! 🙂

  2. If you are using Tags as a method to assign permissions to users, or even to prevent accidental deletion of content or objects (something I’m not 100% convinced is bulletproof, but hey!), make sure you then deny the ability for users to modify those tags (duh!).

    For example, if you set a policy which states that any instance tagged with “PROD” may not be deleted without either MFA or elevated permissions, make sure you deny all ability for your users to edit said tags, otherwise they just need to change the tag from PROD to BLAH and they can terminate the instance (see the second sketch after this list).

  3. This is a configuration point which can cost you a wee chunk of change if you make this error and don’t spot it quickly! When configuring your Auto-Scaling Group make sure the Grace Period is set sufficiently long to ensure your instances have time to start and complete all of their bootstrap scripts.

    If you don’t, the first time you start up your group it will boot an instance, start health checking it, decide the instance has failed, terminate that instance and boot a new one, start health checking it, decide the instance has failed, etc (ad infinitum).

    If your grace period is low, this could mean spinning up as many as 60 or more instances in an hour, each with a minimum charge of an hour!

    Instead, work out your estimated Grace Period and consider adding an extra 20% wiggle room (see the third sketch after this list). Similarly, if your bootstrap script has a typo in it (as mine did in one test) which causes your health checks to fail, Auto-Scaling will keep terminating and instantiating new instances until you stop it. Make sure you have thoroughly tested your current bootstrap script prior to using it in an Auto-Scaling group!

    Update: One last point to highlight here is some sound advice from James Kilby. Be aware that, as your environment changes, a grace period which was sufficient on day one might not be enough later on! Don’t set and forget this stuff, or you may find you come in one day to a big bill and a load of lost revenue when your site needed to scale up and couldn’t!
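
Below is the first sketch referred to in the list above (point 1): a minimal Python example, with hypothetical bucket names, of deriving a random-looking key prefix from a hash and using the same hash to pick one of several buckets, so that neither the key space nor the buckets end up hot:

```python
import hashlib

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket names, assuming content is sharded across several buckets.
BUCKETS = ["content-bucket-00", "content-bucket-01", "content-bucket-02"]


def shard(original_key, prefix_len=4):
    """Return a (bucket, key) pair for an object.

    The hash-derived prefix spreads keys across S3's internal index partitions
    instead of clustering them on a sequential prefix, and the same hash picks
    a bucket so requests are spread across buckets too. Because the mapping is
    deterministic, readers can recompute it from the original key; a genuinely
    random prefix would need a lookup table (e.g. the DynamoDB approach above).
    """
    digest = hashlib.md5(original_key.encode()).hexdigest()
    bucket = BUCKETS[int(digest, 16) % len(BUCKETS)]
    key = f"{digest[:prefix_len]}/{original_key}"
    return bucket, key


bucket, key = shard("2016/08/21/image-000123.jpg")
s3.put_object(Bucket=bucket, Key=key, Body=b"example content")
print(bucket, key)  # e.g. content-bucket-01 9d2a/2016/08/21/image-000123.jpg
```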
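
The second sketch, referred to in point 2, is one possible way to lock tags down: an IAM policy which simply denies the ability to create or delete EC2 tags, attached to a group via boto3. The group and policy names are hypothetical, and a real policy would likely be scoped more carefully than a blanket deny:

```python
import json

import boto3

iam = boto3.client("iam")

# Deny any modification of EC2 tags for members of this (hypothetical) group,
# so a "PROD" tag cannot simply be renamed to dodge a protection policy.
deny_tag_changes = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": ["ec2:CreateTags", "ec2:DeleteTags"],
            "Resource": "*",
        }
    ],
}

iam.put_group_policy(
    GroupName="developers",
    PolicyName="deny-ec2-tag-changes",
    PolicyDocument=json.dumps(deny_tag_changes),
)
```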
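
And the third sketch, referred to in point 3: setting the health check grace period with some headroom when creating the group. All of the names and numbers are purely illustrative:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Measured time for an instance to boot and finish its bootstrap / user-data
# scripts, plus ~20% wiggle room.
BOOTSTRAP_SECONDS = 300
GRACE_PERIOD = int(BOOTSTRAP_SECONDS * 1.2)

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="my-web-asg",
    LaunchConfigurationName="my-web-lc-v42",
    MinSize=2,
    MaxSize=8,
    DesiredCapacity=2,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",
    LoadBalancerNames=["my-web-elb"],
    HealthCheckType="ELB",
    # If this is shorter than the real boot + bootstrap time, the ASG will
    # terminate and replace instances in a loop.
    HealthCheckGracePeriod=GRACE_PERIOD,
)
```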

Find more posts in this series here:
Index of AWS Tips and Gotchas

Amazon AWS Tips and Gotchas – Part 4 – Direct Connect & Public / Private VIFs
