I don’t know about you, but some of my best and worst ideas come to me when I’m in the shower… it’s quite possible this may be the latter, but let’s see where it goes!
For those of you who either read this blog regularly, or perhaps even know me in the “walking, talking flesh sacks” world, you will probably have noticed I’m prone to long-form communication, whether it’s writing or indeed speaking!
For many reasons I won’t bore you with today (but maybe later!), life has been spectacularly busy the last few months. This has led to something I want to correct: missing out on the enjoyable act of blogging here!
What’s the plan, Stan?
In response I am going to try a little experiment based on the theory of “little and often”.
In addition to my traditional “epic saga” posts, I will be producing a new post series I’m calling #TekBytes. Not quite Twitter-style microblogging, but more regular, bite-sized chunks of content. No more than a few paragraphs or a couple of hundred words per post, based on observations and challenges I see day to day in my role as a multi-cloud solutions architect.
That doesn’t mean it will all be cloudy of course, just whatever comes to mind that I can get down into a post in a few minutes, possibly even from my phone! Some of them might even just be questions for you, the readers!
And before you ask… of course there will still be terrible memes! 😀
Thoughts? Feedback? Make yourself heard using the comments below!
Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below are a handful more tips and gotchas when designing and implementing solutions on Amazon AWS. This week, we talk about one of the newest AWS services: EFS (aka Elastic File System).
20. Amazon AWS Tips and Gotchas – Part 10 – EFS (Elastic File System)
Historically, a big challenge when designing highly available web infrastructures has been how to provide a centralised store for static content without wasting resources.
A classic model for this is a pair of web / file servers with either rsync or Gluster replicating the content between them. In the Windows world, this would be something like either a WSFC (failover cluster) or perhaps something evil like a DFS replicated share. This means that not only are you wasting money on multiple virtual machines / instances just to serve file content, but you are also adding significant risk and complexity in the replication and failover between these machines.
Enter AWS EFS!

At a simple level, EFS is basically an NFS (v4.1) share within the AWS cloud, which is replicated across all AZs in any one region. No need for managing and replicating between instances, or indeed paying for EC2 instances just to create file shares! Great!
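To make that concrete, here is a minimal sketch of standing up an EFS file system and a mount target with boto3, then mounting it over NFS on an instance. The region, subnet and security group IDs are placeholders, and in practice you would create one mount target per AZ your web servers live in:

```python
import boto3

efs = boto3.client("efs", region_name="eu-west-1")   # region is just an example

# Create the file system (the CreationToken makes the call idempotent)
fs = efs.create_file_system(CreationToken="shared-web-content",
                            PerformanceMode="generalPurpose")
fs_id = fs["FileSystemId"]

# One mount target per AZ / subnet where your web servers run (placeholder IDs)
efs.create_mount_target(FileSystemId=fs_id,
                        SubnetId="subnet-xxxxxxxx",
                        SecurityGroups=["sg-xxxxxxxx"])

# Then, on each EC2 instance (shell, not Python):
#   sudo mount -t nfs4 -o nfsvers=4.1 \
#       <fs_id>.efs.eu-west-1.amazonaws.com:/ /mnt/efs
```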
As this is still a relatively immature product, there are still a few “features” to be aware of:
There is no native EFS backup solution (yet!). I’m sure this will come very soon; with re:Invent coming up, it wouldn’t surprise me if something was announced then. In the meantime, your main options are either to use Data Pipeline to back up to another EFS store, or to mount EFS on an EC2 instance and back it up using your own tools or scripts. I would be concerned about backing up EFS to EFS (in the same region), as this is putting all your eggs in one basket. Hopefully, AWS will provide other target options in the future.
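If you do go down the roll-your-own route, the sketch below shows the sort of thing I mean: an EC2 instance with the EFS share mounted, walking the file system and pushing each file to an S3 bucket with boto3. The mount path and bucket name are made up for illustration, and there is no incremental logic or error handling here, which a real backup script would certainly want:

```python
import os
import boto3

EFS_MOUNT = "/mnt/efs"        # where the EFS share is mounted on this instance
BUCKET = "my-efs-backups"     # hypothetical backup bucket

s3 = boto3.client("s3")

for root, _dirs, files in os.walk(EFS_MOUNT):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, EFS_MOUNT)   # preserve the directory structure as S3 keys
        s3.upload_file(path, BUCKET, key)
```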
There is no native encryption of EFS data as yet. If you need this right now, you could achieve it by pre-encrypting the data in your application before it is written to EFS. Alternatively, just hold your breath, as AWS have already stated that: “Amazon EFS does not currently provide the option to encrypt data at rest, but we will offer this option soon”.
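As an illustration of the pre-encryption idea (purely application-side crypto, not an AWS feature), here is a tiny sketch using the Python cryptography library’s Fernet recipe; in reality you would fetch the key from something like KMS rather than generating it inline:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()    # illustration only: store/retrieve this via KMS or similar
cipher = Fernet(key)

content = b"sensitive static content"

# Write the encrypted blob to the EFS mount instead of the plaintext
with open("/mnt/efs/content.enc", "wb") as fh:
    fh.write(cipher.encrypt(content))

# Read it back and decrypt in the application layer
with open("/mnt/efs/content.enc", "rb") as fh:
    assert cipher.decrypt(fh.read()) == content
```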
If you have less than about 100GB, then due to the way the performance burst credits work, you may not get the performance you need. The more data you store, the more baseline throughput you get, so don’t short-change your app for the sake of a few dollars!
“Amazon EFS uses a credit system to determine when file systems can burst. Each file system earns credits over time at a baseline rate that is determined by the size of the file system, and uses credits whenever it reads or writes data”.
Early testing has shown that very small file systems can suffer from IO starvation and performance issues. I would recommend you start with 100GB as a minimum (subject to your workload requirements, of course!). This is still pretty cheap at only about $30-33 a month; a lot less than even a pair of EC2 instances, never mind the complexity reduction benefits. KISS!
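To put rough numbers on that, here is a back-of-the-envelope calculation assuming the baseline rate AWS documents for EFS bursting (50 MiB/s per TiB stored); check the current docs for the exact figures, as they may change:

```python
# Baseline throughput (and the rate at which burst credits accrue) scales with data stored.
BASELINE_MIB_S_PER_TIB = 50.0   # figure from the EFS performance documentation at the time

for size_gib in (1, 10, 100, 1024):
    baseline = BASELINE_MIB_S_PER_TIB * size_gib / 1024.0
    print(f"{size_gib:>5} GiB stored -> ~{baseline:.2f} MiB/s baseline throughput")

# 1 GiB -> ~0.05 MiB/s, 10 GiB -> ~0.49 MiB/s, 100 GiB -> ~4.88 MiB/s, 1 TiB -> ~50 MiB/s
```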
Of course, the more caching you can do on that content, e.g. using CloudFront as a CDN, the lower the IO requirements on your EFS store.
Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below is another tip for designing and implementing solutions on Amazon AWS. In this case, Scale-Up Patching of Auto-Scaling Groups (ASGs) and a couple of wee bonuses about Dark Launch techniques.
19. AWS Tips and Gotchas – Part 9 – Scale-Up Patching in ASGs
Very quick tip on Auto Scaling Groups this week, courtesy of an awesome session I attended at the AWS User Group UK (London) last week on DevOps, presented by Chris Turvil from The Trainline.
Assuming you just need to do a code release to an existing farm of servers running in an ASG, and you aren’t planning anything complex such as a DB schema update, you can use a technique called “Scale-Up Patching”. I hadn’t heard the term before, but it’s incredibly simple and very effective! There are a couple of methods you might use, depending on how you deliver your code, but the technique is the same: make your new code or image live, double the minimum size of your ASG, then halve it! Job done!

So how does this work?
If you have looked into the detail of ASGs, you will know that, assuming your instances are spread roughly evenly over multiple AZs, when an ASG shrinks / scales down it terminates the instances with the oldest launch configuration first. For more detail on the exact rules, see the AWS documentation on the default termination policy.
If you double the size of your current number of instances, all of the new instances will be deployed with your new code version. This leaves you with a farm of 50% vOld and 50% vNew. When you then tell the ASG to scale to the original size, it will obviously kill off all of the vOld instances, leaving your entire farm upgraded. If you found an issue and had to roll back, you simply rinse and repeat the same exercise! How brilliant is that?!
This process will work exactly the same regardless of whether you deploy your code via updated AMIs each time, or simply post-boot using a user-data script which pulls your source from a bucket, repo, or similar. Either way, the result is the same and infinitely repeatable!
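Here is a minimal sketch of the mechanics using boto3 (the ASG name is a placeholder, and you would want to check MaxSize is large enough and wait for health checks before shrinking back down):

```python
import boto3

asg = boto3.client("autoscaling", region_name="eu-west-1")
GROUP = "web-frontend-asg"    # placeholder ASG name

group = asg.describe_auto_scaling_groups(
    AutoScalingGroupNames=[GROUP])["AutoScalingGroups"][0]
original_min, original_desired = group["MinSize"], group["DesiredCapacity"]

# 1. Double up: every newly launched instance picks up the new AMI / user-data
asg.update_auto_scaling_group(AutoScalingGroupName=GROUP,
                              MinSize=original_min * 2,
                              DesiredCapacity=original_desired * 2)

# ... wait for the vNew instances to pass their ELB health checks ...

# 2. Halve it again: the default termination policy removes the instances with the
#    oldest launch configuration first, i.e. the vOld instances
asg.update_auto_scaling_group(AutoScalingGroupName=GROUP,
                              MinSize=original_min,
                              DesiredCapacity=original_desired)
```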
The one counter to this, which a colleague of mine brought up, is that you are explicitly depending on a specific feature of AWS always functioning in the same way and not changing in the future. An alternative might be to deploy a blue-green setup with independent ELBs and instances, then simply fail over using Route53, either all in one go or using weighted routing for a canary release process. Funnily enough, AWS released a white paper on exactly that subject a couple of months ago: Blue/Green Deployments on AWS Whitepaper
They also cover the scale-up patching method in detail from page 17 of the whitepaper.
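To make the weighted-routing idea above concrete, here is a hedged sketch using boto3 to upsert two weighted records pointing at hypothetical blue and green ELBs; shifting the weights gives you a canary, and 0/100 is a full cutover:

```python
import boto3

r53 = boto3.client("route53")
ZONE_ID = "ZXXXXXXXXXXXXX"    # placeholder hosted zone ID

STACKS = {
    "blue":  "blue-elb-123456.eu-west-1.elb.amazonaws.com",    # placeholder ELB DNS names
    "green": "green-elb-654321.eu-west-1.elb.amazonaws.com",
}

def set_weights(weights):
    """Upsert weighted CNAMEs so traffic splits across the blue/green stacks."""
    changes = [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.example.com.",
            "Type": "CNAME",
            "SetIdentifier": stack,
            "Weight": weights[stack],
            "TTL": 60,
            "ResourceRecords": [{"Value": STACKS[stack]}],
        },
    } for stack in STACKS]
    r53.change_resource_record_sets(HostedZoneId=ZONE_ID,
                                    ChangeBatch={"Changes": changes})

set_weights({"blue": 90, "green": 10})   # canary: send 10% of traffic to the new stack
```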
Brucie Bonus One – Deployment Dictionary
Incidentally, you can deploy said code without it actually going live immediately by using methods called “Dark Launch Techniques”. As the name suggests, this separates code deployment from feature launches. You pre-release your code into production, but you simply don’t toggle it on for anyone (or at least not everyone) at first. You can then either toggle it on for everyone or, even better, for smaller canary groups. Web-scale companies such as Netflix, Facebook and Google have been doing this for many years!
This completely avoids the panic-inducing impact of deploying a large new code release whilst simultaneously having that code go live and ramping up utilisation!
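The mechanics don’t need to be fancy; a dark launch can be as simple as a flag table and deterministic user bucketing, along these lines (purely illustrative, not any particular product’s API):

```python
import hashlib

# Feature -> percentage of users who see it (0 = deployed but dark, 100 = fully live)
ROLLOUTS = {
    "new-checkout-flow": 0,
    "redesigned-homepage": 10,
}

def feature_enabled(feature: str, user_id: str) -> bool:
    """Bucket each user deterministically so they get a consistent experience."""
    rollout = ROLLOUTS.get(feature, 0)
    bucket = int(hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < rollout

if feature_enabled("redesigned-homepage", "user-42"):
    pass  # serve the new code path; otherwise fall through to the old one
```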
Combining dark launch methods with scale-up patching or blue/green deployments should lead to a few less grey hairs in the long run, that’s for sure!
Lastly, a bit of interesting news which also came from The Trainline is that they have open sourced their own internal deployment tool, which they call Environment Manager.
With an AngularJS front end, and a Node.js back end, it’s a home-grown continuous deployment tool which includes a self-service portal, REST APIs, and a number of operational governance features. The governance elements include a feature which prevents rogue developers deploying anything which hasn’t already been defined in the central service catalogue.
Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below are a handful more tips and gotchas when designing and implementing solutions on Amazon AWS, including AWS EC2 Reserved Instances.
Reserved Instances are a great way to save yourself some money on instances you know you will require for a significant period of time (12 or 36 months). One really cool fact which AWS don’t announce enough, in my opinion, is that reserved instances can actually be shared across consolidated billing accounts!
If you wanted to, you could purchase all of your reserved instances from your primary consolidated billing account; however, doing this has some potentially unexpected results:
Reserved instances don’t just provide you with a better price, they also provide you with guaranteed ability to spin up an instance of your chosen type, regardless of how busy the AZ in question actually is.
If there is an AZ outage, other AWS customers will scramble to spin up additional instances in the other AZs in the same region, either manually or via ASGs, and this has the potential to exhaust the available compute capacity for one or more instance types!
Yes, that’s right, even AWS do not have infinite compute resources! By using reserved instances, you are still guaranteed to be able to run yours regardless of the available capacity for on-demand instances. They are truly reserved.
If, however, you centralise your reserved instances into your consolidated billing (CB) account, you will get the reservation pricing benefits at the top of the account tree, but you won’t get the capacity reservations, as these are account specific.
Reserved instances are specific to individual Availability Zones, so ensure you spread them evenly across your AZs to avoid wasting them (you are of course designing your apps to be resilient across AZs, right?) and to give yourself maximum reserved coverage in the unlikely event of a full AZ outage.
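A quick way to sanity-check that spread is to compare running instances against active AZ-specific reservations per AZ; something like this boto3 sketch (counts only, no instance-type matching, so treat it as a starting point):

```python
from collections import Counter
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Active reservations per AZ (AZ-specific reservations carry the capacity guarantee)
reserved = Counter()
for ri in ec2.describe_reserved_instances(
        Filters=[{"Name": "state", "Values": ["active"]}])["ReservedInstances"]:
    if "AvailabilityZone" in ri:
        reserved[ri["AvailabilityZone"]] += ri["InstanceCount"]

# Running instances per AZ
running = Counter()
for page in ec2.get_paginator("describe_instances").paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            running[instance["Placement"]["AvailabilityZone"]] += 1

for az in sorted(set(reserved) | set(running)):
    print(f"{az}: {running[az]} running vs {reserved[az]} reserved")
```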
And finally… reserved instances are a commercial tool applied after the fact, not against a specific instance. When using consolidated billing for reserved instances, the reservations are therefore effectively split evenly across all accounts. If you want to report back to each business unit / account owner on their billing, including reserved instances, this could be tricky.