Tag Archive for architecture

Amazon AWS Tips and Gotchas – Part 3 – S3, Tags and ASG

Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below are a handful more tips and gotchas when designing and implementing solutions on Amazon AWS, including AWS S3, Tags / Tagging as well as ASG (Auto-Scaling Groups).

For the first post in this series with a bit of background on where it all originated from, see here:
http://tekhead.it/blog/2016/02/amazon-aws-tips-and-gotchas-part-1/

For more posts in this series, see here:
Index of AWS Tips and Gotchas

AWS Tips And Gotchas – Part 3
  1. Individual S3 buckets are soft limited to 100 concurrent write transactions per second, and 300 reads initially and only partition as the storage performance quantities grow over time. This sounds like a lot but when you consider the average web page probably consists of 30-60 objects, it would not take a huge number of concurrent users hitting an application at the same time of day to start hitting limits on this.

    The first recommendation here, especially for read intensive workloads, is to cache the content from S3 using a service like CloudFront. This will immediately mean that for your object TTL you would only ever expect to see each object accessed a maximum of around 50 times (once per global edge location), assuming a global user base. A lot less than that if all of your users are in a small number of geographic regions.
    Second, do not use sequentially named S3 objects. Assign a prefix to the start of each filename which is a random set of characters, and will mean that in the background, S3 will shard the data across partitions rather than putting them all in one. This is very effectively explained here:
    http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

    Third, effectively shard your data across multiple S3 buckets in some logical fashion, ensuring you are also roughly spreading the read and write requests equally between them, therefore increasing your maximum IO linearly with every additional S3 bucket. You would then potentially need some form of service to keep a track of where your content lives; a common method for this is to store the S3 object locations in a DynamoDB table for resilient and fast retrieval.

    For extra fast retrieval you could also cache these S3 locations in memory using Elasticache (Memcached/Redis).AWS S3 cache all the things
    If you go down this route and assuming older data is less frequently accessed, I suggest you rebalance your data when new S3 buckets are added, otherwise you risk having hot and cold buckets, which defeats the objective of sharing them in the first place!

    Even better, just start with a decent number of S3 buckets anyway, as the buckets themselves are free; you are only charged for the content stored inside them! This, of course, adds some complexity for management and maintenance, so make sure you account for this in your designs!

    Lastly, use a CDN! That way your object access hit counts will be far lower, and your users will get improved performance from delivery of content from local pops! 🙂

  2. If you are using Tags as a method to assign permissions to users or even prevent accidental deletion of content or objects (something I’m not 100% sure I’m convinced is bullet proof but hey!), make sure you then deny the ability for users to modify those tags (duh!).

    For example, if you set a policy which states that any instance tagged with “PROD” may not be deleted without either MFA or elevated permissions, make sure you deny all ability for your users to edit said tags, otherwise they just need to change from PROD to BLAH and they can terminate the instance.AWS Tags Security

  3. This is a configuration point which can cost you a wee chunk of change if you make this error and don’t spot it quickly! When configuring your Auto-Scaling Group make sure the Grace Period is set sufficiently long to ensure your instances have time to start and complete all of their bootstrap scripts.

    If you don’t, the first time you start up your group it will boot an instance, start health checking it, decide the instance has failed, terminate that instance and boot a new one, start health checking it, decide the instance has failed, etc (ad infinitum).

    If your grace period is low this could mean spinning up as many as 60 or more instances in an hour, each with a minimum charge of an hour!Instead, work out your estimated Grace Period and consider adding an extra 20% wiggle room. Similarly, if your bootstrap script has a typo in it (as mine did in one test) which causes your health checks to fail, Auto-Scaling will keep terminating and instantiating new instances until you stop it. Make sure you have thoroughly tested your current bootstrap script prior to using it in an Auto-Scaling group!

    Update: One last point to highlight with this is some sound advice from James Kilby. Be aware as your environment changes that a sufficient grace period may be enough day one, but it might not be later on! Don’t set and forget this stuff, or you may find you come in one day with a big bill and a load of lost revenue when your site needed to scale up and couldn’t!

Find more posts in this series here:
Index of AWS Tips and Gotchas

Amazon AWS Tips and Gotchas – Part 4 – Direct Connect & Public / Private VIFs

Amazon AWS Tips and Gotchas – Part 2 – AWS EBS & RDS MS SQL

Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below are a handful more tips and gotchas when designing and implementing solutions on Amazon AWS, including EBS and MS SQL on RDS.

For the first post in this series with a bit of background on where it all originated from, see here:
http://tekhead.it/blog/2016/02/amazon-aws-tips-and-gotchas-part-1/

For more posts in this series, see here:
Index of AWS Tips and Gotchas

AWS Tips and Gotchas – Part 2 – EBS & RDS
  1. You cannot increase the size of EBS volumes without stopping the instance. If you are designing scale-out / high availability solution then this is not a big issue as you should be able to take some downtime on any individual node, but that downtime is going to be fairly significant, and the larger the volume, the more downtime you will incur. The actual process looks like this (summary below):
    • Stop the instance
    • Snapshot the volume
    • Create a new volume from the snapshot, with your new larger size
    • Detach the old volume
    • Attach the new volume and start the instance back up

    This is one of those features which is bread and butter for a vSphere or Hyper-V admin, and could be done online in seconds with the vast majority of guest operating systems.

    I think it really highlights the key difference between designing for AWS Cloud, and a traditional enterprise virtual infrastructure. In a solution where most of your hosts are ephemeral, this should not be a big issue. If you try to take a traditional enterprise approach, you may find yourself in hot water, having to take service downtime to make simple changes.

    I suggest where possible / appropriate, avoid using EBS and use alternative options such as S3 which can scale on demand.

    UPDATE 13th Feb 2017: Amazon have just released Elastic Volumes, which allow you to scale up EBS volumes on demand! Yay! More info here:
    Amazon EBS Update – New Elastic Volumes Change Everything

  2. Similar to resizing EBS volumes, you cannot hot-resize an instance, or indeed resize them / change their type in place. In order to change instance type you need to detach any EBS volumes (including root volumes if you wish to maintain them too), terminate the instance, create a new one and re-attach your volumes.
    Obviously you cannot re-attach a root volume if you are using instance storage (ephemeral) for this, so make sure you use EBS backed volumes if you want to maintain your root volumes for any scale-up elements of your solutions which cannot simply be re-created from a bootstrap script.
  3. If your application depends on Microsoft SQL, you are going to be in for a fairly unpleasant surprise! It is not currently possible to resize MS SQL volumes on Amazon RDS once they have been deployed! At all. Full stop. Nada.AWS MS SQL - say what nowThe recommendation from AWS is to deploy your estimated future capacity requirement from day one! Not very cloudy at all…Your only growth option when you hit your initial capacity limit is to migrate all the data to a new RDS instance and take some application downtime to fail over.This can be minimised by using things like log shipping from the source instance to get the target as close to up-to-date as possible, but you will still need to shut down and swing your applications, and frankly it’s a risky headache which would be better avoided if possible, and certainly not something you want to be doing on a regular basis.Probably best to design for your estimated growth, and add a percentage on top.

Find more posts in this series here:
Index of AWS Tips and Gotchas

Amazon AWS Tips and Gotchas – Part 3 – S3, Tags and ASG

Amazon AWS Tips and Gotchas – Part 1 – AWS Intro, EBS and EC2

Although I have been very much aware of AWS for many years and understood it at a high level, I have never had the time to get deep down and dirty with the AWS platform… that is until now!

I have spent the past three weeks immersing myself in AWS via the most excellent ACloud.Guru Solution Architect Associate training course, followed by a one week intensive AWS instructor-led class from QA on AWS SA Associate and Professional.

While the 100 hours or so I have spent labbing and interacting with AWS is certainly not 10,000, it has given me some valuable insights on both how absolutely AWSome (sorry – had to be done!) the platform is, as well as experiencing a few eye openers which I felt were worth sharing.

It would be very easy for me to extoll the virtues of AWS, but I don’t think there would be much benefit to that. Everyone knows it is a great platform (but maybe I’ll do it later anyway)! In the meantime, I thought it would be worthwhile taking a bit more of a “warts and all” view of a few features. Hopefully, this will avoid others stepping into the potential traps which have come up directly or indirectly through my recent training materials, as well as being a memory aid to myself!

pretty cloud AWS EC2 EBS

The key thing is with all of these “gotchas”, they are not irreparable, and can generally be worked around by tweaking your infrastructure design. In addition, with the rate that AWS develop and update features on their platforms, it is likely that many of them will improve over the coming months / years anyway.

The general feeling around many of these “features” is that AWS are indirectly and gently encouraging you to avoid building your solutions on EC2 and other IaaS services, Instead, pushing you more towards using their more managed services such as RDS, Lambda, Elastic Beanstalk etc.

This did originally start off as a single “Top 10” post but realised quickly that there are a lot more than 10 items and some of them are pretty deep dive! As such, I have split the content into easily consumable chunks, with a few lightweight ones to get us started… keep your eyes open for a few whoppers later in the series!

The full list of posts will be available here:
Index of AWS Tips and Gotchas

AWS Tips and Gotchas – Part 1
  1. Storage for any single instance may not exceed 20,000 IOPS and 320MB/sec per EBS volume. This is really only something which will impact very significant workloads. The current “recommended” workaround for this is to do some pretty scary things such as in-guest RAID / striping!

    Doing this with RAID0 means you then immediately risk loss of the entire datastore if a single EBS volume in the set goes offline for even a few seconds. Alternatively, you can buy twice as much storage and waste compute resources doing RAID calculations. In addition, you then have to do some really kludgy things to get consistent snapshots from your volume, such as taking your service offline. 
    In reality, only the most extreme workloads hit this kind of scale up. The real answer (which is probably better in the long term) is to refactor your application or database for scale-out, a far more cloudy design.
    amazon AWS EBS
  2. The internet gateway service does not provide a native method for capping of outbound bandwidth. It doesn’t take a genius to work out that when outbound bandwidth is chargeable, you could walk away with a pretty significant bandwidth bill should something decide to attack your platform with a high volume of traffic. One potential method to work around this would be to use NAT instances. You can then control the bandwidth using 3rd party software in the NAT instance OS.
  3. There is no SLA for EC2 instances unless you run them across multiple Availability Zones. Of course with typical RTTs of a few milliseconds at most, there is very little reason not to stretch your solutions across multiple AZs. The only time you might keep in one AZ is if you have highly latency sensitive applications, or potentially the type of app which requires a serialised string of DB queries to generate a response to the end user.

    In a way I actually quite like this SLA requirement as it pushes customers who might otherwise have accepted the risk of a single DC, into designing something more robust and accepting the (often minor) additional costs. With the use of Auto Scaling and Elastic Load Balancing there is often no reason you can’t have a very highly available application split across two or more AZs, whilst using roughly the same number of servers as a single site solution.

    For example the following solution would be resilient to a single AZ failure, whilst using no more infrastructure than a typical resilient on-premises single site solution:Teahead AWS Simple HA Web Configuration
    No DR replication required, no crazy metro clustering setup, nothing; just a cost effective, scalable, highly resilient and simple setup capable of withstanding the loss of an entire data centre (though not a region, obviously).

Find more posts in this series here:
Index of AWS Tips and Gotchas

Amazon AWS Tips and Gotchas – Part 2 – AWS EBS & RDS MS SQL

 

%d bloggers like this: