Tag Archive for S3

Index of Tekhead.it Blog Posts on Amazon AWS

I wrote my first blog post on AWS in February 2016 and the series is growing pretty quickly, so I thought it was worthwhile indexing all of the current posts and providing an updated list as this grows.

Hopefully, this should make these posts a little easier for people to find in the future!

Anyway, enough gabbing, on with the posts and links:


I was kindly invited by Scott Lowe to join him on the Full Stack Journey podcast, to discuss learning AWS and cloud architecture. The episode can be accessed here:

AWS Certification

bill was study Certified SysOps Administrator

AWS Tips and Gotchas Series

Random AWS and Cloud Related Posts

Also, just in case I forget to keep this page updated:

AWS Tips and Gotchas Blog Posts

Cohesity Announces Cloud Integration Services

With the release of v2.0 of their OASIS platform, as presented as Storage Field Day 9 recently, Cohesity’s development team have continued churn out new features and data services at a significant rate. It seems that they are now accelerating towards the cloud (or should that be The Cloud?) with a raft of cloud integration features announced today!

There are three key new features included as part of this, called CloudArchive, CloudTier and CloudReplicate respectively, all of which pretty much do exactly what it says on the tin!

CloudArchive is a feature which allows you to archive datasets to the cloud (duh!), specifically onto Google Nearline, Azure, and Amazon S3. This would be most useful for things like long term retention of backups without taking up space on your primary platform.


CloudTier extends on-premises storage, allowing you to use cloud storage as a cold tier, moving your least used blocks out. If you are like me, you like to understand how these things work down deep in the guts! Mohit Aron, Founder & CEO of Cohesity, kindly provided Tekhead.it with this easy to understand explanation on their file and tiering system:

NFS/SMB files are mapped to objects in our system – which we call blobs. Each blob consists though of small pieces – which we call chunks. Chunks are variable sized – approximately ranging from 8K-16K. The variable size is due to deduplication – we do variable length deduplication.

The storage of the chunks [is] done by a completely different component. We group chunks together into what we call a chunkfile – which is approximately 8MB in size. When we store a chunkfile on-prem, it is a file on Linux. But when we put it in the cloud, it becomes an S3 object.

Chunkfiles are the units of tiering – we’ll move around chunkfiles based on their hotness.

So there you have it folks; chunkfile hotness is the key to Cohesity’s very cool new tiering technology! I love it!


With the chunkfiles set at 8mb this seems like a sensible size for moving large quantities of data back and forth to the cloud with minimal overhead. With a reasonable internet connection in place, it should still be possible to recall a “cool” chunk without too much additional latency, even if your application does require it in a hurry.

You can find out more information about these two services on a new video they have just published to their youtube channel.

The final feature, which is of most interest to me is called CloudReplicate, though this is not yet ready for release and I am keen to find out more as information becomes available. With CloudReplicate, Cohesity has made the bold decision to allow customers to run a software only edition of their solution in your cloud of choice, with native replication from their on premises appliances, paving the way to true hybrid cloud, or even simply providing a very clean DR strategy.

This solution is based on their native on-premises replication technology, and as such will support multiple replication topologies, e.g. 1-to-many, many-to-1, many-to-many, etc, providing numerous simple or complex DR and replication strategies to meet multiple use cases.


It could be argued that the new solution potentially provides their customers with an easy onramp to the cloud in a few years… I would say that anyone making an investment in Cohesity today is likely to continue to use their products for some time, and between now and then Cohesity will have the time to significantly grow their customer base and market share, even if it means enabling a few customers to move away from on-prem down the line.

I have to say that once again Cohesity have impressed with their vision and speedy development efforts. If they can back this with increase sales to match, their future certainly looks rosy!

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 9 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors products or services and I was not compensated in any way for my time at the event.

AWS Certified Solutions Architect Associate Exam Study Guide & Resources

After about 5 weeks of steeping myself in the AWS ecosystem and platform, labbing like crazy, and attending a compressed AWS training course, I finally sat the AWS Certified Solutions Architect Associate exam last week and passed.

I’ve described my experience and thoughts on the exam itself here:
#AWS Certified Solutions Architect Associate Exam Prep & Experience

Study Materials

In preparation for the exam, I used the following study materials:

Best of luck with your exams!!! 🙂

AWS Certified Solutions Architect Associate Exam Prep & Experience

Amazon AWS Tips and Gotchas – Part 3 – S3, Tags and ASG

Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below are a handful more tips and gotchas when designing and implementing solutions on Amazon AWS, including AWS S3, Tags / Tagging as well as ASG (Auto-Scaling Groups).

For the first post in this series with a bit of background on where it all originated from, see here:

For more posts in this series, see here:
Index of AWS Tips and Gotchas

AWS Tips And Gotchas – Part 3
  1. Individual S3 buckets are soft limited to 100 concurrent write transactions per second, and 300 reads initially and only partition as the storage performance quantities grow over time. This sounds like a lot but when you consider the average web page probably consists of 30-60 objects, it would not take a huge number of concurrent users hitting an application at the same time of day to start hitting limits on this.

    The first recommendation here, especially for read intensive workloads, is to cache the content from S3 using a service like CloudFront. This will immediately mean that for your object TTL you would only ever expect to see each object accessed a maximum of around 50 times (once per global edge location), assuming a global user base. A lot less than that if all of your users are in a small number of geographic regions.
    Second, do not use sequentially named S3 objects. Assign a prefix to the start of each filename which is a random set of characters, and will mean that in the background, S3 will shard the data across partitions rather than putting them all in one. This is very effectively explained here:

    Third, effectively shard your data across multiple S3 buckets in some logical fashion, ensuring you are also roughly spreading the read and write requests equally between them, therefore increasing your maximum IO linearly with every additional S3 bucket. You would then potentially need some form of service to keep a track of where your content lives; a common method for this is to store the S3 object locations in a DynamoDB table for resilient and fast retrieval.

    For extra fast retrieval you could also cache these S3 locations in memory using Elasticache (Memcached/Redis).AWS S3 cache all the things
    If you go down this route and assuming older data is less frequently accessed, I suggest you rebalance your data when new S3 buckets are added, otherwise you risk having hot and cold buckets, which defeats the objective of sharing them in the first place!

    Even better, just start with a decent number of S3 buckets anyway, as the buckets themselves are free; you are only charged for the content stored inside them! This, of course, adds some complexity for management and maintenance, so make sure you account for this in your designs!

    Lastly, use a CDN! That way your object access hit counts will be far lower, and your users will get improved performance from delivery of content from local pops! 🙂

  2. If you are using Tags as a method to assign permissions to users or even prevent accidental deletion of content or objects (something I’m not 100% sure I’m convinced is bullet proof but hey!), make sure you then deny the ability for users to modify those tags (duh!).

    For example, if you set a policy which states that any instance tagged with “PROD” may not be deleted without either MFA or elevated permissions, make sure you deny all ability for your users to edit said tags, otherwise they just need to change from PROD to BLAH and they can terminate the instance.AWS Tags Security

  3. This is a configuration point which can cost you a wee chunk of change if you make this error and don’t spot it quickly! When configuring your Auto-Scaling Group make sure the Grace Period is set sufficiently long to ensure your instances have time to start and complete all of their bootstrap scripts.

    If you don’t, the first time you start up your group it will boot an instance, start health checking it, decide the instance has failed, terminate that instance and boot a new one, start health checking it, decide the instance has failed, etc (ad infinitum).

    If your grace period is low this could mean spinning up as many as 60 or more instances in an hour, each with a minimum charge of an hour!Instead, work out your estimated Grace Period and consider adding an extra 20% wiggle room. Similarly, if your bootstrap script has a typo in it (as mine did in one test) which causes your health checks to fail, Auto-Scaling will keep terminating and instantiating new instances until you stop it. Make sure you have thoroughly tested your current bootstrap script prior to using it in an Auto-Scaling group!

    Update: One last point to highlight with this is some sound advice from James Kilby. Be aware as your environment changes that a sufficient grace period may be enough day one, but it might not be later on! Don’t set and forget this stuff, or you may find you come in one day with a big bill and a load of lost revenue when your site needed to scale up and couldn’t!

Find more posts in this series here:
Index of AWS Tips and Gotchas

Amazon AWS Tips and Gotchas – Part 4 – Direct Connect & Public / Private VIFs

%d bloggers like this: