Secondary can be just as important as Primary

There can be little doubt these days that the future of the storage industry for primary transactional workloads is All Flash. Finito, that ship has sailed, the door is closed, the game is over, [Insert your preferred analogy here].

Now I can talk about the awesomeness of All Flash until the cows come home, but the truth is that flash is not now, and may never be, as inexpensive for bulk storage as spinning rust! I say “may” as technologies like 3D NAND are changing the economics for flash systems. Either way, I think it will still be a long time before an 8TB flash device is cheaper than 8TB of spindle. This is especially true for storing content which does not easily dedupe or compress, such as the two key types of unstructured data which are driving global storage capacities through the roof year on year: images and video.

With that in mind, what do we do with all of our secondary data? It is still critical to our businesses from a durability and often availability standpoint, but it doesn’t usually have the same performance characteristics as primary storage. Typically it’s also the data which consumes the vast majority of our capacity!


Accounting needs to hold onto at least 7 years of their data, nobody in the world ever really deletes emails these days (whether you realise or not, your sysadmin is probably archiving all of yours in case you do something naughty, tut tut!), and woe betide you if you try to delete any of the old marketing content which has been filling up your arrays for years! A number of my customers are also seeing this data growing at exponential rates, often far exceeding business forecasts.

Looking at the secondary storage market from my personal perspective, I would probably break it down into a few broad groups of requirements:

  • Lower performance “primary” data
  • Dev/test data
  • Backup and archive data

As planning for capacity is becoming harder, and business needs are changing almost by the day, I am definitely leaning more towards scale-out solutions for all three of these use cases nowadays. Upfront costs are reduced and I have the ability to pay as I grow, whilst increasing performance linearly with capacity. To me, this is a key for any secondary storage platform.

One of the vendors we visited at SFD8, Cohesity, actually targets both of these workload types with their solution, and I believe they are a prime example of where the non-AFA part of the storage industry will move in the long term.

The company came out of stealth last summer and was founded by Mohit Aron, a rather clever chap with a background in distributed file systems. Part of the team who wrote the Google File System, he went on to co-found Nutanix as well, so his CV doesn’t read too bad at all!

Their scale-out solution utilises the now ubiquitous 2u, 4-node rack appliance physical model, with 96TB of HDDs and a quite reasonable 6TB of SSD, for which you can expect to pay an all-in price of about $80-100k after discount. It can all be managed via the console, or a REST API.

Cohesity CS2000 Series

2u or not 2u? That is the question…

That stuff is all a bit blah blah blah though of course! What really interested me is that Cohesity aim to make their platform infinitely and incrementally scalable; quite a bold vision and statement indeed! They do some very clever work around distributing data across their system, whilst achieving a shared-nothing architecture with a strongly consistent (as opposed to eventually consistent), 2-phase commit file system. Performance is achieved by first caching data on the SSD tier, then de-staging this sequentially to HDD.

I suspect the solution being infinitely scalable will be difficult to achieve, if only because you will almost certainly end up bottlenecking at the networking tier (cue boos and jeers from my wet string-loving colleagues). In reality most customers don’t need infinite as this just creates one massive fault domain. Perhaps a better aim would be to be able to scale massively, but cluster into large pods (perhaps by layer 2 domain) and be able to intelligently spread or replicate data across these fault domains for customers with extreme durability requirements?

Lastly they have a load of built-in data protection features in the initial release, including instant restore, and file level restore which is achieved by cracking open VMDKs for you and extracting the data you need. Mature features, such as SQL or Exchange object level integration, will come later.

Cohesity Architecture


As you might have guessed, Cohesity’s initial release appeared to be just that; an early release with a reasonable number of features on day one. Not yet the polished article, but plenty of potential! They have already begun to build on this with the second release of their OASIS software (Open Architecture for Scalable Intelligent Storage), and I am pleased to say that next week we get to go back and visit Cohesity at Storage Field Day 9 to discuss all of the new bells and whistles!

Watch this space! 🙂

To catch the presentations from Cohesity at SFD8, you can find them here:
http://techfieldday.com/companies/cohesity/

Further Reading
I would say that more than any other session at SFD8, the Cohesity session generated quite a bit of debate and interest among the guys. Check out some of their posts here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc., at Storage Field Day 8 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services and I was not compensated in any way for my time at the event.


AWS Certified Solutions Architect Associate Exam Study Guide & Resources

After about 5 weeks of steeping myself in the AWS ecosystem and platform, labbing like crazy, and attending a compressed AWS training course, I finally sat the AWS Certified Solutions Architect Associate exam last week and passed.

I’ve described my experience and thoughts on the exam itself here:
#AWS Certified Solutions Architect Associate Exam Prep & Experience

Study Materials

In preparation for the exam, I used the following study materials:

Best of luck with your exams!!! 🙂



AWS Certified Solutions Architect Associate Exam Prep & Experience

Historically I have been well aware of AWS and understood the key services at a high level, but recently this has become a key strategic focus for my employer, and I was asked to get down and dirty with the platform. So after about 5 weeks of steeping myself in the AWS ecosystem and platform, labbing like crazy, and attending a compressed AWS Solutions Architect training course, I finally sat the AWS Certified Solutions Architect Associate exam this week, and am happy to say I passed!

It has been a pretty intense number of weeks, and my wife has been less than impressed with hardly seeing me for a month, but it has certainly been worthwhile!

TLDR: Loads of exam resources coming in the follow-up post. Learn to speed read! ACloud.Guru and official QA AWS courses are both good. The exam itself was reasonably tricky for an intro level exam, but not too bad. List of prep materials is here:
http://tekhead.it/blog/2016/03/aws-certified-solution-architect-associate-exam-study-guide-resources/

AWS Solutions Architect Exam Prep Process

I will post a follow-up list of resources shortly but for now, I will concentrate on the process!

My exam prep and training was largely centred around the ACloud.Guru and official QA AWS Accelerated courses, with a load of additional reading preceding and following them.

I am also a copious note taker and I spend significant amounts of time labbing to make sure that whatever I am designing for a customer, or whatever I am being tested on, I have generally done it at least once! More detail on these in the study materials post.

7 days before the AWS exam

Having spent several weeks labbing I spent my last week predominantly reading through the recommended whitepapers and reading the AWS FAQ documents, along with a number of articles from the AWS documentation site.

2 days before the AWS exam

I spent this time solidly doing practice questions, reading AWS documentation to fill in any blanks from the practice questions, and reading through my notes from the two courses.

I found the sample exam and practice questions very useful. The same goes for the practice tests in the ACloud.Guru course. Whenever I came across a question I was not 100% confident on, again I hit the AWS documentation site to fill in the blanks.

1 day before the AWS exam

One thing I did the night before the exam was to read through all of my ACloud.Guru notes, specifically concentrating on the “Exam Tips” which Ryan had noted throughout the course, as well as all of the end of section summaries.

Similarly, during the QA course, every time the trainer mentioned something which was a likely exam topic I made a specific note of it. I took some time to review the list prior to the exam and look up AWS documentation and articles on the relevant features.


AWS Solutions Architect Exam Experience

The exam itself is under NDA, so I obviously can’t go into any detail about the content. Amazon also provide an FAQ about the exam which is worth reading.

The exam centre I used was not one I had used before for Prometric or Pearson Vue. It certainly looked the part, very modern etc, but in reality, it was actually quite sub par. I was lucky enough to be sitting on the opposite side of a paper thin wall from a very noisy chap in a meeting room! Fortunately, the exam centre did provide ear plugs. Can’t say I have ever even felt the need to wear earplugs in an exam before, but there’s a first time for everything!

I felt the time allocation was reasonable. I finished after roughly 75-80% of my allotted time, so very similar to a number of other industry entry- to mid-level exams I have taken in the past.

In terms of difficulty, I would equate the Solutions Architect Associate exam to being of a similar level to a reasonably tricky VCP / MCP, but definitely not as hard as a VCAP. I passed reasonably comfortably, but had to really think hard about quite a few of the questions. I was really glad I managed to get a bit of time to read some of the FAQ documents in the days before the exam, which were not originally on my resource list, but turned out to be very good exam prep!

Every time I hit next there was a very long pause until the next question was displayed. I can only guess the questions are requested on the fly as you progress, as the pause was so long I can’t think of any other reasonable explanation! I would guess I lost at least 3-5 minutes over the course of the exam, staring at the next question loading! Not ideal if you are pushed for time, and had I been, I may have found this more frustrating.

The submit button (which ends the exam) is frankly stupid! It appears on every single page of the exam. Do they believe people are going to answer the first 3 questions then hit submit?!? This is just asking for trouble IMHO. The test system vendor they use feels dated / clunky compared to other systems I have used recently, e.g. for Microsoft and VMware exams on Pearson Vue, which are pretty dated in and of themselves!

As this post is now getting rather long I shall end it here and provide a second post with a rather sizable list of my study materials!

In the mean time…



Amazon AWS Tips and Gotchas – Part 3 – S3, Tags and ASG

Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below are a handful more tips and gotchas when designing and implementing solutions on Amazon AWS, including AWS S3, Tags / Tagging as well as ASG (Auto-Scaling Groups).

For the first post in this series with a bit of background on where it all originated from, see here:
http://tekhead.it/blog/2016/02/amazon-aws-tips-and-gotchas-part-1/

For more posts in this series, see here:
Index of AWS Tips and Gotchas

AWS Tips And Gotchas – Part 3
  1. Individual S3 buckets are soft-limited to 100 write transactions per second and 300 reads per second initially, and S3 only partitions a bucket further as its request rates grow over time. This sounds like a lot, but when you consider that the average web page probably consists of 30-60 objects, it would not take a huge number of concurrent users hitting an application at the same time of day to start hitting these limits.

    The first recommendation here, especially for read intensive workloads, is to cache the content from S3 using a service like CloudFront. This will immediately mean that for your object TTL you would only ever expect to see each object accessed a maximum of around 50 times (once per global edge location), assuming a global user base. A lot less than that if all of your users are in a small number of geographic regions.
    Second, do not use sequentially named S3 objects. Add a random set of characters as a prefix at the start of each filename, which means that in the background S3 will shard the data across partitions rather than putting it all in one. This is very effectively explained here:
    http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

    Third, effectively shard your data across multiple S3 buckets in some logical fashion, ensuring you are also roughly spreading the read and write requests equally between them, therefore increasing your maximum IO linearly with every additional S3 bucket. You would then potentially need some form of service to keep a track of where your content lives; a common method for this is to store the S3 object locations in a DynamoDB table for resilient and fast retrieval.

    For extra fast retrieval you could also cache these S3 locations in memory using ElastiCache (Memcached/Redis).
    If you go down this route and assuming older data is less frequently accessed, I suggest you rebalance your data when new S3 buckets are added, otherwise you risk having hot and cold buckets, which defeats the objective of sharding them in the first place!

    Even better, just start with a decent number of S3 buckets anyway, as the buckets themselves are free; you are only charged for the content stored inside them! This, of course, adds some complexity for management and maintenance, so make sure you account for this in your designs!

    Lastly, use a CDN! That way your object access hit counts will be far lower, and your users will get improved performance from delivery of content from local PoPs! 🙂

  2. If you are using Tags as a method to assign permissions to users or even prevent accidental deletion of content or objects (something I’m not 100% sure I’m convinced is bullet proof but hey!), make sure you then deny the ability for users to modify those tags (duh!).

    For example, if you set a policy which states that any instance tagged with “PROD” may not be deleted without either MFA or elevated permissions, make sure you deny all ability for your users to edit said tags, otherwise they just need to change from PROD to BLAH and they can terminate the instance.

  3. This is a configuration point which can cost you a wee chunk of change if you make this error and don’t spot it quickly! When configuring your Auto-Scaling Group make sure the Grace Period is set sufficiently long to ensure your instances have time to start and complete all of their bootstrap scripts.

    If you don’t, the first time you start up your group it will boot an instance, start health checking it, decide the instance has failed, terminate that instance and boot a new one, start health checking it, decide the instance has failed, etc (ad infinitum).

    If your grace period is low this could mean spinning up as many as 60 or more instances in an hour, each with a minimum charge of an hour! Instead, work out your estimated Grace Period and consider adding an extra 20% wiggle room. Similarly, if your bootstrap script has a typo in it (as mine did in one test) which causes your health checks to fail, Auto-Scaling will keep terminating and instantiating new instances until you stop it. Make sure you have thoroughly tested your current bootstrap script prior to using it in an Auto-Scaling group!

    Update: One last point to highlight with this is some sound advice from James Kilby. Be aware as your environment changes that a grace period which is sufficient on day one might not be later on! Don’t set and forget this stuff, or you may find you come in one day with a big bill and a load of lost revenue when your site needed to scale up and couldn’t!
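Two of the tips above boil down to simple arithmetic and naming rules, so here is a rough Python sketch of them: the prefixed object names and bucket sharding from tip 1, and the Grace Period padding from tip 3. Everything named here is my own assumption for illustration (the bucket names, the function names, the 4-character prefix). I have also swapped the "random" prefix for one derived from a hash of the filename, purely so the same name can be recomputed later without a lookup; a truly random prefix stored in DynamoDB, as described above, works just as well.

```python
import hashlib
import math
from typing import Sequence

# Hypothetical bucket names, purely for illustration.
BUCKETS = ("example-assets-0", "example-assets-1",
           "example-assets-2", "example-assets-3")


def _digest(filename: str) -> str:
    """Hex SHA-256 of the filename, used only to spread names out."""
    return hashlib.sha256(filename.encode("utf-8")).hexdigest()


def prefixed_key(filename: str, prefix_len: int = 4) -> str:
    """Prepend a short hash-derived prefix so keys do not sort sequentially."""
    return f"{_digest(filename)[:prefix_len]}-{filename}"


def pick_bucket(filename: str, buckets: Sequence[str] = BUCKETS) -> str:
    """Map a filename to one bucket, always the same one for the same name."""
    return buckets[int(_digest(filename), 16) % len(buckets)]


def padded_grace_period(bootstrap_seconds: int, headroom_percent: int = 20) -> int:
    """Measured bootstrap time plus wiggle room, rounded up to whole seconds."""
    return math.ceil(bootstrap_seconds * (100 + headroom_percent) / 100)


def launches_per_hour(cycle_seconds: int) -> int:
    """Instances launched in an hour if each one is replaced every cycle_seconds."""
    return 3600 // cycle_seconds
```

With a measured bootstrap time of 300 seconds, `padded_grace_period(300)` gives 360 seconds, and `launches_per_hour(60)` shows where the "60 instances in an hour" figure comes from if a failing instance is replaced once a minute. Note that hashing to a fixed bucket list does not help with the rebalancing problem when buckets are added later; that still needs a location table or a data move.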

Find more posts in this series here:
Index of AWS Tips and Gotchas

