Archive for Cloud

Does a Serverless Brexit mean goodbye to infrastructure management problems?

Last week I was able to get myself along to the London CloudCamp event at the Crypt on the Green, for an evening with the theme of “We’ve done cloud, what’s next?”. For those of you unfamiliar with the event, CloudCamp is an “unconference” where early adopters of Cloud Computing technologies exchange ideas. As you can probably guess from the theme title, many of the discussions were around the concept of “serverless” computing.

So, other than being something which seems to freak out my spell check function, what is “serverless” then?

I think Paul Johnston of movivo summed it up well, as “scaling a single function / object in your code instead of an entire app”, which effectively means a microservices architecture. In practical terms, it’s really just another form of PaaS, where you upload your code to a provider (such as AWS Lambda), and they take care of managing all of the underlying infrastructure including compute, load balancing, scaling, etc, on your behalf.

The instances then simply act upon events (i.e. they are event driven), which could be anything from an item hitting a queue, to a user requesting a web page, and when not required, they are not running. AWS currently supports a limited subset of languages, specifically Node.js, Java, and Python.
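To make the event-driven idea a little more concrete, here is a minimal sketch of what a single function might look like in Python. The `handler(event, context)` shape follows the convention AWS Lambda uses for Python functions; the `name` field in the event and the layout of the response are my own assumptions for illustration, not something the platform mandates.

```python
import json


def handler(event, context):
    """Entry point the platform would call once per incoming event."""
    # 'name' is a hypothetical field in the event payload (assumption).
    name = event.get("name", "world")
    # The response layout below is illustrative only.
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local invocation for testing; no real context object exists off-platform.
    print(handler({"name": "CloudCamp"}, None))
```

Nothing here knows or cares about servers, load balancers or scaling; the function just receives an event and returns a result, and the provider decides where and when it runs.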

There are of course other vendors who provide similar platforms, including Google Cloud Functions, IBM Bluemix OpenWhisk, etc. They tend to support a similarly small pool of languages, however some are more agnostic and will even allow you to upload Docker containers as well. Iron.io also allows you to do serverless using your own servers, which seems a bit of an oxymoron! 🙂

Anyway, the cool thing about serverless is that you can therefore “vote to leave” your managed or IaaS infrastructure (yes, I know, seriously tenuous connection!), and just concentrate on writing your applications. This is superb for developers who don’t necessarily have the skills or the time to manage an IaaS platform once it has been deployed.

Serverless Introduction - Tenuous doesn't even come close!

The Case for Remain

Much like the Brexit vote however, it does come with some considerations and challenges, and you may not get exactly what you expected when you went to the polling booth! For example:

  • You may believe you are now running alone, but you are ultimately still dependent on actual servers! However, you no longer have access to those servers, so basic things like logging and performance monitoring suddenly become a lot trickier.
  • Taking this a step further, testing and troubleshooting become more challenging. When a fault occurs, how can you trace exactly where it occurred? This is further exacerbated if you are integrating with other SaaS and PaaS platforms, such as Auth0 (IAM), Firebase (DB), etc. This is already a very common architectural pattern for serverless designs.
    You therefore need to start introducing centralised logging and error trapping systems which will allow you to see what’s actually going on, which of course sounds a lot like infrastructure management again!
  • It’s still early days for serverless, so things like documentation and support are a lot more scarce. If you plan to be an early serverless adopter, you had better know your technical onions!
  • As with any microservices architecture, with great flexibility, comes great complexity! Instead of managing just a handful of interacting services, you could now be managing many hundreds of individual functions. You can understand each piece easily, but looking at the big picture is not so simple!
  • Another level of complexity is in billing of course. Serverless services such as AWS Lambda charge you per 100ms of compute time, and per 1 million requests. If you are paying for a server and some storage, even in a cloud computing model, it’s reasonably easy to understand how much your bill will be at the end of the month.
    Paying for transactions and processing time, however, could potentially provide a few nasty surprises, especially if you come under heavy load or even a DoS attack.
  • Finally, the biggest and most obvious concern about serverless is vendor lock-in. Indeed this is potentially the ultimate lock-in as once you pick a vendor and write your application specific to their cloud, moving that bad boy is going to mean some major refactoring and re-writes!
    As long as that vendor’s pricing is competitive, this shouldn’t matter too much (after all, every single vendor involves lock-in to some varying degree), but if that vendor manages to take the lion’s share of the market they could easily change that pricing and you are almost powerless to react (at least not without significant additional investment).
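
To put some rough numbers on the billing point above, here is a back-of-an-envelope sketch of a per-request plus per-100ms pricing model. The function name and both price figures are purely illustrative assumptions (they are not AWS’s actual rates), I have assumed each invocation is rounded up to the next 100ms block, and real pricing also depends on things like allocated memory, which I have ignored.

```python
import math


def monthly_cost(requests, avg_ms, price_per_million_requests, price_per_100ms):
    """Estimate a monthly bill under a simple request + duration model."""
    # Assumption: duration is billed in 100 ms blocks, rounded up per call.
    billed_blocks = math.ceil(avg_ms / 100)
    request_charge = requests / 1_000_000 * price_per_million_requests
    compute_charge = requests * billed_blocks * price_per_100ms
    return request_charge + compute_charge


if __name__ == "__main__":
    # Illustrative prices only, NOT real vendor rates.
    PER_MILLION = 0.20
    PER_100MS = 0.000002

    normal = monthly_cost(3_000_000, 120, PER_MILLION, PER_100MS)
    under_attack = monthly_cost(300_000_000, 120, PER_MILLION, PER_100MS)
    print(f"Normal month:  {normal:.2f}")
    print(f"Under attack:  {under_attack:.2f}")
```

The bill scales linearly with request volume, so a hundredfold jump in traffic (legitimate or otherwise) means roughly a hundredfold jump in cost, which is exactly the kind of surprise a fixed-price server never gives you.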

The Case for Leave

If you understand and mitigate (or ignore!) the above however, serverless can be quite a compelling use case. For example:

  • From an environmental perspective, you will probably never find a more efficient or greener computing paradigm. It minimises the number of extraneous operating systems, virtual or physical machines required, as this is truly multi-tenant computing. Every serverless host could undoubtedly be run at 70-90% utilisation, rather than the 10-50% you typically see in most enterprise DCs today! If you could take every workload in the world and switch it to serverless overnight, based on those efficiency levels, how many data centres, how much power and how many thousands of tonnes of metals could you save? Greenpeace should be refactoring their website as we speak!
  • Although you do have to introduce a number of tools to help you track what is actually going on with your environment, you can move away from doing a whole load of the mundane management tasks such as patching, OS management etc, and move up the stack to spend your resources on more productive and creative activities; actually adding business value (Crazy idea! I thought in IT we just liked patching for a living?)!
  • The VM sprawl we have today would be reduced as workloads are rationalised. That said, you just end up with replacing this with container or function sprawl, which is even harder to manage! 🙂
  • You gain potentially massive scalability for your applications. Instead of scaling entire applications, you just scale the bottleneck functions, which means your application becomes more efficient overall. Definitely time to read The Goal by Goldratt and understand the Theory of Constraints before you go down this route!
  • Finally you can potentially see significant cost savings. If there are no requests, then there is no charge! If you were running some form of event driven application or trigger, instead of paying tens or hundreds of pounds per month for a server, you might only be paying pennies! Extend this to dev/test platforms which might only be needed to run workloads for a few hours a day, or production platforms which only need to process transactions when customers are actually online, and it really starts to add up, even more than with auto-scaling IaaS platforms.
    Taking that a step further, if you are running a startup, why pay hundreds or thousands a month for compute you “might” need but which often sits idle, rather than throwing your functions into a scalable platform which will only charge you for actual use? I know where I would be putting my money if I were a VC…

Serverless Computing is hot!

Closing Thoughts

Serverless is a really interesting technology move for the industry which (as always) comes with its own unique set of benefits and challenges. I can’t see it ever being the de facto standard for everything (for the same reasons we still use mainframes and physical servers today), however there are plenty of brilliant use cases for it. If devs and startups are comfortable with the vendor lock-in and other risks, why wouldn’t they consider using it?

7 Reasons Why You Should Read The Phoenix Project

The Phoenix Project

I began reading The Phoenix Project with no preconceptions, other than having been told that it is a great book, and hearing it mentioned many times on Eric Wright‘s GC On Demand podcast.

Written by Gene Kim, Kevin Behr, and George Spafford, it is told as a first-person narrative from the perspective of Bill, a middleware team manager who is promoted into a senior IT management role for a business in jeopardy. Through his experiences and a guiding hand from another key character, together we work through the problems facing the business, the IT department and the individuals within.

The story is told in an easy to read, informal style, and I made quick work of it over the course of just a few days. I really enjoyed it on numerous levels:

  1. I recognised every single character in the book as somebody I have worked with (or indeed currently work with!). I guarantee you will feel the same!
  2. The book was pretty well written, and the story arc itself was compelling. I was really rooting for Bill to succeed in his endeavours! (But did he? You will have to read the book to find out!)
  3. The authors obviously have a great sense of humour! Quotes such as “Show me a dev who isn’t crashing production systems, and I’ll show you one who can’t fog a mirror. Or more likely, is on vacation.” had me laughing out loud on the train in front of other passengers!
  4. The book is approachable and not elitist. You could pick it up as a cable monkey or an IT director (or maybe even a Sales person!!!), and relate to the concepts and methods described.
  5. I learned a huge amount about different methods for handling and improving processes around WIP (Work in Progress), such as the Theory of Constraints or the use of Kanban boards (I am currently testing this with my pre-sales customer workloads using Trello, but I’m told Kanbanize is also very good). Resilience Engineering (think Netflix Simian Army) and numerous other techniques are also covered, along with the overarching “Three Ways” (very Zen!).
  6. I actually picked up a few key tips which could be applied directly to my pre-sales design and requirements gathering workshops with my customer stakeholders.
  7. Finally, it didn’t feel “preachy”, which is always a risk when trying to sell an idea / concept as your main theme and I was initially concerned that the book would be ramming DevOps culture down my neck throughout. This could not be farther from the truth, and the full DevOps concepts do not come into play until the story is almost complete. There are many lessons to be learned throughout the story, which could be applied to any organisation!

The Phoenix Project Cover

Here are another few choice quotes from The Phoenix Project, both humorous and insightful:

“The only thing more dangerous than a developer is a developer conspiring with Security. The two working together gives us means, motive, and opportunity.”

“How can we manage production if we don’t know what the demand, priorities, status of work in process, and resource availability are?”

“You just described ‘technical debt’ that is not being paid down. It comes from taking shortcuts, which may make sense in the short-term. But like financial debt, the compounding interest costs grow over time. If an organization doesn’t pay down its technical debt, every calorie in the organization can be spent just paying interest, in the form of unplanned work.”

“On the other hand, if a resource is ninety percent busy, the wait time is ‘ninety percent divided by ten percent’, or nine hours. In other words, our task would wait in queue nine times longer than if the resource were fifty percent idle.”
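
That last quote boils down to a one-line rule of thumb: wait time is proportional to the busy fraction divided by the idle fraction. Here is a rough sketch of it (the function name is mine, and this is only the book’s rule of thumb, not a full queueing model):

```python
def relative_wait(utilisation):
    """Return busy fraction / idle fraction for a resource.

    'utilisation' is the fraction of time the resource is busy (0 to <1).
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be at least 0 and below 1")
    return utilisation / (1 - utilisation)


if __name__ == "__main__":
    for busy in (0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"{busy:.0%} busy -> wait factor {relative_wait(busy):.1f}")
```

The interesting part is how quickly the factor climbs as you approach 100% busy, which is why that one over-utilised engineer (or server) ends up being the constraint for everything.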

In case you hadn’t felt like I was positive enough about The Phoenix Project yet, I would say that this book should be provided as mandatory training to every person working in every IT department today, from the guys plugging in cables to the CIO!

If you do read and enjoy the book, I highly recommend also reading The Goal by Eliyahu M. Goldratt. I was a little surprised, to say the least, that this appears to be a very similar story, following a similar arc and some almost identical characters to The Phoenix Project. That said, I am half way through it at the moment and still thoroughly enjoying it, though I am not too worried about missing the movie version!

The Goal delves even deeper into the Theory of Constraints and explains some of the tools we can use to mitigate, bypass or remove constraints in a system. All of these tools and methods can be applied as easily to IT as they can to production lines, which (without stating the bleeding obvious) is exactly the point of The Phoenix Project!

Anyway, if you want to do yourself a favour, both in terms of your career development and in enjoying a really compelling story and a thoroughly decent book, you could do a lot worse than spending £5 on the Kindle Edition of The Phoenix Project!

Where To Get Them

For anything technical, I like to buy ebooks these days for both portability and the fact that I won’t be chopping down trees needlessly. Both of the above titles are available very inexpensively on Kindle.

And Finally…

Sincerest apologies for one of the most click bait-y blog titles I’ve ever posted! Even worse than this one. Honestly, I feel ashamed!

I’ll get my coat…

Amazon AWS Tips and Gotchas – Part 4 – Direct Connect & Public / Private VIFs

Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below are a handful more tips and gotchas when designing and implementing solutions on Amazon AWS, specific to Direct Connect.

For the first post in this series with a bit of background on where it all originated from, see here:
Amazon #AWS Tips and Gotchas – Part 1

For more posts in this series, see here:
Index of AWS Tips and Gotchas

Tips and Gotchas – Part 4
10. VPC Private / Public Access Considerations

If you have gone out and bought a shiny new Direct Connect to your AWS platform, you might reasonably assume that all of the users and applications on your MPLS will automatically start using this for accessing S3 content and other AWS endpoints. Unfortunately, this is not so simple!

At a high level, here is a diagram showing the two primary Direct Connect configurations, Public and Private:

AWS Direct Connect Public and Private VIF

More info on Direct Connect here:
AWS Direct Connect by Camil Samaha

A key point to note about Direct Connect is that it supports multiple VIFs per 1Gbps or 10Gbps link:

If you are not a giant enterprise and don’t need this kind of bandwidth, you can buy single VIFs from your preferred network provider, but you will pay for it on a per-VIF basis, and as such Direct Connect access to multiple VPCs plus public endpoints will bump up your costs a bit.

The question therefore becomes, what is the cost-effective and simple solution to access service endpoints (such as S3 in the examples below), when you also want to access your private resources in your own VPCs?

This does not always have a straightforward answer if you are on a tight budget.

Accessing S3 via your Direct Connect

As I understand it, the S3 endpoint acts very much like VPC peering, only it is from your VPC to S3, and is therefore subject to similar restrictions. Specifically, the S3 endpoint documentation has a very key statement:

“Endpoint connections cannot be extended out of a VPC. Resources on the other side of a VPN connection, a VPC peering connection, an AWS Direct Connect connection, or a ClassicLink connection in your VPC cannot use the endpoint to communicate with resources in the endpoint service”.

Basically this means for every VPC you want to communicate with directly from your MPLS, you need another VIF, and hence another connection from your service provider. If you want to access S3 services and other AWS public endpoints directly, you will also need an additional connection dedicated to that. This assumes your requirements are not enough to justify buying a 1Gbps / 10Gbps pipe for your sole use, and that you are using a partner to deliver it. If you can buy 1Gbps or above then you can subdivide your pipe into multiple VIFs for little / no extra cost.

Here are four example / potential solutions for different use cases, but they are definitely NOT all recommended or supported.

  • Assuming you are using a Private VIF, then by default, the content in S3 is actually accessed over the internet (e.g. using HTTPS if your bucket is configured as such):
    This may come as a surprise to people, as you would expect to buy a connection and access any AWS service.
  • If you have a Direct Connect from your MPLS into Amazon as a Public connection / VIF you can then route to the content over your Direct Connect, however this means you are bypassing your VPC and going straight into Amazon.
    This is a bit like having a private internet connection, so accessing VPCs etc securely would still require you to run an IPsec VPN over the top of your “public” connection. This will work fine and will mean you can maximise the utilisation of the bandwidth on your Direct Connect, and reduce your Direct Connect costs by sharing one between all VPCs. This is OK, but frankly not brilliant as you are ultimately still depending on VPNs to secure your data. If you want very secure, private access to your VPCs, you should really just spend the money! 🙂
  • If you have a Direct Connect from your MPLS into Amazon as a Private connection / VIF, you could proxy the connectivity to S3 via an EC2 instance. The content is requested by your instance using the standard S3 API and forwarded back to your clients. This means your EC2 instance is now a bottleneck to your S3 storage, and if you want to avoid it becoming a SPoF, you need at least a couple of them.
    It is worth specifically noting that although technically possible, this method would be strictly against all support and recommendations from AWS! S3 Endpoints and VPC peers are for accessing content from your VPCs, they are NOT meant to be transitive.
  • Lastly, Amazon’s primary recommended method is to run multiple VIFs, mixing both public and private. The biggest downside here is that each VIF will likely have a specific amount of bandwidth associated with it and you will have to procure multiple connections from your provider (unless you are big enough to need to buy a minimum of 1 Gbps!).

As this scales to many accounts, many VPCs and many VIFs, things also start to get a bit complex when it comes to routing (especially if you want many or all of the VPCs in question to be able to route to each other), and I will cover that in the next post.

Until then…

Find more posts in this series here:
http://www.tekhead.org/tag/awsgotchas/

Amazon AWS Tips and Gotchas – Part 5 – Managing Multiple VPCs

Amazon AWS Tips and Gotchas – Part 1 – AWS Intro, EBS and EC2

Although I have been very much aware of AWS for many years and understood it at a high level, I have never had the time to get deep down and dirty with the AWS platform… that is until now!

I have spent the past three weeks immersing myself in AWS via the most excellent ACloud.Guru Solution Architect Associate training course, followed by a one week intensive AWS instructor-led class from QA on AWS SA Associate and Professional.

While the 100 hours or so I have spent labbing and interacting with AWS is certainly not 10,000, it has given me some valuable insights on both how absolutely AWSome (sorry – had to be done!) the platform is, as well as experiencing a few eye openers which I felt were worth sharing.

It would be very easy for me to extol the virtues of AWS, but I don’t think there would be much benefit to that. Everyone knows it is a great platform (but maybe I’ll do it later anyway)! In the meantime, I thought it would be worthwhile taking a bit more of a “warts and all” view of a few features. Hopefully, this will avoid others stepping into the potential traps which have come up directly or indirectly through my recent training materials, as well as being a memory aid to myself!

The key thing with all of these “gotchas” is that they are not irreparable, and can generally be worked around by tweaking your infrastructure design. In addition, with the rate that AWS develop and update features on their platforms, it is likely that many of them will improve over the coming months / years anyway.

The general feeling around many of these “features” is that AWS are indirectly and gently encouraging you to avoid building your solutions on EC2 and other IaaS services, instead pushing you more towards their more managed services such as RDS, Lambda, Elastic Beanstalk etc.

This did originally start off as a single “Top 10” post, but I quickly realised that there are a lot more than 10 items and some of them are pretty deep dive! As such, I have split the content into easily consumable chunks, with a few lightweight ones to get us started… keep your eyes open for a few whoppers later in the series!

The full list of posts will be available here:
Index of AWS Tips and Gotchas

AWS Tips and Gotchas – Part 1
  1. Storage for any single instance may not exceed 20,000 IOPS and 320MB/sec per EBS volume. This is really only something which will impact very significant workloads. The current “recommended” workaround for this is to do some pretty scary things such as in-guest RAID / striping!

    Doing this with RAID0 means you then immediately risk loss of the entire datastore if a single EBS volume in the set goes offline for even a few seconds. Alternatively, you can buy twice as much storage and waste compute resources doing RAID calculations. In addition, you then have to do some really kludgy things to get consistent snapshots from your volume, such as taking your service offline. 
    In reality, only the most extreme workloads hit this kind of scale up. The real answer (which is probably better in the long term) is to refactor your application or database for scale-out, a far more cloudy design.
  2. The internet gateway service does not provide a native method for capping of outbound bandwidth. It doesn’t take a genius to work out that when outbound bandwidth is chargeable, you could walk away with a pretty significant bandwidth bill should something decide to attack your platform with a high volume of traffic. One potential method to work around this would be to use NAT instances. You can then control the bandwidth using 3rd party software in the NAT instance OS.
  3. There is no SLA for EC2 instances unless you run them across multiple Availability Zones. Of course with typical RTTs of a few milliseconds at most, there is very little reason not to stretch your solutions across multiple AZs. The only time you might keep in one AZ is if you have highly latency sensitive applications, or potentially the type of app which requires a serialised string of DB queries to generate a response to the end user.

    In a way I actually quite like this SLA requirement as it pushes customers who might otherwise have accepted the risk of a single DC, into designing something more robust and accepting the (often minor) additional costs. With the use of Auto Scaling and Elastic Load Balancing there is often no reason you can’t have a very highly available application split across two or more AZs, whilst using roughly the same number of servers as a single site solution.

    For example, the following solution would be resilient to a single AZ failure, whilst using no more infrastructure than a typical resilient on-premises single site solution:
    Teahead AWS Simple HA Web Configuration
    No DR replication required, no crazy metro clustering setup, nothing; just a cost effective, scalable, highly resilient and simple setup capable of withstanding the loss of an entire data centre (though not a region, obviously).

Find more posts in this series here:
Index of AWS Tips and Gotchas

Amazon AWS Tips and Gotchas – Part 2 – AWS EBS & RDS MS SQL

 
