In a previous post, we discussed serverless architecture and particularly AWS Lambda, a serverless, event-driven computing platform that can be triggered by a long list of events. We also provided a high-level overview of how serverless services can help cut costs and boost performance efficiency. In this post, we’ll dive a bit deeper into AWS Lambda, including a variety of its use cases as they relate to automation.
AWS Lambda is essentially used for two purposes: one is to run applications, and the other is to automate the majority of your infrastructure actions. The important thing to keep in mind about AWS Lambda, and AWS in general, is that everything is an API call: creating a server, destroying a server, provisioning other resources, and taking them down are all API calls. Because a Lambda function can make those same API calls, it can do almost anything you want it to.
We are going to review some use cases on the infrastructure side that we’ve built and deployed successfully, both for our own internal projects and for our customers.
Those of you using Amazon CloudWatch will recognize that setting up alarms can get quite tedious. With CloudWatch, you have to create a separate alarm for every metric you want to watch. That’s sustainable for a small environment with a small number of resources, but past a certain scale it’s no longer feasible.
The more efficient approach is to automate monitoring with Lambda. For our first use case, we want to create CloudWatch alarms that initiate recovery or reboot actions. It’s common for EC2 instances to become unhealthy or go offline from time to time. This is a normal part of running on AWS, and you’re expected to architect around it. You can use AWS Lambda to alleviate the problem by putting in place CloudWatch alarms with actions tied to them, so that when an alarm is triggered, its action runs.
In this particular use case, a CloudWatch timer runs the Lambda function, which scans all EC2 instances and filters on a particular tag, for example a tag designated for automated monitoring. So if you have a thousand servers, one Lambda function can create a thousand monitors on your behalf without you having to define every single alarm by hand.
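The scan-and-create step can be sketched in Python with boto3. This is a minimal sketch under stated assumptions, not the exact function we deployed: the tag name `AutoMonitor`, the alarm name prefix, and the thresholds are hypothetical choices, and the alarm uses the EC2 system status check with the built-in recover action.

```python
import os

TAG_KEY = "AutoMonitor"          # hypothetical tag that opts an instance in
ALARM_PREFIX = "auto-recover-"   # hypothetical alarm naming convention


def tagged_instance_ids(ec2):
    """Return the IDs of running instances that carry the monitoring tag."""
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(Filters=[
        {"Name": f"tag:{TAG_KEY}", "Values": ["true"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    return [
        instance["InstanceId"]
        for page in pages
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]


def recovery_alarm(instance_id, region):
    """Build the put_metric_alarm arguments for one instance."""
    return {
        "AlarmName": f"{ALARM_PREFIX}{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,               # assumed: evaluate once a minute
        "EvaluationPeriods": 2,     # assumed: two failing minutes in a row
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action; swap "recover" for "reboot" to reboot instead.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def handler(event, context):
    """Lambda entry point, meant to be run on a schedule."""
    import boto3  # imported here so the helpers above stay testable offline

    region = os.environ["AWS_REGION"]  # set by the Lambda runtime
    ec2 = boto3.client("ec2")
    cloudwatch = boto3.client("cloudwatch")
    instance_ids = tagged_instance_ids(ec2)
    for instance_id in instance_ids:
        # put_metric_alarm creates the alarm or updates it if it exists,
        # so repeated scheduled runs are safe.
        cloudwatch.put_metric_alarm(**recovery_alarm(instance_id, region))
    return {"alarms_ensured": len(instance_ids)}
```

Naming each alarm after its instance ID keeps the function idempotent and makes it easy to tell later which alarms belong to which instance.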
Just as you have Lambda functions to create infrastructure, you need a follow-up Lambda function to clean up. Resources come and go, instances die, and you need to replace them. AWS Lambda can programmatically create all the monitors you require and then programmatically remove them when they’re no longer valid.
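A cleanup function along these lines might look like the following. It is a sketch that assumes a hypothetical naming convention in which each alarm is named with a fixed prefix followed by the instance ID, so that stale alarms can be found by comparing alarm names against the instances that still exist.

```python
ALARM_PREFIX = "auto-recover-"  # hypothetical alarm naming convention


def stale_alarm_names(alarm_names, live_instance_ids):
    """Return managed alarms whose instance no longer exists."""
    live = set(live_instance_ids)
    return [
        name for name in alarm_names
        if name.startswith(ALARM_PREFIX)
        and name[len(ALARM_PREFIX):] not in live
    ]


def chunks(items, size=100):
    """Yield slices of at most `size` items (delete_alarms takes up to 100)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def handler(event, context):
    """Lambda entry point, meant to be run on a schedule."""
    import boto3  # imported here so the helpers above stay testable offline

    ec2 = boto3.client("ec2")
    cloudwatch = boto3.client("cloudwatch")

    alarm_pages = cloudwatch.get_paginator("describe_alarms").paginate(
        AlarmNamePrefix=ALARM_PREFIX
    )
    alarm_names = [
        alarm["AlarmName"]
        for page in alarm_pages
        for alarm in page["MetricAlarms"]
    ]

    # Terminated instances are excluded, so their alarms count as stale.
    instance_pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[{
            "Name": "instance-state-name",
            "Values": ["pending", "running", "stopping", "stopped"],
        }]
    )
    live_ids = [
        instance["InstanceId"]
        for page in instance_pages
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]

    stale = stale_alarm_names(alarm_names, live_ids)
    for batch in chunks(stale):
        cloudwatch.delete_alarms(AlarmNames=batch)
    return {"alarms_deleted": len(stale)}
```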
Until the release of AWS Backup in early 2019, users of the AWS platform had to either live with the snapshot mechanisms built into particular services, only some of which were automated, or write custom scripts to perform backups; sometimes both. Lambda has always been a great fit for this use case because it can take snapshots of the various services at scheduled intervals and purge older copies, while still allowing you to build alerts and monitoring with services such as SNS and CloudWatch.
However, in more complex situations AWS Backup does not provide enough granularity for operators on the cloud. For example, one common use case with backups is copying snapshots into other regions for redundancy. In the event of a regional outage, your data will still be present in other regions, a common model for a pilot light disaster recovery scenario. At the time of writing, this feature is on the roadmap for AWS Backup, but it is a feasible use case today with Lambda. Not only can you copy snapshots between regions with Lambda, but the approach is much more extensible: you can plug in a variety of other services such as CloudWatch (monitoring), SNS (notification), and SQS (batching/queuing), and build your very own disaster recovery solution, all natively with Lambda. This is not just theoretical. We have helped customers do exactly this, using a series of Lambda functions, SQS queues, SNS topics, and CloudWatch alarms to build automated disaster recovery solutions that just work.
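The cross-region copy itself is a small amount of code. The sketch below covers EBS snapshots only and leaves out the SQS, SNS, and CloudWatch pieces; the regions, the `Backup` tag, and the one-day window are assumptions chosen for illustration. Note that `copy_snapshot` is called on a client for the destination region.

```python
from datetime import datetime, timedelta, timezone

SOURCE_REGION = "us-east-1"  # assumed primary region
DEST_REGION = "us-west-2"    # assumed disaster recovery region
BACKUP_TAG = "Backup"        # hypothetical tag marking snapshots to copy


def recent_snapshot_ids(snapshots, since):
    """Return IDs of completed snapshots started at or after `since`."""
    return [
        snapshot["SnapshotId"]
        for snapshot in snapshots
        if snapshot["State"] == "completed" and snapshot["StartTime"] >= since
    ]


def copy_to_region(dest_ec2, snapshot_ids, source_region):
    """Start a copy of each snapshot; returns {source_id: copy_id}."""
    copies = {}
    for snapshot_id in snapshot_ids:
        response = dest_ec2.copy_snapshot(
            SourceRegion=source_region,
            SourceSnapshotId=snapshot_id,
            Description=f"DR copy of {snapshot_id} from {source_region}",
        )
        copies[snapshot_id] = response["SnapshotId"]
    return copies


def handler(event, context):
    """Lambda entry point, meant to be run once a day."""
    import boto3  # imported here so the helpers above stay testable offline

    source_ec2 = boto3.client("ec2", region_name=SOURCE_REGION)
    dest_ec2 = boto3.client("ec2", region_name=DEST_REGION)

    pages = source_ec2.get_paginator("describe_snapshots").paginate(
        OwnerIds=["self"],
        Filters=[{"Name": f"tag:{BACKUP_TAG}", "Values": ["true"]}],
    )
    snapshots = [s for page in pages for s in page["Snapshots"]]

    since = datetime.now(timezone.utc) - timedelta(days=1)
    return copy_to_region(
        dest_ec2, recent_snapshot_ids(snapshots, since), SOURCE_REGION
    )
```

With a large number of snapshots, starting every copy in one loop may run into per-region limits on concurrent copies, which is one place where queuing the snapshot IDs through SQS can help.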
This is a relatively new use case, but it’s also built on the concept that everything is an API call away. By issuing simple describe API calls against your infrastructure and listing all the components, you can have your function parse everything that you’re running: EC2 instances, EBS volumes, RDS instances/clusters, S3 buckets, DynamoDB tables, etc. You can scan all of your infrastructure for inventory purposes and track what is running. This is especially useful for larger organizations with many AWS accounts spanning many teams. Even if the infrastructure is managed with tools such as CloudFormation and Terraform, the definitions typically live in templates in source control repositories and require a manual dive to see what’s in them. Rather than dig through source code, you can have Lambda generate an inventory report with any metadata you’d like to see.
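As a rough illustration, an inventory function can call the list and describe APIs for each service and flatten the results into one report. The sketch below assumes a simple CSV report with four hypothetical columns; a real report would carry whatever metadata you care about.

```python
import csv
import io

FIELDS = ["type", "id", "detail", "name"]  # hypothetical report columns


def summarize_instances(reservations):
    """Flatten describe_instances reservations into report rows."""
    rows = []
    for reservation in reservations:
        for instance in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            rows.append({
                "type": "ec2",
                "id": instance["InstanceId"],
                "detail": instance["InstanceType"],
                "name": tags.get("Name", ""),
            })
    return rows


def to_csv(rows):
    """Render report rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def handler(event, context):
    """Lambda entry point: collect resources and return a CSV report."""
    import boto3  # imported here so the helpers above stay testable offline

    ec2 = boto3.client("ec2")
    rds = boto3.client("rds")
    s3 = boto3.client("s3")
    dynamodb = boto3.client("dynamodb")

    rows = []
    for page in ec2.get_paginator("describe_instances").paginate():
        rows += summarize_instances(page["Reservations"])
    for page in ec2.get_paginator("describe_volumes").paginate():
        rows += [{"type": "ebs", "id": v["VolumeId"],
                  "detail": f"{v['Size']} GiB", "name": ""}
                 for v in page["Volumes"]]
    for page in rds.get_paginator("describe_db_instances").paginate():
        rows += [{"type": "rds", "id": d["DBInstanceIdentifier"],
                  "detail": d["DBInstanceClass"], "name": ""}
                 for d in page["DBInstances"]]
    rows += [{"type": "s3", "id": b["Name"], "detail": "", "name": ""}
             for b in s3.list_buckets()["Buckets"]]
    for page in dynamodb.get_paginator("list_tables").paginate():
        rows += [{"type": "dynamodb", "id": name, "detail": "", "name": ""}
                 for name in page["TableNames"]]
    return to_csv(rows)
```

In practice the report would more likely be written to S3 or sent as a notification than returned from the handler.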
In addition to simple inventories, you can also have Lambda keep your architecture diagrams up to date and accurate 24/7, which is great for troubleshooting. Several open source and third-party tools provide ways to generate diagrams from your AWS infrastructure using similar mechanisms. While it may be useful to draw a diagram by hand when the infrastructure is first created, wouldn’t it be better to have an automated process produce the diagram so that it stays up to date as infrastructure changes over time? Lambda can enable this and provide always-up-to-date diagrams for operations teams, alleviating the headache of documentation drifting away from the infrastructure.
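One simple way to approximate this, shown here only as a sketch, is to have a function emit Graphviz DOT text that groups EC2 instances by VPC and subnet and store it in S3, where a rendering step can pick it up. The environment variable `DIAGRAM_BUCKET` and the object key are hypothetical, and real diagramming tools cover far more than instances.

```python
import os
from collections import defaultdict


def to_dot(instances):
    """Render instances as Graphviz DOT, one cluster per VPC/subnet pair."""
    by_subnet = defaultdict(list)
    for instance in instances:
        location = (instance.get("VpcId", "no-vpc"),
                    instance.get("SubnetId", "no-subnet"))
        by_subnet[location].append(instance["InstanceId"])

    lines = ["graph infrastructure {"]
    for index, ((vpc, subnet), ids) in enumerate(sorted(by_subnet.items())):
        lines.append(f"  subgraph cluster_{index} {{")
        lines.append(f'    label="{vpc} / {subnet}";')
        for instance_id in sorted(ids):
            lines.append(f'    "{instance_id}";')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


def handler(event, context):
    """Lambda entry point: regenerate the diagram source and store it."""
    import boto3  # imported here so to_dot stays testable offline

    ec2 = boto3.client("ec2")
    s3 = boto3.client("s3")
    instances = [
        instance
        for page in ec2.get_paginator("describe_instances").paginate()
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]
    s3.put_object(
        Bucket=os.environ["DIAGRAM_BUCKET"],   # hypothetical configuration
        Key="diagrams/infrastructure.dot",     # hypothetical object key
        Body=to_dot(instances).encode("utf-8"),
    )
    return {"instances": len(instances)}
```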
Nothing is better than having really accurate, up-to-date documentation, and the only way you can do that is by automating it.
With its capability to orchestrate AWS infrastructure, Lambda can also be leveraged as a deployment tool. Many people tend to use compute-based solutions for this: simple bastion hosts that execute scripts, or even run open source software such as Jenkins or GitLab CI. However, one concern with this method is that you have to grant the instance a lot of authorization in order for it to perform these actions, usually the equivalent of administrator privileges on your AWS account. In some cases, these privileges extend to multiple accounts if the system is intended to perform orchestration on other AWS accounts. This is a lot of power to grant servers and should not be taken lightly. New vulnerabilities are constantly being found in all operating systems and software, especially in the software intended to secure our communications and access to machines (OpenSSL, OpenSSH). There is tremendous risk in having administrative management instances, and a lot of overhead in properly securing, patching, and maintaining them.
With that in mind, Lambda can be used to replace those traditional orchestration systems and tools, or at least augment them. If your current deployment process includes executing scripts from a management instance, and those scripts interact with various AWS service APIs using the AWS Command Line Interface (CLI) or an SDK, these are all cases in which Lambda functions could be used instead. Rather than having an always-on machine with administrative permissions and the potential to be broken into, set up your deployment workflow on Lambda and give your management instance only the ability to perform Lambda invocations.
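To make the last point concrete, here is a sketch of what "only the ability to perform Lambda invocations" could look like: an IAM policy document that allows invoking a single function, and a helper the management instance could use to trigger it. The function name, ARN, and payload shape in the usage note are hypothetical.

```python
import json


def invoke_only_policy(function_arn):
    """IAM policy document that allows invoking one function and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": function_arn,
        }],
    }


def trigger_deployment(lambda_client, function_name, payload):
    """Asynchronously invoke the deployment function with a JSON payload."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: Lambda queues the event
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return response["StatusCode"]
```

On the management instance, a call such as `trigger_deployment(boto3.client("lambda"), "deploy-workflow", {"version": "1.2.3"})` would then be the only AWS action the instance role needs to permit, while the broader permissions live on the Lambda function's own execution role.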
One particular use case we have helped build involves a serverless deployment pipeline consisting of Lambda, AWS Systems Manager, and S3. In this example, the code was for an Auto Scaling pool of EC2 instances that acted as load generators for a large-scale load test. Rather than include the test code in the user data or AMI, which would involve recycling hundreds of instances for a simple code change, we opted for real-time deployments to an active fleet. Using this mechanism, the operators of the system could drop a built artifact containing load test code onto S3, which would invoke a Lambda function. This Lambda function would then use Systems Manager Run Command to execute a shell script on all instances in the Auto Scaling group. The shell script contained instructions to fetch the built artifact from S3, extract it onto the filesystem of the load machines, install any dependencies, and execute a load test.
While this use case is outside the norm of traditional EC2-based deployments, it goes to show that Lambda can be whatever you make it, including a deployment tool, especially as workloads migrate to container-based and serverless technologies. It is the glue that can bind many AWS services and technologies together, which is what makes it so powerful.
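The Lambda side of such a pipeline can be sketched as follows. This is an illustration under assumptions rather than the customer's actual code: the `ASG_NAME` environment variable, the paths under `/opt/loadtest`, and the `install_deps.sh` and `run.sh` scripts are hypothetical, and instances are targeted through the `aws:autoscaling:groupName` tag that Auto Scaling applies to the instances it launches.

```python
import os
import shlex
from urllib.parse import unquote_plus


def artifact_locations(event):
    """Extract (bucket, key) pairs from an S3 event notification."""
    return [
        # Object keys arrive URL-encoded in S3 event notifications.
        (record["s3"]["bucket"]["name"],
         unquote_plus(record["s3"]["object"]["key"]))
        for record in event.get("Records", [])
    ]


def deploy_commands(bucket, key):
    """Shell commands each load generator runs to deploy the artifact."""
    source = shlex.quote(f"s3://{bucket}/{key}")
    return [
        "set -e",  # stop at the first failing step
        f"aws s3 cp {source} /tmp/artifact.tar.gz",
        "mkdir -p /opt/loadtest",
        "tar -xzf /tmp/artifact.tar.gz -C /opt/loadtest",
        "/opt/loadtest/install_deps.sh",  # hypothetical script in the artifact
        "/opt/loadtest/run.sh",           # hypothetical script in the artifact
    ]


def handler(event, context):
    """Lambda entry point, invoked by S3 when an artifact is uploaded."""
    import boto3  # imported here so the helpers above stay testable offline

    ssm = boto3.client("ssm")
    group_name = os.environ["ASG_NAME"]  # hypothetical configuration
    command_ids = []
    for bucket, key in artifact_locations(event):
        response = ssm.send_command(
            Targets=[{
                "Key": "tag:aws:autoscaling:groupName",
                "Values": [group_name],
            }],
            DocumentName="AWS-RunShellScript",
            Parameters={"commands": deploy_commands(bucket, key)},
        )
        command_ids.append(response["Command"]["CommandId"])
    return {"command_ids": command_ids}
```

For this to work, the instances need the Systems Manager agent running and an instance role that allows both Systems Manager and read access to the artifact bucket.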
As all of these use cases demonstrate, AWS Lambda enables you to proactively cut costs, boost efficiency, and streamline a wide range of operations through the use of automation. Talk to the team at Mission to see if a serverless approach fits your organization’s needs.