Key takeaways and insights from Mission’s APN Ambassador and Cloud Architect Jake Malmad
AWS re:Invent 2021 is over, and hopefully, everyone returned home safe and healthy (and all virtual attendees logged off and enjoyed the rest of the weekend). This year was my first time attending re:Invent as an AWS APN Ambassador. I'm grateful to be part of the program, and particularly to the AWS APN team for all they did to facilitate a wonderful conference experience. It was great getting to meet and talk shop with my fellow ambassadors and everyone else I had the pleasure of meeting.
Since joining Mission, I've consulted across many different verticals and levels of complexity, and the following is meant as a well-rounded, though certainly not exhaustive, recap of some of the compute, database, monitoring, security, and governance releases that any Cloud Architect might find of interest. There were plenty of other announcements that I found curious and surprising (I'd like to see the Vegas odds on private 5G and IoT digital twin market traction), but I've limited this to only a few services in the interest of brevity. The sessions I attended focused on Kubernetes, observability, and security/governance for complex account structures, but that could fill an entirely different post, so I'll save it for another day.
For those of us whose lives (professional, at least) revolve around AWS, the most exciting part of re:Invent is the product announcements: each new service or feature can represent cost and time savings or a reduction in operational toil or technical debt. As usual, this year's conference was chock full of announcements, and while I haven't had time to digest them all, here are some that stood out:
Compute & Containers:
It wouldn't be re:Invent without some new instance families launching, and this year is no exception with the new C7g, Is4gen, and Im4gn instances (Graviton3 general purpose and Graviton2 Nitro SSD storage optimized, respectively). Admittedly, when the Graviton processors launched in 2018, I was skeptical about their adoption. However, when Graviton2 was released, I saw many of my customers not only adopting it for "drop-in" managed services like ElastiCache/OpenSearch but rearchitecting their applications and images to run on Graviton2-family EC2 instances or container workers. I suspect we'll continue to see further adoption with Graviton3's better price/performance ratio, and I am also excited to finally bring Graviton's performance to all of our Fargate customers (this was one of the few things I felt confident predicting, given the recent launch of Graviton support for Lambda).
While announced with much smaller fanfare, two features I am excited about (as someone passionate about containers) are ECR pull-through cache repositories and the open-source Karpenter Kubernetes cluster autoscaler. While pull-through cache repositories aren't exactly groundbreaking, upstream container registries can be an overlooked point of failure. This feature greatly simplifies hardening that dependency, ensuring container images are canonical, scanned for vulnerabilities, and still available when an upstream provider goes down (which is never fun). It is important to note that the release states: "Today, we have announced pull through cache repository support in Amazon Elastic Container Registry, for publicly accessible registries that do not require authentication," so, unfortunately, this will not work with Docker Hub currently (we'll see if/when that happens), but it does support quay.io and ECR Public images.
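Setting up a pull-through cache rule is a one-time registry configuration. Here's a minimal sketch using boto3's `create_pull_through_cache_rule` call; the `ecr-public` prefix name is my own choice, not anything mandated by the API:

```python
def pull_through_rule_params(prefix: str, upstream_url: str) -> dict:
    """Build the request for an ECR pull-through cache rule.

    Once the rule exists, pulling
    <account>.dkr.ecr.<region>.amazonaws.com/<prefix>/<image>
    caches the upstream image in your private registry.
    """
    return {
        "ecrRepositoryPrefix": prefix,
        "upstreamRegistryUrl": upstream_url,
    }

# Usage (requires AWS credentials, so it is commented out in this sketch):
# import boto3
# ecr = boto3.client("ecr")
# # Cache ECR Public images under the "ecr-public" prefix; quay.io works
# # the same way. Docker Hub is not supported at launch.
# ecr.create_pull_through_cache_rule(
#     **pull_through_rule_params("ecr-public", "public.ecr.aws")
# )
```

After that, pointing your image references at the prefixed path is all it takes for the cache (and its vulnerability scanning) to kick in.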
The Karpenter announcement wasn't a surprise to me (I stumbled upon the repo last month), but I have yet to use it. The Cluster Autoscaler requires a decent amount of familiarity with Kubernetes, with the Autoscaler's own quirks, and with the cloud provider/environment you are working in; it's a well-maintained tool that does its job well across a wide variety of platforms, but I'm excited about a purpose-built tool designed specifically for cloud workloads. This past year alone, I encountered a few instances of the Cluster Autoscaler being OOMKilled while trying to pull the full list of EC2 instance types. I'm interested to see what provisioners bring to the table: defining taints and labels within multiple provisioner tiers, rather than creating new node groups with the correct configuration, sounds great, and faster provisioning by bypassing node group retry limits and binding pods at provisioning time sounds promising.
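To make the provisioner idea concrete, here's a sketch of what a GPU-tier Karpenter Provisioner manifest might look like, expressed as a Python dict (the `v1alpha5` schema is what shipped at launch and may evolve; the provisioner name, instance types, taint, and label are all illustrative choices of mine):

```python
import json

# Sketch of a Karpenter Provisioner: taints and labels live on the
# Provisioner itself, so each capacity tier does not need its own
# managed node group.
gpu_provisioner = {
    "apiVersion": "karpenter.sh/v1alpha5",
    "kind": "Provisioner",
    "metadata": {"name": "gpu"},
    "spec": {
        "requirements": [
            {"key": "karpenter.sh/capacity-type",
             "operator": "In", "values": ["on-demand"]},
            {"key": "node.kubernetes.io/instance-type",
             "operator": "In", "values": ["g4dn.xlarge", "g4dn.2xlarge"]},
        ],
        # Only pods tolerating this taint land on these nodes.
        "taints": [{"key": "nvidia.com/gpu", "value": "true",
                    "effect": "NoSchedule"}],
        "labels": {"workload-tier": "gpu"},
        # Reclaim empty nodes quickly instead of relying on ASG retries.
        "ttlSecondsAfterEmpty": 30,
    },
}

print(json.dumps(gpu_provisioner, indent=2))
```

A second provisioner with different requirements and labels would cover a general-purpose tier, replacing what would otherwise be two separately configured node groups.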
Database & Analytics:
I was excited by a spate of serverless analytics releases (serverless EMR, serverless Redshift, MSK/Kinesis on-demand), but I'll save those for my colleague from Mission's DAML team. I will say that I was initially excited for the two on-demand offerings, but after reading the pricing, the math only works out for particular use cases. Something that will certainly allow for cost reduction, however, is the new DynamoDB Standard-IA table class, which promises 60% storage cost savings over Standard; given its higher read/write costs, though, consideration of the use case is necessary. Fortunately, table classes can be switched in place easily.
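Switching an existing table over really is a single `UpdateTable` call. Here's a minimal boto3 sketch (the table name is a placeholder of mine; an archival/audit table with lots of stored data and few reads is the kind of fit Standard-IA targets):

```python
def table_class_update(table_name: str, infrequent_access: bool = True) -> dict:
    """Request body for switching a DynamoDB table's class in place."""
    return {
        "TableName": table_name,
        "TableClass": ("STANDARD_INFREQUENT_ACCESS"
                       if infrequent_access else "STANDARD"),
    }

# Usage (requires AWS credentials, so it is commented out in this sketch):
# import boto3
# dynamodb = boto3.client("dynamodb")
# # "audit-log-archive" is an illustrative table name.
# dynamodb.update_table(**table_class_update("audit-log-archive"))
```

Switching back to Standard is the same call with `infrequent_access=False`, which makes it cheap to experiment if the read/write math doesn't pan out.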
The RDS team has done an amazing job of reducing the barriers to adoption for MS SQL customers over the past few years, and the new RDS Custom for SQL Server should address most of the few that remain (or you can give Babelfish a try). And, as someone with middling DBA skills, I'm interested to see the recommendations the new DevOps Guru for Aurora generates. I've seen minimal adoption of the Guru services, but this seems tailor-made for scrappier teams without DBA depth.
Management & Security:
It's no secret that Mission loves Terraform. It is our IaC of choice, but leveraging it in Control Tower environments with Account Factory provisioning has always posed several logistical challenges. With that in mind, I am very excited about the Account Factory for Terraform announcement, and it will be among the first things I get my hands on this week. Along the same governance lines, some helpful new data residency SCPs were added, which will be especially useful for those dealing with compliance concerns such as GDPR. CDK got some airtime as well, with the Construct Hub and v2 going GA.
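For context on what a data residency SCP looks like, here's a sketch of the common region-lock pattern (this is my own illustrative policy, not the exact managed examples AWS published; the `NotAction` exemption list for global services varies by organization):

```python
import json

def region_lock_scp(allowed_regions: list[str]) -> dict:
    """Build an SCP that denies requests outside the approved regions,
    exempting a few global services that operate out of us-east-1."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideApprovedRegions",
            "Effect": "Deny",
            # Global services exempted from the region check (illustrative).
            "NotAction": ["iam:*", "organizations:*", "route53:*",
                          "support:*", "sts:*"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": allowed_regions}
            },
        }],
    }

print(json.dumps(region_lock_scp(["eu-west-1", "eu-central-1"]), indent=2))
```

Attached at an OU boundary, a policy like this keeps GDPR-scoped workloads from ever provisioning resources outside the approved regions, regardless of what IAM permissions a principal holds.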
AWS Inspector also got a major makeover, one that will significantly enhance its appeal in complex, multi-account environments. New features include: continual and automated assessment scans (previously manual/scheduled), automated resource discovery, and support for containerized workloads (yay!). There were also three integration improvements: EventBridge, Security Hub (what was there previously is now Inspector Classic), and, finally, Organizations support! Outside of these impressive enhancements, I'm happy about it operationally because the agent is now bundled with the SSM Agent. Now, if they'd just do the same with the CloudWatch agent and unify them all.
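Enabling the new Inspector across accounts goes through the `inspector2` API. Here's a hedged sketch of the `Enable` call via boto3 (the account IDs are placeholders, and in practice you'd gather them from Organizations and run this from the delegated administrator account):

```python
def inspector_enable_params(account_ids: list[str]) -> dict:
    """Request body to turn on continual EC2 and ECR scanning
    for a set of member accounts."""
    return {
        "accountIds": account_ids,
        "resourceTypes": ["EC2", "ECR"],
    }

# Usage (requires AWS credentials, so it is commented out in this sketch):
# import boto3
# inspector = boto3.client("inspector2")
# # Placeholder account IDs; pull the real list from Organizations.
# inspector.enable(**inspector_enable_params(
#     ["111111111111", "222222222222"]))
```

From there the continual scans, resource discovery, and Security Hub/EventBridge findings flow without per-instance scheduling.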
Speaking of CloudWatch, there were several interesting announcements there: Amazon CloudWatch Metrics Insights, CloudWatch Evidently, and Real-User Monitoring (RUM). Metrics Insights is the most valuable to the majority: investigating and querying metrics at any large scale in CloudWatch has been either painful or outright unsupported. This new feature lets you query up to 10,000 metrics with a SQL-based query engine. You can run top-N queries to quickly discover the busiest ALBs by request count, SNS topics by message volume, SQS queues with the oldest messages, longest-running Lambda functions, overloaded EC2 instances, and many others with relative ease. The documentation has some nice, illustrative examples. Evidently (naming concerns aside) is AWS dipping its toe into feature flags, enabling A/B testing and experimental rollouts to subsets of users. The RUM offering, similarly, resembles others in the space, collecting web page load and rendering performance metrics; its only major differentiator as of now is some additional integration with X-Ray. Say what you will about it, CloudWatch is a service that seems to continually and significantly add capabilities each year.
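A Metrics Insights query rides on the existing `GetMetricData` API as an `Expression`. Here's a sketch of the "overloaded EC2s" top-N case (the query follows the documented SQL syntax; the request-builder helper and time window are my own scaffolding):

```python
from datetime import datetime, timedelta

# Top-10 busiest EC2 instances by average CPU, using the SQL-based engine.
TOP_CPU_QUERY = (
    'SELECT AVG(CPUUtilization) '
    'FROM SCHEMA("AWS/EC2", InstanceId) '
    'GROUP BY InstanceId ORDER BY AVG() DESC LIMIT 10'
)

def metrics_insights_request(query: str, minutes: int = 60) -> dict:
    """Wrap a Metrics Insights SQL query in a GetMetricData request."""
    end = datetime.utcnow()
    return {
        "MetricDataQueries": [
            {"Id": "top_cpu", "Expression": query, "Period": 300},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
    }

# Usage (requires AWS credentials, so it is commented out in this sketch):
# import boto3
# cloudwatch = boto3.client("cloudwatch")
# result = cloudwatch.get_metric_data(
#     **metrics_insights_request(TOP_CPU_QUERY))
# for series in result["MetricDataResults"]:
#     print(series["Label"], series["Values"][:3])
```

Swapping the schema and aggregate (e.g. `SUM(NumberOfMessagesPublished)` over `"AWS/SNS"`) covers the other top-N examples with the same scaffolding.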
Outside of these announcements, there was a host of storage and S3 enhancements (S3 Glacier Instant Retrieval, AWS Backup support for S3, and EBS snapshot restore/archival being the highlights), many improvements to the various optimizers and path analyzers, and many, many, many SageMaker AI/ML-focused releases. I'll be honest: compared to previous years, some of the announcements were a bit underwhelming, but sometimes the smallest improvement can make a massive difference in your new architecture/build and provide support for new environments. On that note, I'm quite pleased to see better error verbiage making its way into IAM messages, which means less questioning your sanity over a policy, only to realize an SCP is denying you! Unfortunately, it's only available for three services to start.
A reduced-size re:Invent made for an exciting but manageable experience, although many of the sessions I wanted to attend were waitlisted or at capacity. I was incredibly grateful to connect with people in person, and I only barely lost my voice thanks to the raucous success of the Mission Ignite party! I'm also thankful to all the people who helped make this year's conference a success, and to all those who help daily to make AWS my preferred platform to build on.