As web technologies and the public cloud continue to evolve, workload requirements have grown increasingly complex. Due to this complexity, it’s often necessary to implement creative solutions to unconventional problems. Your end-game is always customer satisfaction, and if you can provide quick relief to a pressing problem, all parties involved benefit.
Context
Take this recent real-life example: a Mission customer runs an on-premises data center hosting their web application origin. This application is hosted on bare metal servers on the customer’s own physical network. The web application’s public endpoint is an Amazon CloudFront distribution, which caches static assets to deliver them quickly to users; dynamic content requests are passed back to the origin.
The customer sells a myriad of IoT devices directly to end-users. Within each device, the firmware contains a phone-home feature which sends a status check request to the web application. Until recently, these requests have been passed back to the origin without any issue.
As with many things on the web, for every solution there is an exploit.
The Challenge
In an attempt to either deny service or probe for exploits, malicious requests began mimicking this phone-home functionality. The first sign of trouble came when the origin application servers started crashing due to high utilization. The customer inspected traffic logs and found that phone-home traffic had jumped from 20 requests per second to more than 200 requests per second.
This pattern is generally described as a low-volume DDoS attack.
What made the situation harder was that the phone-home request is itself legitimate: the malicious traffic imitated a valid feature in order to probe the infrastructure for vulnerabilities, so it could not simply be blocked outright.
The Response
As a stopgap, the customer configured their network load balancers to match requests on the IoT device’s custom user-agent and to return a static response until a proper solution could be put in place to decouple the phone-home feature from the rest of the web application. This relieved the load on the origin application servers. However, the relief proved only temporary.
Soon after, the requests ramped up even further, to nearly 500 requests per second. While they no longer affected the origin application servers, the network load balancers now tasked with providing the response were running at over 70% CPU utilization, which gave the customer a new cause for concern.
The DDoS attack was growing.
The Solution
The customer came to us for assistance with this unique situation. Specifically, how could they provide a response to legitimate requests, but offload the processing of these requests away from their on-premise resources until a permanent solution could be put in place?
We suggested moving the processing of this request to the edge by taking advantage of AWS Lambda@Edge functionality.
At the time of writing, Lambda@Edge allows Node.js Lambda functions to run at CloudFront’s points of presence (POPs) around the world. This meant we could provide a static response for the phone-home requests on AWS’ global infrastructure and completely offload all processing away from the customer’s on-premises resources.
First, we created a Node.js Lambda function:
'use strict';
// Static content returned for the IoT devices' phone-home requests
const content = 'OK';
exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    const headers = request.headers;
    if (headers['user-agent']) {
        // Lowercase once so the comparison below is case-insensitive
        const ua = headers['user-agent'][0].value.toLowerCase();
        // If customAgent is detected in the user-agent string, return an OK
        // response with static content. The search string must be lowercase
        // because ua has already been lowercased.
        if (ua.indexOf('customagent') !== -1) {
            const response = {
                status: '200',
                statusDescription: 'OK',
                headers: {
                    'cache-control': [{
                        key: 'Cache-Control',
                        value: 'max-age=3600'
                    }],
                    // The charset belongs in Content-Type; Content-Encoding is
                    // reserved for compression schemes such as gzip.
                    'content-type': [{
                        key: 'Content-Type',
                        value: 'text/html; charset=UTF-8'
                    }],
                },
                body: content,
            };
            callback(null, response);
        } else {
            // Not a device request: let CloudFront handle it as usual
            callback(null, request);
        }
    } else {
        // No user-agent header at all: pass the request through unchanged
        callback(null, request);
    }
};
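Before attaching a function like this to a live distribution, it can help to exercise the matching logic locally. The following is only a sketch: it uses a condensed copy of the handler logic with a lowercase match string, and the `makeEvent` and `invoke` helpers, the `/status` path, and the sample user-agent values are assumptions made for illustration, not details from the customer’s environment. The mock event contains only the fields the handler reads.

```javascript
'use strict';

// Hypothetical marker; substitute the device's real user-agent token.
const AGENT_MARKER = 'customagent';

// Condensed version of the viewer-request handler for local testing.
const handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const uaHeader = request.headers['user-agent'];
  const ua = uaHeader ? uaHeader[0].value.toLowerCase() : '';
  if (ua.includes(AGENT_MARKER)) {
    // Short-circuit at the edge with a static response.
    callback(null, { status: '200', statusDescription: 'OK', body: 'OK' });
  } else {
    // Anything else continues to the cache and origin unchanged.
    callback(null, request);
  }
};

// Build a minimal viewer-request event with only the fields the handler reads.
function makeEvent(userAgent) {
  const headers = {};
  if (userAgent !== undefined) {
    headers['user-agent'] = [{ key: 'User-Agent', value: userAgent }];
  }
  return { Records: [{ cf: { request: { method: 'GET', uri: '/status', headers } } }] };
}

// Invoke the handler and capture whatever it passes to the callback.
function invoke(userAgent) {
  let result;
  handler(makeEvent(userAgent), {}, (err, value) => { result = value; });
  return result;
}

console.log(invoke('CustomAgent/1.2').status); // device request: static response
console.log(invoke('Mozilla/5.0').uri);        // browser request: passed through
```

A mixed-case user-agent is used on purpose here: it confirms that the comparison is case-insensitive, which is easy to get wrong when the header is lowercased but the search string is not.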
Next, on the CloudFront distribution, we associated the function with the Viewer Request event, specifying the Lambda function ARN including its version number.
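For readers who manage distributions through the API instead of the console, the association lives in the cache behavior’s `LambdaFunctionAssociations` block. The sketch below shows roughly what that fragment looks like. The ARN, account ID, function name, and `TargetOriginId` are placeholders, and `withViewerRequest` is a hypothetical helper written for this example, not part of any AWS SDK. It assumes the function was published in us-east-1 and is referenced by a numbered version, as Lambda@Edge requires.

```javascript
'use strict';

// Placeholder ARN: Lambda@Edge needs a us-east-1 function and a numbered
// version (not $LATEST or an alias).
const FUNCTION_ARN =
  'arn:aws:lambda:us-east-1:123456789012:function:phone-home-static:1';

// Return a copy of a cache behavior with a viewer-request association added.
function withViewerRequest(cacheBehavior, functionArn) {
  const existing = (cacheBehavior.LambdaFunctionAssociations || {}).Items || [];
  // A cache behavior takes one function per event type, so replace any
  // earlier viewer-request entry instead of appending a second one.
  const items = existing
    .filter((item) => item.EventType !== 'viewer-request')
    .concat({
      LambdaFunctionARN: functionArn,
      EventType: 'viewer-request',
      IncludeBody: false,
    });
  return Object.assign({}, cacheBehavior, {
    LambdaFunctionAssociations: { Quantity: items.length, Items: items },
  });
}

const behavior = withViewerRequest({ TargetOriginId: 'on-prem-origin' }, FUNCTION_ARN);
console.log(JSON.stringify(behavior.LambdaFunctionAssociations, null, 2));
```

Choosing the viewer-request event matters for this use case: it fires before CloudFront checks its cache, so matching requests are answered at the edge and never reach the origin.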
Conclusion
After applying this Lambda@Edge function to their CloudFront distribution, CPU utilization on the network load balancers dropped immediately – from over 70% to under 15%! While only a temporary solution, it afforded the customer ample time to implement a more permanent solution to manage the phone-home feature and decouple the application without the fear of their web application or network crashing due to high utilization.
DDoS attack averted.