Streamlining Monitoring: How to Receive AWS Health Check Alerts on Slack
Getting a Slack notification when a health check fails seems simple and straightforward. Until it isn't.
The main reason is that Route 53 publishes its health check metrics to CloudWatch only in the us-east-1 region, which makes things awkward if your other AWS services run in different regions. In that case, you need to set up part of the infrastructure in us-east-1 as well.
As of today, we can't create the alarm in our desired region (in my case it was the EU). So the idea is this:
- Configure a health check in Route 53 [Global]
- Attach a CloudWatch alarm to it [US region]
- Attach an SNS topic to this alarm [US region]
- Create a subscription on the topic and point it at a Lambda. This Lambda can be in any region.
- Write the logic to parse the SNS message (whether the state is healthy or unhealthy) and send a notification based on it.
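To make the last step concrete, here is a minimal sketch of what parsing the notification could look like. It assumes the standard shape of a CloudWatch alarm delivered through SNS, where the alarm details arrive as a JSON string in Sns.Message. The helper name parseAlarmEvent, the account ID and the trimmed-down sample event are made up for illustration.

```javascript
// Hypothetical helper: pull the topic name and alarm state out of an
// SNS-wrapped CloudWatch alarm notification.
function parseAlarmEvent(event) {
  const sns = event.Records[0].Sns;
  // CloudWatch puts the alarm details in Sns.Message as a JSON string.
  const alarm = JSON.parse(sns.Message);
  return {
    topicName: sns.TopicArn.split(':').pop(),
    alarmName: alarm.AlarmName,
    state: alarm.NewStateValue, // "ALARM", "OK" or "INSUFFICIENT_DATA"
    healthy: alarm.NewStateValue === 'OK',
  };
}

// Trimmed-down sample event (a real one carries many more fields).
const sampleEvent = {
  Records: [{
    Sns: {
      TopicArn: 'arn:aws:sns:us-east-1:123456789012:MyGenericTopic',
      Message: JSON.stringify({
        AlarmName: 'HealthCheckAlarmFoo',
        NewStateValue: 'ALARM',
        OldStateValue: 'OK',
      }),
    },
  }],
};

console.log(parseAlarmEvent(sampleEvent));
```

The Lambda only needs NewStateValue to decide between a "failing" and a "recovered" message; the topic name helps when several topics feed the same function.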
We'll use AWS CloudFormation (through AWS SAM) in this blog, but you can follow along and do it manually as well.
Prepare Configuration:
If you don't have a deployment configuration for the us-east-1 region yet, add one to your samconfig.toml file. This file stores the deployment parameters so you don't have to pass them on every deploy.
[us-east-1]
[us-east-1.deploy]
[us-east-1.deploy.parameters]
stack_name = "your_stack_name"
s3_bucket = "your_s3_bucket"
s3_prefix = "whateveryouwant"
region = "us-east-1"
capabilities = "CAPABILITY_IAM CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM"
image_repositories = []
Set Up SNS Topic:
Create an SNS topic and its subscription. You can leave the endpoint as a placeholder for now; we'll update it later once we write our Lambda.
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  MySNSTopic:
    Type: AWS::SNS::Topic
    Properties:
      DisplayName: "MyGenericTopic"
      TopicName: "MyGenericTopic"
  MyLambdaSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Protocol: lambda
      TopicArn: !Ref MySNSTopic
      Endpoint: "arn:aws:lambda:REGION:ACCOUNT_ID:function:MyLambdaFunction"
# Exposed so the parent stack can reference the topic ARN from the alarm
Outputs:
  MySNSTopicArn:
    Value: !Ref MySNSTopic
Create Health Checks and Alarms:
Create the health checks and their associated alarms in a template specific to the US region (e.g., template.us.yaml).
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: US Stack
Resources:
  FooSNS:
    Type: AWS::Serverless::Application
    Properties:
      Location: {{path to your SNS resource}}
  HealthCheckFoo:
    Type: AWS::Route53::HealthCheck
    Properties:
      HealthCheckConfig:
        FailureThreshold: 1
        FullyQualifiedDomainName: www.example.com
        Port: 443
        RequestInterval: 30
        ResourcePath: /health
        Type: HTTPS
  HealthCheckAlarmFoo:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: HealthCheckAlarmFoo
      ComparisonOperator: LessThanThreshold
      EvaluationPeriods: 1
      MetricName: HealthCheckStatus
      Namespace: AWS/Route53
      Period: 60
      Statistic: Minimum
      Threshold: 1
      Dimensions:
        - Name: "HealthCheckId"
          Value: !Ref HealthCheckFoo
      AlarmDescription: "Foo Health check failed"
      # Both actions must point at the same output: the topic ARN
      AlarmActions:
        - !GetAtt FooSNS.Outputs.MySNSTopicArn
      OKActions:
        - !GetAtt FooSNS.Outputs.MySNSTopicArn
Quick knowledge: how is HealthCheckAlarmFoo attached to HealthCheckFoo?
Through these lines:
Dimensions:
  - Name: "HealthCheckId"
    Value: !Ref HealthCheckFoo
If you have a different type of health check, e.g. one that is itself based on a CloudWatch alarm (type CLOUDWATCH_METRIC), you should use AlarmIdentifier in the health check config instead. If you try to use the AlarmIdentifier property on a basic health check (the one we’re creating), you will get a 400 error.
Now let’s create the Lambda. In your template.yaml, add the configuration for the Lambda stack:
AWSLambdas:
  Type: AWS::CloudFormation::Stack
  Properties:
    TemplateURL: {{path to your lambda config}}
    Parameters: { }
Now let’s configure the lambda:
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: Lambda functions
Resources:
  SendSlackNotificationFunction:
    Type: AWS::Serverless::Function
    Properties:
      Architectures:
        - x86_64
      Handler: pathToYourLambdaFunction/index.handler
      Runtime: nodejs18.x # change it if you use another version
      CodeUri: [path to the codebase for lambda]
      FunctionName: yourLambdaFunctionNameToSendSlackNotification
      Policies:
        - AWSLambdaBasicExecutionRole
  # this is important, otherwise your SNS topic in the US region can't invoke this function
  AllowSNSInvoke:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt SendSlackNotificationFunction.Arn
      Principal: 'sns.amazonaws.com'
Outputs:
  SendSlackNotificationFunctionArn:
    Description: "Lambda Function ARN for Slack Notifications"
    Value: !GetAtt SendSlackNotificationFunction.Arn
And then, let’s write the Lambda code itself:
const https = require('https');

const SLACK_WEBHOOK_URL = 'URL FOR SLACK WEBHOOK';

function getMessageFromTopic(topicName, alarmState) {
  // your logic to generate the Slack message text
}

function handler(event) {
  // Extract the topic name (useful if multiple SNS topics point to the same Lambda)
  const topicArn = event.Records[0].Sns.TopicArn;
  const topicName = topicArn.split(':').pop();
  // The CloudWatch alarm payload arrives as a JSON string in Sns.Message
  const alarmMessage = JSON.parse(event.Records[0].Sns.Message);
  const alarmState = alarmMessage.NewStateValue;
  const message = getMessageFromTopic(topicName, alarmState);
  const postData = JSON.stringify({ text: message });
  const options = {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Byte length, not character count, so non-ASCII messages are sent in full
      'Content-Length': Buffer.byteLength(postData)
    }
  };
  return new Promise((resolve, reject) => {
    const req = https.request(SLACK_WEBHOOK_URL, options, (res) => {
      res.resume(); // drain the response so the socket is released
      if (res.statusCode === 200) {
        resolve({ statusCode: 200, body: 'Message sent to Slack' });
      } else {
        reject(new Error(`Request to Slack returned an error ${res.statusCode}, ${res.statusMessage}`));
      }
    });
    req.on('error', (e) => {
      reject(new Error(e.message));
    });
    req.write(postData);
    req.end();
  });
}

module.exports = {
  handler
};
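The body of getMessageFromTopic is left to you above. As a rough sketch of one way to fill it in, the version below maps the topic name to a readable label and picks the wording from the alarm state. The SERVICE_LABELS map, the "Foo service" label and the message texts are assumptions for illustration, not part of the original setup.

```javascript
// Hypothetical mapping from SNS topic name to a human-readable label.
const SERVICE_LABELS = new Map([
  ['MyGenericTopic', 'Foo service'],
]);

function getMessageFromTopic(topicName, alarmState) {
  // Fall back to the raw topic name when no label is configured.
  const service = SERVICE_LABELS.get(topicName) ?? topicName;
  switch (alarmState) {
    case 'ALARM':
      return `:red_circle: ${service} health check is failing.`;
    case 'OK':
      return `:white_check_mark: ${service} health check has recovered.`;
    default:
      // e.g. INSUFFICIENT_DATA
      return `:warning: ${service} health check state is ${alarmState}.`;
  }
}

console.log(getMessageFromTopic('MyGenericTopic', 'ALARM'));
```

Because the alarm sends both AlarmActions and OKActions to the same topic, handling the OK state lets the channel see the recovery as well as the failure.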
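Finally, to deploy, you can use these commands. The --config-env value has to match the environment name in your samconfig.toml (us-east-1 in the configuration above):
sam build --template template.us.yaml --use-container
sam deploy --config-file samconfig.toml --no-confirm-changeset --no-fail-on-empty-changeset --config-env us-east-1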
So, there you go! Handling health check notifications across various parts of AWS doesn't have to be daunting. With SNS and Lambda, you're equipped to stay in the loop when issues arise.
Pro tip: You can also take some basic recovery steps for your endpoint from the same Lambda, such as restarting the server.