Use AWS Step Functions for SSM Patching Alerts
In this blog post we'll explore how to use AWS Step Functions and SSM Patch Manager to monitor the patch compliance status of EC2 instances and send alerts, reducing manual tracking and enhancing the security of our cloud environment.
AWS Step Functions is a serverless orchestration service. It lets us connect Lambda functions and other AWS services to build business-critical applications.
The service is built around the principle of linked tasks put together in a workflow called a "state machine". A task can invoke other AWS services or, more recently, third-party APIs.
AWS Systems Manager (SSM), which includes the Patch Manager feature, provides a unified interface for managing your AWS resources, including the ability to automate patching for EC2 instances which can make things easier for us.
However, some instances need a reboot before patching is fully complete. We don't want to restart them automatically, especially when they're running critical services like databases. In those cases, we prefer to choose when to restart.
To keep track of the EC2 instances that aren't fully compliant in the SSM patch report, we need alerts.
My goal was to send alerts to a Microsoft Teams channel, listing the EC2s that aren't compliant and need additional actions like rebooting. Initially, I used a Lambda function to do this but I didn't want to manage its dependencies over time so I switched to using Step Functions, taking advantage of its new feature that supports HTTPS endpoints.
Overview of the State Machine
The entire process is initiated by an EventBridge rule, which acts as a trigger for the Step Functions state machine.
The state machine begins by identifying all currently active EC2 instances in the AWS account. It then retrieves all the instance IDs and filters them based on parameters such as ComplianceType=Patch and Status=NON_COMPLIANT.
Next it determines if there are any instances that need review. If not, the state machine will skip to the end and stop. To do this, we use a task that counts the number of instances in the list from the previous step. If the count is more than zero, indicating that there are instances requiring attention, the state machine continues to filter these instances by their tags. This information is then used to format a message sent to a Microsoft Teams channel, which includes the names and IDs of the EC2 instances that need our attention.
Finally, we call the third-party API to send the formatted message to the Microsoft Teams channel.
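The flow above can be approximated outside Step Functions with a short Python sketch. This is not the state machine itself: it assumes the compliance lookup maps to the SSM `ListResourceComplianceSummaries` API (the post does not name the API) and that a boto3-style client is passed in. The function names and the message wording are hypothetical.

```python
from typing import Any, Dict, List


def fetch_non_compliant_ids(ssm_client: Any) -> List[str]:
    """Return IDs of resources whose Patch compliance status is NON_COMPLIANT.

    Assumption: ssm_client behaves like boto3.client("ssm") and the lookup
    uses ListResourceComplianceSummaries with the two filters from the post.
    """
    kwargs: Dict[str, Any] = {
        "Filters": [
            {"Key": "ComplianceType", "Values": ["Patch"], "Type": "EQUAL"},
            {"Key": "Status", "Values": ["NON_COMPLIANT"], "Type": "EQUAL"},
        ]
    }
    ids: List[str] = []
    while True:
        page = ssm_client.list_resource_compliance_summaries(**kwargs)
        items = page.get("ResourceComplianceSummaryItems", [])
        ids.extend(item["ResourceId"] for item in items)
        token = page.get("NextToken")
        if not token:
            return ids
        # Follow pagination until the service stops returning a token.
        kwargs["NextToken"] = token


def build_alert_text(instances: List[Dict[str, str]]) -> str:
    """Format the names and IDs of non-compliant instances as one message."""
    lines = [f"{len(instances)} EC2 instance(s) are not patch compliant:"]
    lines += [f"- {inst['Name']} ({inst['InstanceId']})" for inst in instances]
    return "\n".join(lines)
```

If `fetch_non_compliant_ids` returns an empty list, there is nothing to report, which corresponds to the state machine skipping to the end.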
Full steps description
- DescribeInstances: starts the process by identifying all currently active EC2 instances within the AWS account.
- ExtractInstanceIDs: retrieves all the instance IDs from the previously fetched list of EC2 instances.
- FetchInstanceComplianceData: filters the instances based on the ComplianceType=Patch and Status=NON_COMPLIANT parameters.
- CalculateArrayLength: calculates the size of the list of non-compliant instances.
- CheckIfInstancesFound: checks if the size of the list is greater than zero (indicating that there are non-compliant instances) or not. If no non-compliant instances are found during this step, the state machine skips to the end state and stops.
- DescribeTagsForFilteredInstances: if there are non-compliant instances, this step fetches the tags for these instances.
- PrepareNonCompliantInstanceList: prepares a list of non-compliant instances along with their names and IDs.
- CallThirdPartyAPI: formats the message with the non-compliant instances' information and sends it to a Microsoft Teams channel.
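As an illustration of the CalculateArrayLength and CheckIfInstancesFound steps, here is a minimal sketch of how such states could look in Amazon States Language, built as a Python dictionary and printed as JSON. It assumes a Pass state with the `States.ArrayLength` intrinsic function; the actual definition may count the list differently. The JSONPath field names (`$.NonCompliantInstances`, `$.Count`) and the `NoInstancesFound` state are hypothetical.

```python
import json

# Hypothetical fragment of a state machine definition. State names follow the
# list above; field names and the terminal state are illustrative assumptions.
fragment = {
    "CalculateArrayLength": {
        "Type": "Pass",
        "Parameters": {
            # Keep the list and add its length next to it.
            "NonCompliantInstances.$": "$.NonCompliantInstances",
            "Count.$": "States.ArrayLength($.NonCompliantInstances)",
        },
        "Next": "CheckIfInstancesFound",
    },
    "CheckIfInstancesFound": {
        "Type": "Choice",
        "Choices": [
            {
                # Continue only when at least one instance needs attention.
                "Variable": "$.Count",
                "NumericGreaterThan": 0,
                "Next": "DescribeTagsForFilteredInstances",
            }
        ],
        "Default": "NoInstancesFound",
    },
    "NoInstancesFound": {"Type": "Succeed"},
}

print(json.dumps(fragment, indent=2))
```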
Additional Configuration Details
This state machine can be used to send the message to any communications tool like Slack, MS Teams or to a ticketing system.
For MS Teams, the endpoint URL needs to be encoded, so the "@" must be replaced with "%40". Alternatively, you can use a URL shortener service.
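The encoding can be done by hand, or with a few lines of Python. The sketch below uses the standard library's `urllib.parse.quote`; the webhook URL is a made-up placeholder, not a real endpoint.

```python
from urllib.parse import quote


def encode_webhook_url(url: str) -> str:
    """Percent-encode characters such as "@" while keeping the URL structure.

    ":", "/", "?", "&" and "=" stay as-is, and "%" is kept so an already
    encoded URL is not encoded twice.
    """
    return quote(url, safe=":/?&=%")


# Placeholder URL for illustration only.
example = "https://example.webhook.office.com/webhookb2/aaaa@bbbb/IncomingWebhook/cccc/dddd"
print(encode_webhook_url(example))
```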
An HTTP Task requires an EventBridge connection, which securely manages the authentication credentials of an API provider. A connection specifies the authorization type and credentials to use for authorizing a third-party API.
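To show where the connection fits, here is a minimal sketch of an HTTP Task state, again built as a Python dictionary. The `arn:aws:states:::http:invoke` resource and the `ApiEndpoint`, `Method`, `Authentication` and `RequestBody` fields are the documented HTTP Task parameters. The endpoint, the connection ARN, the `$.Message` input path and the simple `{"text": ...}` payload are assumptions for illustration.

```python
import json


def http_task_state(api_endpoint: str, connection_arn: str) -> dict:
    """Build an HTTP Task state that POSTs a text message to a webhook."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::http:invoke",
        "Parameters": {
            "ApiEndpoint": api_endpoint,
            "Method": "POST",
            # The EventBridge connection supplies the credentials.
            "Authentication": {"ConnectionArn": connection_arn},
            # Assumption: the formatted message is available at $.Message.
            "RequestBody": {"text.$": "$.Message"},
        },
        "End": True,
    }


# Placeholder values, not real resources.
state = http_task_state(
    "https://example.webhook.office.com/webhookb2/aaaa%40bbbb/IncomingWebhook/cccc/dddd",
    "arn:aws:events:eu-west-1:123456789012:connection/teams-webhook/00000000-0000-0000-0000-000000000000",
)
print(json.dumps({"CallThirdPartyAPI": state}, indent=2))
```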
In our case we are just sending a payload to an external URL that needs no authentication, but the Step Functions HTTP Task still requires this connection. Creating the connection also creates an AWS secret that is used for authentication. Again, since there's no need to authenticate to the MS Teams channel, the secret values contain the key name of the API, and the secret is the ARN of the EventBridge API connection.
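For reference, a connection like this could also be created with boto3 rather than the console. The sketch below only builds the request for the EventBridge `CreateConnection` API. It assumes the API key authorization type, and the connection name, key name and key value are placeholders, since the webhook does not check them.

```python
from typing import Any, Dict


def connection_request(name: str, key_name: str, key_value: str) -> Dict[str, Any]:
    """Build keyword arguments for the EventBridge create_connection call."""
    return {
        "Name": name,
        "AuthorizationType": "API_KEY",  # assumption: API key authorization
        "AuthParameters": {
            "ApiKeyAuthParameters": {
                "ApiKeyName": key_name,
                "ApiKeyValue": key_value,
            }
        },
    }


# Placeholder values for illustration only.
request = connection_request("teams-webhook", "placeholder-key", "placeholder-value")

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("events").create_connection(**request)
```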
Infrastructure as Code
The entire PoC was done using the AWS Console but since we are living in the age of automation, I wanted to have an easy and repeatable way of deploying the solution.
In the past few weeks, the CloudFormation service team announced the new IaC generator (infrastructure as code generator), which must have been one of the most requested features for years, so I definitely wanted to give it a try.
It turns out that I was able to get the CloudFormation template for all the needed resources pretty easily. The hardest part was selecting all the resources involved in my scenario from a huge dropdown list and making sure I didn't leave any out. After the template was generated, it was a bit difficult to replace the hardcoded values inside the Step Functions JSON definition with parameters. Now it seems like a piece of cake.
If you want to use this solution, check out the GitHub repo, which includes the entire CloudFormation stack needed for deployment.
During the initial stages of this PoC, I encountered difficulties with the CallThirdPartyAPI task, so I asked other AWS Community Builders for guidance in the dedicated Slack space and got almost instant help from Benoît Bouré, Jimmy Dahlqvist and Andres Moreno. Chapeau bas!