Swapping out microservices gracefully with the help of AWS
Introduction
The AWS load balancer controller is a key enabler for running services in Amazon EKS, using AWS APIs to provision load balancer resources. But this controller can help with more than just everyday management of load balancers. For instance, it greatly simplified how my team released APIs during a major project to rewrite our services in a new language.
Background
Our application follows a common pattern for running microservices in EKS. Outside requests come into our clusters through application load balancers (ALBs). Each ALB’s listener forwards requests to target groups according to path-based rules that correspond to services’ endpoints.
The load balancer controller manages our ALBs based on Ingress resources defined in services’ Helm manifests. We keep these manifests in our version control system, and deploy them through pull requests.
Here’s an abbreviated example from one of our services:
ingress:
  ingressClassName: alb
  enabled: true
  annotations:
    alb.ingress.kubernetes.io/group.name: login
    alb.ingress.kubernetes.io/subnets: 'subnet-a,subnet-b,subnet-c'
    alb.ingress.kubernetes.io/healthcheck-path: '/help-i-am-alive'
    alb.ingress.kubernetes.io/success-codes: '200,404'
    alb.ingress.kubernetes.io/target-type: 'ip'
    alb.ingress.kubernetes.io/backend-protocol: 'HTTPS'
When an Ingress is deployed, the controller provisions the ALB, applies the path-based rules, and creates the target group that points to the service’s pods. It handles additional behaviors for certificates, ELB access logs, health checks and more through annotations. If we deploy updates to an ingress, the controller keeps the ALB in sync with its definition.
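For reference, a chart would typically render values like these into a Kubernetes Ingress resource along the following lines. Our chart template is not shown here, so the resource name, Service name, port, and path below are assumptions for illustration only.

```yaml
# Hypothetical rendering of the Helm values above. The metadata name,
# backend Service name, port number, and path are assumed, not taken
# from our actual manifests.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: login-service                # assumed resource name
  annotations:
    alb.ingress.kubernetes.io/group.name: login
    alb.ingress.kubernetes.io/healthcheck-path: '/help-i-am-alive'
    alb.ingress.kubernetes.io/target-type: 'ip'
    alb.ingress.kubernetes.io/backend-protocol: 'HTTPS'
spec:
  ingressClassName: alb              # handled by the AWS load balancer controller
  rules:
    - http:
        paths:
          - path: /api/login/        # assumed path, becomes an ALB listener rule
            pathType: Prefix
            backend:
              service:
                name: login-service  # assumed Service name
                port:
                  number: 443        # assumed port
```

The controller watches resources of this kind and translates each path entry into a listener rule and target group on the ALB.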
In with the new (but not quite out with the old)
During the project to rewrite our microservices, we continued to define service and ingress resources in Helm manifests. The new challenge was to run old and new services side by side while we incrementally rewrote and released individual APIs. We wanted requests for rewritten APIs to be forwarded to the new service, while requests for all other APIs continued to go to its older counterpart.
The Ingress Group feature made this possible in part by consolidating old and new Ingress resources under the same ALB with the original group.name annotation. When the team released an API, we just added a pathType: Exact rule for that endpoint and deployed its ingress.
Here is an excerpt from a new service’s ingress, with some pathType: Exact path-based rules:
ingress:
  ingressClassName: alb
  enabled: true
  annotations:
    alb.ingress.kubernetes.io/group.name: login
  paths:
    - path: '/api/login/path1'
      pathType: Exact
    - path: '/api/login/path2'
      pathType: Exact
Here again is the original service’s ingress, which has a single pathType: Prefix rule, catching anything that does not match the new service’s path-based rules.
ingress:
  ingressClassName: alb
  enabled: true
  annotations:
    alb.ingress.kubernetes.io/group.name: login
  paths:
    - path: '/api/login/'
      pathType: Prefix
Because we defined both ingresses with alb.ingress.kubernetes.io/group.name: login, the controller would apply both sets of rules to the original ALB, letting the new service steal requests from the original service, or so we hoped.
Not so fast
The problem with this was that the pathType: Prefix rule would match every request under /api/login/, including /api/login/path1 and /api/login/path2. We had no guarantee that requests for those would be forwarded to the new service.
To solve this, we could have just replaced the Prefix path with Exact paths for all the APIs we still wanted forwarded to the old service. That would have spared us from creating a new ALB, but it would have added complexity and friction to our releases, requiring changes to two ingresses with every release.
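To make the friction concrete, the old service’s ingress under that rejected approach might have looked something like this. The endpoints path3 and path4 are hypothetical placeholders for APIs not yet rewritten.

```yaml
# Sketch of the rejected alternative: the OLD service lists every endpoint
# it still owns. path3 and path4 are hypothetical placeholder endpoints.
ingress:
  ingressClassName: alb
  enabled: true
  annotations:
    alb.ingress.kubernetes.io/group.name: login
  paths:
    - path: '/api/login/path3'
      pathType: Exact
    - path: '/api/login/path4'
      pathType: Exact
```

Every release would then mean deleting an entry here and adding the same entry to the new service’s ingress, which is the two-manifest change we wanted to avoid.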
Help from AWS
We found a more elegant solution in a subtle but powerful controller feature called group.order. Rules from Ingresses with smaller order values are evaluated first, so assigning a smaller number to the new service ensured that its path rules were matched before the old service’s Prefix rule.
Here's the new service's Ingress again, now with alb.ingress.kubernetes.io/group.order:
ingress:
  ingressClassName: alb
  enabled: true
  annotations:
    alb.ingress.kubernetes.io/group.name: login
    alb.ingress.kubernetes.io/group.order: '10'
  paths:
    - path: '/api/login/rewritten-path1'
      pathType: Exact
    - path: '/api/login/rewritten/path2'
      pathType: Exact
With that, we could set a higher group.order value for the original Ingress and leave it alone until all endpoints were transitioned. Then we just replaced all the pathType: Exact rules in the new service’s manifest with a pathType: Prefix rule and deleted the old service. The same approach worked for all of our services with Ingress resources.
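As a rough sketch of those two states, the original service’s values during the transition and the new service’s values after cutover might look like the following. The order value '20' is an assumed example, since any value larger than the new service’s would do. To my understanding, Ingresses without the annotation are treated as order 0, which is why the original Ingress needs an explicit, larger value. The numbers are quoted because Kubernetes annotation values must be strings.

```yaml
# During the transition: ORIGINAL service, evaluated after the new one.
# The value '20' is an assumed example; it only needs to exceed 10.
ingress:
  ingressClassName: alb
  enabled: true
  annotations:
    alb.ingress.kubernetes.io/group.name: login
    alb.ingress.kubernetes.io/group.order: '20'
  paths:
    - path: '/api/login/'
      pathType: Prefix
---
# After cutover: NEW service takes over the catch-all rule,
# and the original service and its ingress are deleted.
ingress:
  ingressClassName: alb
  enabled: true
  annotations:
    alb.ingress.kubernetes.io/group.name: login
    alb.ingress.kubernetes.io/group.order: '10'
  paths:
    - path: '/api/login/'
      pathType: Prefix
```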
Conclusion
The AWS load balancer controller's group.order feature has made it trivial for my team to release new APIs. The experience reminds me that maintaining infrastructure as code provides benefits beyond everyday management of infrastructure. Features like group.order allow engineers to spend more time on features and less time managing the infrastructure that they run on.