Solving Cert-Manager and Azure Application Gateway Integration for AKS
Hey all! I recently went through a bit of a journey getting cert-manager to play nicely with Azure Application Gateway and AKS (Azure Kubernetes Service). It was a learning experience, to say the least, and I wanted to share what I discovered in hopes that it might help others facing the same challenges.
The Problem
I was setting up an AKS cluster and wanted to use cert-manager to automate Let's Encrypt SSL certificate issuance for my services. I'm using Azure Application Gateway Ingress Controller (AGIC) to manage ingress to my cluster.
Everything was going smoothly until I tried to obtain certificates using the HTTP-01 challenge. I kept running into 404 and 502 errors when the Let's Encrypt validation servers tried to access the challenge URL.
Symptoms
- 404 Not Found when accessing http://mydomain.com/.well-known/acme-challenge/*.
- 502 Bad Gateway errors from the Application Gateway.
- The health probes in Application Gateway were failing, marking the backend pool as unhealthy.
The Culprit
After digging into logs and configurations, I realized that the Azure Application Gateway wasn't correctly routing the Let's Encrypt HTTP-01 challenge requests to the cert-manager's solver pod. Here's what was happening:
- Cert-manager creates temporary pods and services to respond to the HTTP-01 challenge.
- The AGIC wasn't updating the Application Gateway configuration quickly enough to route traffic to these temporary resources.
- The default health probes in Application Gateway were failing because the solver pods return 404 for any path except the specific challenge URL.
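To see these temporary resources for yourself while a challenge is in flight, you can list the solver pods by the label cert-manager puts on them. This is only a small sketch: the function name list_solver_pods is my own invention, and hitc5 is simply the namespace used throughout this post.

```shell
# Hypothetical helper: list cert-manager HTTP-01 solver pods in a namespace.
# The label selector is the one cert-manager applies to its solver pods.
list_solver_pods() {
  local ns="$1"
  kubectl get pods -n "$ns" \
    -l acme.cert-manager.io/http01-solver=true \
    --show-labels
}
```

Call it as `list_solver_pods hitc5`. An empty list usually just means no challenge is pending at that moment, since the solver pods only exist while one is being processed.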
The Solution
To fix this, I needed to:
- Create a dedicated service that consistently routes traffic to any cert-manager solver pod.
- Create a dedicated ingress that directs the challenge path to this service.
- Adjust the health probe settings in Application Gateway to consider 404 responses as healthy.
Step 1: Create the Service
I created a service named acme-challenge-service that selects any pod with the label acme.cert-manager.io/http01-solver=true, which cert-manager adds to its solver pods.
apiVersion: v1
kind: Service
metadata:
  name: acme-challenge-service
  namespace: hitc5 # Replace with your actual namespace
spec:
  selector:
    acme.cert-manager.io/http01-solver: "true"
  ports:
    - protocol: TCP
      port: 8089 # Port exposed by the Service
      targetPort: 8089 # Port the solver pods are listening on
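Because the selector only matches pods that exist during a challenge, it can be handy to check whether the service currently has any endpoints behind it. Here is a rough sketch of such a check; service_has_endpoints is a hypothetical helper name, and it assumes kubectl is pointed at the right cluster.

```shell
# Hypothetical helper: succeed only if the Service currently has ready endpoints.
# Reads the endpoint IPs via jsonpath; an empty result means no solver pod matched.
service_has_endpoints() {
  local svc="$1" ns="$2" ips
  ips="$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')"
  [ -n "$ips" ]
}
```

For example, `service_has_endpoints acme-challenge-service hitc5 && echo "solver reachable"`. Expect it to fail when no certificate request is in progress.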
Step 2: Create the Dedicated Ingress
Next, I created an ingress resource specifically for the challenge path.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: acme-challenge-ingress
  namespace: hitc5
  annotations:
    appgw.ingress.kubernetes.io/backend-protocol: "http"
    appgw.ingress.kubernetes.io/request-timeout: "300"
    appgw.ingress.kubernetes.io/use-private-ip: "false"
    appgw.ingress.kubernetes.io/request-logging-enabled: "true"
    appgw.ingress.kubernetes.io/diagnostic-settings: "true"
    appgw.ingress.kubernetes.io/health-probe-status-codes: "200-499"
spec:
  ingressClassName: azure-application-gateway
  rules:
    - host: mc.10kvtech.co.uk # Replace with your domain
      http:
        paths:
          - path: "/.well-known/acme-challenge/*"
            pathType: Prefix
            backend:
              service:
                name: acme-challenge-service
                port:
                  number: 8089
Key points:
- The health-probe-status-codes annotation tells Application Gateway to treat any status from 200 through 499, including 404, as healthy.
- The ingress routes the challenge path to the acme-challenge-service.
Step 3: Update Application Gateway Health Probes
By adjusting the health probe settings through the ingress annotations, the Application Gateway now considers the backend healthy even if it receives a 404 status code. This is crucial because the cert-manager solver pods return 404 for any path other than the exact challenge URL.
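To make the effect of the "200-499" value concrete, here is a tiny sketch of range matching on status codes. It only illustrates the idea and is not how Application Gateway implements its probes; probe_is_healthy is a made-up name.

```shell
# Illustrative only: does an HTTP status code fall inside a "low-high" range?
probe_is_healthy() {
  local code="$1" range="$2"
  local low="${range%-*}"   # text before the dash, e.g. 200
  local high="${range#*-}"  # text after the dash, e.g. 499
  [ "$code" -ge "$low" ] && [ "$code" -le "$high" ]
}
```

With this, `probe_is_healthy 404 200-499` succeeds, while `probe_is_healthy 404 200-399` fails. The second range is Application Gateway's default, which is why a solver pod answering 404 on the probe path was being marked unhealthy.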
Testing the Setup
After applying these configurations, I wanted to ensure everything was working correctly.
Checking the Certificate Status
To verify that the certificate was successfully issued, I used the following commands:
# Check certificate status and watch for changes
kubectl get certificate -n hitc5 -w
# Describe certificate for detailed status
kubectl describe certificate mc-10kvtech-tls -n hitc5
Sample Output:
Name:         mc-10kvtech-tls
Namespace:    hitc5
...
Status:
  Conditions:
    Type:     Ready
    Status:   True
    Reason:   Issued
    Message:  Certificate issued successfully
...
When the Status shows True and the Reason is Issued, it means the certificate was successfully obtained and is ready for use.
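If you would rather block until the certificate is ready than watch the output by hand, kubectl wait can do that on the Ready condition. A minimal sketch, assuming a default timeout of 300 seconds that I picked arbitrarily; wait_for_certificate is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: block until the Certificate reports Ready, or time out.
# The third argument is the timeout in seconds (defaults to 300, an arbitrary choice).
wait_for_certificate() {
  local name="$1" ns="$2" timeout="${3:-300}"
  kubectl wait --for=condition=Ready "certificate/$name" \
    -n "$ns" --timeout="${timeout}s"
}
```

For the certificate in this post that would be `wait_for_certificate mc-10kvtech-tls hitc5`. A non-zero exit means the condition was not met in time, which is a good cue to go look at the challenges and logs.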
Checking Ingress Resources
To see all the ingresses and their routing, especially to confirm that the ACME challenge ingress was created correctly:
# Check all ingresses and their routing
kubectl get ingress -n hitc5 -o yaml
# Check specifically for the ACME solver ingress
kubectl get ingress -n hitc5 -l acme.cert-manager.io/http01-solver=true -o yaml
Checking the ACME Challenge Status
To monitor the status of the ACME challenges:
kubectl get challenges -n hitc5
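The default table can hide the reason a challenge is stuck, so a custom-columns view of the Challenge status fields may help. This is a sketch rather than anything official: list_challenge_states is a hypothetical name, and the columns assume the state, presented and reason fields of the cert-manager Challenge status.

```shell
# Hypothetical helper: show each Challenge with its state and failure reason.
list_challenge_states() {
  local ns="$1"
  kubectl get challenges -n "$ns" \
    -o custom-columns='NAME:.metadata.name,STATE:.status.state,PRESENTED:.status.presented,REASON:.status.reason'
}
```

Run it as `list_challenge_states hitc5`. The REASON column is usually where the 404 or 502 from the self-check shows up.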
Checking Cert-Manager Logs for Challenge Failures
If there are issues, it's helpful to look at the cert-manager logs:
# Check cert-manager logs for challenge failures
kubectl logs -n cert-manager -l app=cert-manager
Testing the Challenge URL Before the Fix
Before applying the fix, accessing the challenge URL resulted in a 502 Bad Gateway error:
curl -v http://mc.10kvtech.co.uk/.well-known/acme-challenge/test
Output:
*   Trying :80...
* Connected to mc.10kvtech.co.uk () port 80 (#0)
> GET /.well-known/acme-challenge/test HTTP/1.1
> Host: mc.10kvtech.co.uk
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 502 Bad Gateway
< Server: Microsoft-Azure-Application-Gateway/v2
< Date: Mon, 02 Dec 2024 15:49:23 GMT
< Content-Type: text/html
< Content-Length: 183
< Connection: keep-alive
<
502 Bad Gateway
502 Bad Gateway
Microsoft-Azure-Application-Gateway/v2
* Connection #0 to host mc.10kvtech.co.uk left intact
This happened because the Application Gateway marked the backend as unhealthy due to failing health probes.
Testing the Challenge URL After the Fix
After applying the fix and ensuring the backend was healthy, I tested the challenge URL with the actual challenge token:
curl -v http://mc.10kvtech.co.uk/.well-known/acme-challenge/<challenge-token>
Output:
*   Trying :80...
* Connected to mc.10kvtech.co.uk () port 80 (#0)
> GET /.well-known/acme-challenge/<challenge-token> HTTP/1.1
> Host: mc.10kvtech.co.uk
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Date: Mon, 02 Dec 2024 16:00:00 GMT
< Content-Length: 87
<
.
* Connection #0 to host mc.10kvtech.co.uk left intact
I received the expected challenge response, indicating that the request was correctly routed to the solver pod.
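When repeating this check, it can be quicker to grab just the status code and map it to the likely cause. The sketch below does that; both function names are hypothetical, and the explanations are only the causes I ran into in this post, not an exhaustive diagnosis.

```shell
# Hypothetical helper: fetch only the HTTP status code for a challenge URL.
challenge_status_code() {
  local host="$1" token="$2"
  curl -s -o /dev/null -w '%{http_code}' \
    "http://$host/.well-known/acme-challenge/$token"
}

# Map a status code to the most likely cause described in this post.
explain_status() {
  case "$1" in
    200) echo "routed to the solver pod" ;;
    404) echo "not routed to a solver, or no solver is serving this token" ;;
    502) echo "backend marked unhealthy by Application Gateway" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}
```

For example, `explain_status "$(challenge_status_code mc.10kvtech.co.uk "$TOKEN")"`, where TOKEN is a shell variable you set to the token of the pending challenge.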
Verifying the Backend Health
In the Azure Portal, I checked the Application Gateway's backend health:
- Backend Pool: pool-hitc5-acme-challenge-service-8089-bp-8089
- Status: Healthy
This confirmed that the health probes were passing, and the Application Gateway considered the backend pool healthy.
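The same backend health view should also be available from the Azure CLI if you prefer a terminal over the Portal. A minimal sketch, assuming the az CLI is installed and logged in; show_backend_health is a hypothetical wrapper, and the resource group and gateway names are placeholders you need to supply.

```shell
# Hypothetical wrapper around the Azure CLI backend health query.
# $1 = resource group, $2 = Application Gateway name (both are placeholders).
show_backend_health() {
  local rg="$1" gw="$2"
  az network application-gateway show-backend-health \
    --resource-group "$rg" --name "$gw" --output json
}
```

Call it as `show_backend_health my-resource-group my-app-gateway` with your own names, then look for the pool that fronts acme-challenge-service in the JSON.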
Lessons Learned
- Understanding Health Probes: Application Gateway health probes can be customized via annotations. Knowing how to adjust these settings is essential when dealing with services that don't respond with standard status codes.
- Cert-Manager Solver Pods are Ephemeral: They come and go quickly, so creating a service that selects them based on labels ensures consistent routing.
- AGIC Sync Intervals: The Application Gateway Ingress Controller might not sync changes instantly. Patience (or adjusting the sync interval) can help.
- Detailed Logs are Your Friend: Checking logs from cert-manager, AGIC, and the Application Gateway was crucial in pinpointing where things were going wrong.
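On the point about AGIC not syncing instantly, "patience" can be scripted as a simple retry loop around whatever check you care about. This is a generic sketch, not something AGIC or cert-manager provides; retry is a hypothetical helper, and the attempt count and delay are whatever you choose.

```shell
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0            # command succeeded, stop retrying
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"      # wait before the next attempt
    fi
    i=$((i + 1))
  done
  return 1                # every attempt failed
}
```

For instance, `retry 30 10 curl -sf -o /dev/null "http://mc.10kvtech.co.uk/.well-known/acme-challenge/$TOKEN"` keeps polling the challenge URL every 10 seconds for up to 30 attempts, where TOKEN is set by you. The -f flag makes curl fail on 4xx and 5xx responses, so the loop only stops once the gateway actually routes the request.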
Conclusion
Integrating cert-manager with Azure Application Gateway on AKS isn't entirely "out-of-the-box," but with a bit of tweaking, it's entirely possible. I hope this write-up helps anyone else facing similar challenges.
If you're struggling with this setup, try creating dedicated services and ingresses for your challenge paths and adjust your health probes accordingly. Don't forget to check the status of your certificates and backend health to ensure everything is functioning as expected.
Feel free to reach out if you have questions or run into issues—I know how tricky it can be!
Happy coding!