Scaling Applications with Kubernetes: A Guide to Horizontal Pod Autoscaling (HPA)

Published: 12/24/2024
Categories: kubernetes, hpa, horizontalscaling, cloudnative
Author: abhay_yt_52a8e72b213be229

Scaling and Horizontal Pod Autoscaling (HPA) in Kubernetes

Kubernetes provides robust mechanisms to scale applications and workloads efficiently to meet demand while optimizing resource utilization. Scaling can be done manually or automatically, most notably through Horizontal Pod Autoscaling (HPA). In this article, we'll explore the concept of scaling in Kubernetes, how HPA works, and how to implement it in your cluster.


Types of Scaling in Kubernetes

  1. Manual Scaling

    • Developers or operators manually adjust the number of pods in a deployment or replica set using commands like kubectl scale or by editing the deployment manifest.
    • Example:
     kubectl scale deployment <deployment-name> --replicas=5
    
  2. Automatic Scaling

    • Kubernetes can automatically scale workloads using built-in features like:
      • Horizontal Pod Autoscaling (HPA): Adjusts the number of pods in a deployment based on CPU, memory, or custom metrics.
      • Vertical Pod Autoscaling (VPA): Adjusts resource requests and limits (CPU and memory) for pods dynamically.
      • Cluster Autoscaler: Adds or removes nodes to/from the cluster based on workload demands.

Horizontal Pod Autoscaling (HPA)

HPA dynamically adjusts the number of pods in a deployment, replica set, or stateful set based on observed metrics (e.g., CPU or memory utilization). It ensures that applications have the necessary resources during peak demand while scaling down during low usage to save costs.

How HPA Works

  1. Metrics Collection: HPA relies on the Kubernetes Metrics Server to collect resource utilization data (CPU, memory, or custom application metrics).
  2. Target Threshold: You specify a threshold value (e.g., CPU utilization at 70%), and HPA ensures the workload maintains this target.
  3. Adjustment: If utilization exceeds the target, HPA increases the number of pods. If utilization falls below the target, it reduces the number of pods.
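
Under the hood, the HPA controller derives the desired replica count from the ratio of the observed metric to the target, using the formula documented by Kubernetes:

desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))

For example, 4 pods averaging 90% CPU utilization against a 70% target yields ceil(4 * 90 / 70) = 6 replicas.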

Setting Up Horizontal Pod Autoscaling

To implement HPA in Kubernetes, follow these steps:

1. Ensure Metrics Server is Running

HPA depends on the Metrics Server to collect resource utilization data. Verify that the Metrics Server is installed:

kubectl get deployment metrics-server -n kube-system

If not installed, deploy it using the official manifest:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
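
Once the Metrics Server is up, a quick way to confirm that metrics are actually flowing is kubectl top (the same works for pods):

kubectl top nodes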

2. Define Resource Requests and Limits

HPA requires pods to have defined resource requests for CPU or memory. Without these definitions, HPA cannot calculate utilization metrics.

Example Deployment Manifest with Resource Requests and Limits:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app-container
          image: nginx
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "200m"
              memory: "256Mi"

3. Create an HPA Resource

Use the kubectl autoscale command or define an HPA manifest to create an autoscaler.

Example HPA Manifest:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

In this example:

  • scaleTargetRef specifies the deployment to scale.
  • minReplicas and maxReplicas define the scaling range.
  • averageUtilization is the CPU utilization target (70%).
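
For a simple CPU-based target like this one, the same autoscaler can also be created imperatively with kubectl autoscale instead of a manifest:

kubectl autoscale deployment example-app --cpu-percent=70 --min=2 --max=10

The manifest form is better suited to version control; the imperative command is handy for quick experiments.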

4. Apply the HPA Manifest

Apply the HPA configuration using kubectl:

kubectl apply -f hpa.yaml

5. Monitor HPA Behavior

Use the following command to monitor the HPA’s status:

kubectl get hpa

Output:

NAME              REFERENCE                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
example-app-hpa   Deployment/example-app   60%/70%   2         10        3          10m
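
If the TARGETS column shows <unknown>, or you want to see recent scaling decisions, kubectl describe prints the HPA's current metrics and events:

kubectl describe hpa example-app-hpa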

Key Features of HPA

  1. Metrics Support: HPA can use CPU, memory, or custom metrics (e.g., requests per second).
  2. Scaling Range: Define a range for scaling using minReplicas and maxReplicas.
  3. Dynamic Scaling: Automatically adjusts the number of pods based on observed metrics.
  4. Custom Metrics: HPA can integrate with custom metrics (via Prometheus or other systems) to scale workloads based on application-specific metrics like HTTP request rates.

Custom Metrics with HPA

In addition to CPU and memory metrics, HPA supports custom metrics via the Custom Metrics API. For example, you can scale pods based on HTTP requests or queue length.

Example Custom Metric HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "10"

This configuration scales the deployment so that, on average, each pod serves no more than 10 HTTP requests per second.
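
Note that Kubernetes does not collect http_requests_per_second itself; a Custom Metrics API adapter (for example, prometheus-adapter) must expose it. You can check whether a custom metrics adapter is installed by querying the API directly:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"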


Best Practices for HPA

  1. Define Resource Requests and Limits: Ensure all pods have CPU and memory requests defined to enable effective scaling.
  2. Set Realistic Thresholds: Use appropriate thresholds for CPU, memory, or custom metrics based on your application’s performance benchmarks.
  3. Monitor Metrics Server: Ensure the Metrics Server is healthy and operational to avoid scaling issues.
  4. Combine with Cluster Autoscaler: Use HPA in conjunction with the Cluster Autoscaler to ensure the cluster can provision enough nodes during peak demand.
  5. Test Scaling Behavior: Simulate high traffic or load scenarios to verify that the HPA behaves as expected (a simple load-test sketch follows this list).
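
As a minimal load-test sketch, assuming the example-app deployment is exposed through a Service named example-app, a throwaway busybox pod can generate steady HTTP traffic while you watch the autoscaler react with kubectl get hpa --watch in a second terminal:

kubectl run load-generator --rm -it --image=busybox:1.36 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://example-app; done"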

Scaling Limits and Considerations

  • Cooldown Periods: HPA may take a few minutes to adjust pod counts due to metrics collection intervals and stabilization windows; scale-up and scale-down speed can be tuned via the behavior field (see the sketch after this list).
  • Minimum and Maximum Limits: Define minReplicas and maxReplicas to avoid over-scaling or under-scaling.
  • Cluster Capacity: Ensure the cluster has sufficient resources (nodes) to accommodate the maximum number of pods defined by HPA.
  • Custom Metrics: Use Prometheus or an adapter to provide custom metrics for advanced scaling use cases.
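
The cooldown behavior mentioned above is tunable in autoscaling/v2 through the behavior field of the HPA spec. A minimal sketch that, added to example-app-hpa, waits five minutes before scaling down and removes at most 50% of the pods per minute:

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60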

Conclusion

Horizontal Pod Autoscaling (HPA) in Kubernetes is a powerful feature for maintaining application performance and optimizing resource utilization. By automatically adjusting the number of pods based on workload demands, HPA ensures that your applications remain responsive under varying loads while avoiding unnecessary costs during idle periods.

When combined with best practices, custom metrics, and tools like the Cluster Autoscaler, HPA enables dynamic, efficient scaling for modern cloud-native applications.

