Kubernetes on Hybrid Cloud: Talos network

Published at: 1/14/2025
Categories: kubernetes, cloud, hybridcloud, network
Author: sergelogvinov

Network management is an important part of running a Kubernetes cluster, especially in hybrid and multi-cloud environments. Applications depend on a stable and predictable network, and a network within a single physical location is usually more stable than one spanning clouds.

Several basic components can impact the stability of an application:

  • DNS resolving
  • Network stability
  • Network latency
  • Network bandwidth

DNS resolving

The application needs to resolve DNS names to IP addresses. By default, a Kubernetes cluster uses CoreDNS as its DNS server. CoreDNS is deployed as a Kubernetes Deployment and can be scaled up or down. However, if the CoreDNS pods run far from the application pod (for example, in another cloud), DNS lookups become slow and may even time out.

To solve this issue, deploy CoreDNS as a DaemonSet so that a DNS pod runs on every node, and set the internal traffic policy (internalTrafficPolicy) of the CoreDNS service to Local. DNS traffic then stays within the node, keeping latency very low.
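
A minimal sketch of the service side of this change, assuming CoreDNS has already been redeployed as a DaemonSet and keeps the standard kube-dns service name and labels:

apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.96.0.10        # keep your cluster's existing DNS ClusterIP (this value is an assumption)
  internalTrafficPolicy: Local # route DNS queries only to endpoints on the same node
  ports:
    - name: dns
      port: 53
      protocol: UDP
    - name: dns-tcp
      port: 53
      protocol: TCP

With internalTrafficPolicy: Local, kube-proxy sends DNS requests only to endpoints running on the same node, so the DaemonSet guarantees that every node has a local resolver.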

Network stability

For kubelet and kube-proxy, network stability is crucial. These components communicate with the Kubernetes API server to configure the network and run the pods. The kubelet also updates the status of the pods and the node. If the status is not updated regularly, the control plane can mark the node as unhealthy, and its pods may be rescheduled to another node.

Imagine a situation where the pods and network are working fine, but the kubelet loses connection to the API server (for example, if the Kubernetes API load balancer goes down). Kubernetes will create new copies of the pods on another node, and the old pods will be terminated once the kubelet reconnects to the API server. For stateless applications, this behavior is usually not a problem. However, for stateful applications, like databases, it can cause significant issues.

Talos solves this problem by using an embedded load balancer on each node. The kubelet and kube-proxy (or CNI plugins) connect to the local load balancer, which forwards traffic to the API server. This ensures consistent connectivity and helps avoid unnecessary disruptions caused by API server load balancer failures.

You can enable it in the machine configuration:

machine:
  features:
    kubePrism:
      enabled: true
      port: 7445

With this configuration, the Kubernetes API server becomes accessible on every node at the localhost address on port 7445.
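
Talos points the kubelet and kube-proxy at this endpoint automatically; purely as an illustration, a node-local client could reach the API server through a kubeconfig fragment like this (a sketch, not something you normally need to configure):

# illustrative kubeconfig fragment: talk to the API server via KubePrism
apiVersion: v1
kind: Config
clusters:
  - name: kubeprism
    cluster:
      server: https://127.0.0.1:7445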

Network latency and bandwidth

The best way to reduce network latency is to use native network routing. However, in hybrid and multi-cloud environments this is usually not possible, so CNI (Container Network Interface) plugins fall back to network overlays built on technologies like VXLAN, GRE, or WireGuard. In all these cases, the overlay adds an extra header to each packet, which increases latency and reduces usable bandwidth. For example, VXLAN encapsulation adds roughly 50 bytes of headers, so a physical 1500-byte MTU leaves only about 1450 bytes for the inner packet.

Talos includes an embedded network mesh, KubeSpan, based on WireGuard, a fast and secure VPN protocol that encrypts traffic between nodes. Regardless of where the nodes are located or whether they are behind NAT, the nodes can communicate with each other seamlessly.

However, since this mesh is an additional component in the network stack, it can introduce latency and some instability, and recovery after a failure can take a long time.

The network mesh can be enabled in the machine configuration:

machine:
  network:
    kubespan:
      enabled: true
cluster:
  discovery:
    enabled: true

To reduce the recovery time, you can set filters that limit which IP addresses are used to create the tunnels. For example, to restrict the mesh to public addresses, allow everything and then exclude the private ranges:

machine:
  network:
    kubespan:
      filters:
        endpoints:
          - 0.0.0.0/0
          - '::/0'
          - '!192.168.0.0/16'
          - '!172.16.0.0/12'
          - '!10.0.0.0/8'
          - '!fd00::/8'

Or the opposite case: if your nodes have both public and private networks and you want the mesh to use only the private network (because the public network is slower and more expensive), allow only the private ranges:

machine:
  network:
    kubespan:
      filters:
        endpoints:
          - '192.168.0.0/16'
          - '172.16.0.0/12'
          - '10.0.0.0/8'

If you want to establish a mesh network only between datacenters, while using the native network for communication between nodes within each datacenter, consider using Kilo.

Kilo can be deployed as a CNI plugin that creates a WireGuard-based mesh network across Kubernetes zones, regions, and datacenters. It provides efficient and secure connectivity between nodes in different datacenters while maintaining native networking within each datacenter. This hybrid approach optimizes performance by reducing latency and overhead for intra-datacenter traffic while keeping inter-datacenter communication secure and reliable.
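
As a rough sketch of how this grouping works (the node and location names here are made up), Kilo groups nodes into locations with the kilo.squat.ai/location annotation; nodes in the same location communicate over the native network, while traffic between locations goes through WireGuard tunnels:

apiVersion: v1
kind: Node
metadata:
  name: worker-eu-1                        # hypothetical node name
  annotations:
    kilo.squat.ai/location: datacenter-eu  # nodes sharing a location use native networking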
