In December 2022, we announced our partnership with Isovalent to bring a next-generation extended Berkeley Packet Filter (eBPF) dataplane for cloud-native applications to Microsoft Azure, and we revealed that the next generation of the Azure Container Network Interface (CNI) dataplane would be powered by eBPF and Cilium.
Today, we’re thrilled to announce the general availability of Azure CNI powered by Cilium. Azure CNI powered by Cilium is a next-generation networking platform that combines two powerful technologies: Azure CNI, for scalable and flexible Pod networking control integrated with the Azure Virtual Network stack, and Cilium, an open-source project that uses an eBPF-powered data plane for networking, security, and observability in Kubernetes. Azure CNI powered by Cilium takes advantage of Cilium’s direct routing mode inside guest virtual machines and combines it with the Azure native routing inside the Azure network. This improves network performance for workloads deployed in Azure Kubernetes Service (AKS) clusters and provides built-in support for enforcing network security.
In this blog, we will delve further into the performance and scalability results achieved by this networking offering in Azure Kubernetes Service.
Performance and scale results
Performance tests are conducted in AKS clusters in overlay mode to analyze system behavior and evaluate performance under heavy load. These tests simulate scenarios where the cluster is subjected to high levels of resource utilization, such as large numbers of concurrent requests or heavy workloads. The objective is to measure performance metrics such as response times, throughput, scalability, and resource utilization in order to understand the cluster’s behavior and identify any performance bottlenecks.
Service routing latency
The experiment used the Standard D4 v3 SKU nodepool (16 GB memory, 4 vCPU) in an AKS cluster. The apachebench tool, commonly used for benchmarking and load testing web servers, was used to measure service routing latency. A total of 50,000 requests were generated and the overall completion time was measured. Azure CNI powered by Cilium and kube-proxy initially exhibit similar service routing latency until the number of pods reaches 5,000. Beyond this threshold, service routing latency for the kube-proxy based cluster begins to increase, while it stays at a consistent level for the Cilium based cluster.
Notably, when scaling up to 16,000 pods, the Azure CNI powered by Cilium cluster shows a 30 percent reduction in service routing latency compared to the kube-proxy cluster. These results reconfirm that eBPF based service routing performs better at scale than the iptables based service routing used by kube-proxy.
Service routing latency in seconds
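As a rough illustration of the measurement described above, the sketch below builds an apachebench (`ab`) command for 50,000 requests, extracts the total completion time from its output, and computes the percent reduction between two runs. The service URL, the concurrency level, and the sample timings are assumptions made for illustration; they are not values from the tests reported here.

```python
import re
import shlex


def build_ab_command(url, total_requests=50_000, concurrency=100):
    """Return an ApacheBench command line as a list of arguments.

    -n sets the total number of requests, -c the number of concurrent requests.
    The concurrency default is an assumption; the post does not state it.
    """
    return ["ab", "-n", str(total_requests), "-c", str(concurrency), url]


def parse_time_taken(ab_output):
    """Extract the 'Time taken for tests' value (in seconds) from ab output."""
    match = re.search(r"Time taken for tests:\s+([\d.]+)\s+seconds", ab_output)
    if match is None:
        raise ValueError("no 'Time taken for tests' line found in ab output")
    return float(match.group(1))


def percent_reduction(baseline_seconds, candidate_seconds):
    """Percent by which candidate_seconds is lower than baseline_seconds."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_seconds - candidate_seconds) * 100 / baseline_seconds


if __name__ == "__main__":
    # Hypothetical in-cluster service URL; replace with a real Service address.
    command = build_ab_command("http://my-service.default.svc.cluster.local/")
    print(shlex.join(command))

    # Placeholder timings, not measurements: 10.0 s baseline vs 7.0 s candidate.
    print(f"{percent_reduction(10.0, 7.0):.1f} percent reduction")
```

Running the same command against a Service in a kube-proxy cluster and in a Cilium cluster, then passing the two parsed times to `percent_reduction`, gives the kind of comparison reported in this section.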
Scale test performance
The scale test was carried out in an Azure CNI powered by Cilium AKS cluster, using the Standard D4 v3 SKU nodepool (16 GB memory, 4 vCPU). The purpose of the test was to evaluate the performance of the cluster under high scale conditions. The test focused on capturing the central processing unit (CPU) and memory usage of the nodes, as well as monitoring the load on the API server and Cilium.
The test encompassed three distinct scenarios, each designed to assess a different aspect of the cluster’s performance under varying conditions.
Scale test with 100K pods with no network policy
The scale test was executed with a cluster comprising 1K nodes and a total of 100K pods. The test was conducted without any network policies or Kubernetes services deployed.
During the scale test, as the number of pods increased from 20K to 100K, the CPU usage of the Cilium agent remained consistently low, not exceeding 100 millicores, and memory usage was around 500 MiB.
Scale test with 100K pods with 2K network policies
The scale test was executed with a cluster comprising 1K nodes and a total of 100K pods. The test involved the deployment of 2K network policies but did not include any Kubernetes services.
The CPU usage of the Cilium agent remained under 150 millicores and memory usage was around 1 GiB. This demonstrated that Cilium maintained low overhead even though the number of network policies doubled.
Scale test with 1K services with 60K pods backend and 2K network policies
This test was executed with 1K nodes and 60K pods, accompanied by 2K network policies and 1K services, each having 60 pods associated with it.
The CPU usage of the Cilium agent remained at around 200 millicores and memory usage remained at around 1 GiB. This demonstrates that Cilium continues to maintain low overhead even when a large number of services are deployed. As we saw previously, service routing via eBPF provides significant latency gains for applications, and it is good to see that this is achieved with very low overhead at the infrastructure layer.
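To put the three scenarios side by side, the sketch below records the figures quoted above and derives two simple ratios: the average number of pods per node, and the Cilium agent's CPU usage as a share of a 4 vCPU node. It assumes pods are spread evenly across nodes and that the reported CPU figures are per agent (the agent runs once per node); the class and function names are illustrative only.

```python
from dataclasses import dataclass

NODE_VCPUS = 4                       # Standard D4 v3 node, per the test setup
NODE_MILLICORES = NODE_VCPUS * 1000  # 1 vCPU = 1000 millicores


@dataclass(frozen=True)
class Scenario:
    name: str
    nodes: int
    pods: int
    network_policies: int
    services: int
    agent_cpu_millicores: int  # approximate value or upper bound from the text


SCENARIOS = [
    Scenario("100K pods, no network policy", 1000, 100_000, 0, 0, 100),
    Scenario("100K pods, 2K network policies", 1000, 100_000, 2000, 0, 150),
    Scenario("1K services, 60K pods, 2K policies", 1000, 60_000, 2000, 1000, 200),
]


def pods_per_node(scenario):
    """Average pods per node, assuming an even spread (an assumption)."""
    return scenario.pods / scenario.nodes


def agent_cpu_share_percent(scenario):
    """Agent CPU usage as a percentage of one node's total CPU capacity."""
    return scenario.agent_cpu_millicores * 100 / NODE_MILLICORES


if __name__ == "__main__":
    for s in SCENARIOS:
        print(
            f"{s.name}: {pods_per_node(s):.0f} pods/node, "
            f"agent CPU about {agent_cpu_share_percent(s):.2f}% of a node"
        )
```

Under these assumptions, the agent stays at or below roughly 5 percent of a node's CPU capacity in all three scenarios.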
Get started with Azure CNI powered by Cilium
To wrap up, as the results above show, Azure CNI with the eBPF dataplane of Cilium performs best and scales much better with nodes, pods, services, and network policies while keeping overhead low. This product offering is now generally available in Azure Kubernetes Service (AKS) and works with both Overlay and VNET modes for CNI. We are excited to invite you to try Azure CNI powered by Cilium and experience the benefits in your AKS environment.
To get started today, visit the documentation available on Azure CNI powered by Cilium.