Large-Scale Cilium Tuning Guide

Applicable Scenarios

When a TKE cluster reaches any of the following scale thresholds, Cilium's default configuration may run into problems such as high apiserver pressure, cilium-agent OOM, slow policy computation, or insufficient BPF map capacity. In that case we recommend tuning as described in this document:

| Dimension | Trigger Threshold (Reference) |
|---|---|
| Node count | ≥ 200 |
| Pod count | ≥ 10,000 |
| Service count | ≥ 1,000 |
| Identity count | ≥ 1,000 |
| NetworkPolicy count | ≥ 500 |

Thresholds are for reference only. Whether tuning is needed should be determined by examining the resource usage and latency metrics of cilium-agent, cilium-operator, and apiserver.
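
As a quick illustration of how the reference thresholds might be applied, the sketch below flags which dimensions of a cluster meet or exceed them. The dictionary keys and the sample inventory are hypothetical names chosen for this example; the counts would have to be collected separately (for example with kubectl).

```python
# Reference thresholds copied from the table above.
THRESHOLDS = {
    "nodes": 200,
    "pods": 10_000,
    "services": 1_000,
    "identities": 1_000,
    "network_policies": 500,
}


def exceeded_dimensions(counts: dict[str, int]) -> list[str]:
    """Return the dimensions whose count meets or exceeds its threshold."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if counts.get(name, 0) >= limit
    ]


# Hypothetical cluster inventory, for illustration only.
sample_cluster = {
    "nodes": 250,
    "pods": 8_000,
    "services": 1_200,
    "identities": 400,
    "network_policies": 120,
}
flagged = exceeded_dimensions(sample_cluster)
print(flagged)
```

A non-empty result is only a hint to look at the agent, operator, and apiserver metrics; it is not a decision by itself.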

Tuning Checklist

The table below summarizes all tuning items in order of recommended priority, together with the risk of each item and the condition under which to enable it:

| Priority | Tuning Item | Risk/Cost | When to Enable |
|---|---|---|---|
| ⭐ Strongly recommended | 1. Enable CiliumEndpointSlice | Still Beta in 1.19, track GA status | Enable when node count ≥ 200 |
| ⭐ Strongly recommended | 2. Enable APF Rate Limiting | Nearly zero | Should be enabled at any scale (default in install script) |
| Recommended | 3. Adjust K8s Client QPS/Burst | Over-configuring may overwhelm apiserver | Enable when cilium-agent sync latency is observed |
| Recommended | 4. Trim Security Identity | Label exclusion policy must match business needs | Enable when identity count ≥ 1,000 or identity bloat is observed |
| Recommended | 5. Increase Agent/Operator Resources | Consumes more node resources | Enable when default limits are insufficient and OOM or throttling occurs |
| On demand | 6. Adjust BPF Map Size | Larger maps consume more kernel memory | Enable when BPF map writes fail or capacity warnings appear |

1. Enable CiliumEndpointSlice

Effect: Aggregates multiple CiliumEndpoints into a single CiliumEndpointSlice resource, significantly reducing apiserver watch/list pressure.

Background: By default, each Pod corresponds to one CiliumEndpoint object. In a cluster with tens of thousands of Pods, this means tens of thousands of objects that cilium-agent must watch and apiserver must maintain. CiliumEndpointSlice borrows the EndpointSlice concept to aggregate multiple CEPs into one slice object, reducing the total object count to roughly 1/100 of the original.
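
To make the "roughly 1/100" figure concrete, here is a minimal sketch of the arithmetic. The default of 100 CEPs per slice is an assumption taken from that figure, not a value read from Cilium's configuration, and real clusters may end up with more slices if some are only partially filled.

```python
import math


def estimated_slice_count(pod_count: int, ceps_per_slice: int = 100) -> int:
    """Estimate how many CiliumEndpointSlice objects replace pod_count CEPs.

    ceps_per_slice=100 is an assumption derived from the "roughly 1/100"
    figure in the text; it is a lower bound that assumes full slices.
    """
    if pod_count < 0 or ceps_per_slice <= 0:
        raise ValueError("pod_count must be >= 0 and ceps_per_slice must be > 0")
    return math.ceil(pod_count / ceps_per_slice)


# 30,000 Pods: about 30,000 CEP objects without slicing,
# versus an estimated 300 slice objects with it.
print(estimated_slice_count(30_000))
```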

Configuration:

ciliumEndpointSlice:
  enabled: true

Beta Feature

This feature was introduced in cilium 1.11 and is still Beta in 1.19. We recommend thorough verification in a test cluster before using it in production. Track Stable progress: cilium/cilium#31904.

Once enabled, a smooth rollback is not possible (CEPSlice and CEP are not dual-written). Evaluate your rollback plan.

2. Enable APF Rate Limiting

Effect: Uses Kubernetes API Priority and Fairness to configure dedicated FlowSchema and PriorityLevelConfiguration for cilium, preventing cilium-agent's large list requests from consuming apiserver quota for other control plane components.

Configuration: The one-click install script in Installing Cilium already creates dedicated APF configuration for cilium by default (see the "Configure APF Rate Limiting" section in install.md). For manual helm installations, we recommend applying the YAML from the script separately.

Benefits:

  • cilium-agent restarts or cilium upgrades will not slow down core components like kube-controller-manager or kube-scheduler
  • Prevents "Too many requests" / 429 errors on apiserver that could stall cilium synchronization

3. Adjust K8s Client QPS/Burst

Effect: cilium-agent and cilium-operator use client-go internally to communicate with apiserver. The default QPS/Burst values are conservative and may become a synchronization bottleneck at scale.

Default values:

| Component | QPS | Burst |
|---|---|---|
| cilium-agent | 10 | 20 |
| cilium-operator | 100 | 200 |

Tuning configuration (adjust based on cluster size):

k8sClientRateLimit:
  qps: 20
  burst: 40
  operator:
    qps: 200
    burst: 400

Determining if adjustment is needed

Run the following command to check cilium-agent's client rate limit metrics. Sustained throttling means the client-side rate limit has become the bottleneck:

kubectl -n kube-system exec ds/cilium -- cilium metrics list | grep client_rate_limiter

If the throttle count keeps increasing, you need to raise QPS/Burst.
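
One way to turn "keeps increasing" into a repeatable check is to re-run the command at a fixed interval, note the counter each time, and compare consecutive readings. The sketch below assumes you have already extracted those readings as numbers; the function name and the sample values are hypothetical, and parsing the metrics output is deliberately left out because its exact format is not covered here.

```python
def keeps_increasing(samples: list[float]) -> bool:
    """Return True if every sample is strictly greater than the previous one.

    At least three samples are required so that a single jump (for example
    right after an agent restart) is not mistaken for a sustained trend.
    """
    if len(samples) < 3:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# Hypothetical readings taken a few minutes apart.
print(keeps_increasing([120, 180, 260, 400]))  # sustained growth
print(keeps_increasing([120, 180, 180, 180]))  # growth stopped
```

If the first pattern shows up during normal operation, raising QPS/Burst in small steps and re-checking is safer than a single large increase, since over-configuring can overload apiserver.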

4. Trim Security Identity

Effect: Cilium assigns a Security Identity to each unique set of labels. Too many identities increase cilium-agent memory usage and policy computation overhead, and add storage pressure on apiserver from CiliumIdentity resources.

Typical sources of identity bloat:

| High-cardinality label | Source |
|---|---|
| pod-template-hash | Changes with every Deployment update |
| controller-revision-hash | StatefulSet/DaemonSet rolling updates |
| job-name | Job instance name |
| batch.kubernetes.io/controller-uid | Job controller UID |

Configuration: Exclude these labels via extraConfig.labels to prevent them from participating in Identity computation:

extraConfig:
  labels: "!pod-template-hash !controller-revision-hash !job-name !batch.kubernetes.io/controller-uid"

! means exclude (negation). Only the specified labels are excluded; all other labels still participate in Identity computation.
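
The effect of the exclusion list can be illustrated with a simplified model: two Pods from different ReplicaSets of the same Deployment differ only in pod-template-hash, so once that label is excluded they share one label set and therefore one identity. This is a sketch using exact key comparison; Cilium's real label matching may be richer and is not modeled here, and the helper names and Pod labels are hypothetical.

```python
def parse_exclusions(labels_option: str) -> set[str]:
    """Collect the label keys prefixed with '!' from a space-separated string."""
    return {token[1:] for token in labels_option.split() if token.startswith("!")}


def identity_labels(
    pod_labels: dict[str, str], excluded: set[str]
) -> frozenset[tuple[str, str]]:
    """Return the labels that would still take part in identity computation."""
    return frozenset((k, v) for k, v in pod_labels.items() if k not in excluded)


OPTION = "!pod-template-hash !controller-revision-hash !job-name !batch.kubernetes.io/controller-uid"
excluded = parse_exclusions(OPTION)

# Hypothetical Pods from two revisions of the same Deployment.
pod_old = {"app": "web", "tier": "frontend", "pod-template-hash": "6d4cf56db6"}
pod_new = {"app": "web", "tier": "frontend", "pod-template-hash": "7c9b8f5d44"}

# Without exclusions there are two distinct label sets, with them only one.
before = {identity_labels(p, set()) for p in (pod_old, pod_new)}
after = {identity_labels(p, excluded) for p in (pod_old, pod_new)}
print(len(before), len(after))
```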

Verify the effect:

# Check current total Identity count
kubectl get ciliumidentities --no-headers | wc -l

After the adjustment, observe the cluster for a while; the total Identity count should decrease significantly.

5. Increase Agent/Operator Resources

Effect: The default resource requests/limits for cilium-agent and cilium-operator are conservative. At large scale, OOM kills or CPU throttling may occur, delaying policy synchronization and Pod network setup.

Recommended configuration (adjust based on actual observation):

# cilium-agent
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi
# cilium-operator
operator:
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 1Gi

How to determine appropriate limits

Observe the actual resource usage of cilium-agent and cilium-operator:

kubectl -n kube-system top pod -l app.kubernetes.io/part-of=cilium

Set limits to ≥ 2x the actual peak usage (to avoid OOMKill during traffic bursts).
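
The 2x rule can be written down as a small helper. This is a sketch: the factor of 2 comes from the guideline above, while rounding up to a 256 MiB step is an extra assumption made here only to get tidy values.

```python
import math


def recommended_memory_limit_mib(
    peak_mib: float, factor: float = 2.0, step_mib: int = 256
) -> int:
    """Return factor * peak_mib rounded up to the next multiple of step_mib.

    factor=2.0 follows the ">= 2x peak" guideline; step_mib is an
    illustrative rounding choice, not a Cilium or Kubernetes requirement.
    """
    if peak_mib < 0 or factor <= 0 or step_mib <= 0:
        raise ValueError("peak_mib must be >= 0; factor and step_mib must be > 0")
    return math.ceil(peak_mib * factor / step_mib) * step_mib


# A cilium-agent that peaks at 700 MiB would get a limit of 1536 MiB.
print(recommended_memory_limit_mib(700))
```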

6. Adjust BPF Map Size

Effect: Cilium stores service, endpoint, policy, and other data in BPF maps. The default map capacity is auto-calculated based on node memory (mapDynamicSizeRatio=0.0025, i.e., 0.25% of total memory). Once a map runs out of capacity, writes to it fail, for example when an entry for a new Pod or Service is added.
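
To get a feel for what the ratio means in absolute terms, the sketch below converts a node's memory size and a ratio into a memory budget in MiB. It only performs the percentage arithmetic described above; how Cilium distributes that budget across individual maps is not modeled, and the 64 GiB node is a hypothetical example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def dynamic_map_budget_mib(node_memory_gib: float, ratio: float = 0.0025) -> float:
    """Return ratio * node memory, expressed in MiB."""
    if node_memory_gib < 0 or not 0 < ratio <= 1:
        raise ValueError("node_memory_gib must be >= 0 and ratio must be in (0, 1]")
    return node_memory_gib * GIB * ratio / MIB


# Hypothetical 64 GiB node: default ratio versus a doubled ratio.
default_budget = dynamic_map_budget_mib(64)
doubled_budget = dynamic_map_budget_mib(64, ratio=0.005)
print(round(default_budget, 2), round(doubled_budget, 2))
```

Doubling the ratio doubles the kernel memory that may be consumed on every node, so the change is worth checking against the smallest node type in the cluster.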

When to adjust:

  • cilium-agent logs show Unable to update element for cilium_lb4_services_v2 or similar BPF map full errors
  • Hubble alerts report BPF map usage approaching 100%

Tuning configuration:

bpf:
  mapDynamicSizeRatio: 0.005 # Calculate as 0.5% of node memory (default 0.0025)

Or specify individual map sizes directly (not recommended unless you have special requirements):

bpf:
  lbMapMax: 131072 # LoadBalancer service map (default 65536)
  policyMapMax: 32768 # NetworkPolicy map (default 16384)

Memory Overhead

Increasing BPF map sizes increases kernel memory usage (not counted against container memory limits; it directly consumes node memory). Adjust gradually and observe, to avoid exhausting node memory.

Post-Deployment Observation

After completing the tuning, observe the following metrics to verify the effect:

| Metric | Healthy Baseline |
|---|---|
| cilium-agent CPU/memory usage | Well below limits (recommend 50% headroom) |
| cilium_endpoint_regeneration_time_seconds | p99 < 5s |
| cilium_policy_l7_total / policy computation time | No significant backlog |
| apiserver apiserver_request_duration_seconds | cilium-related requests do not affect other components |
| Total CiliumIdentity count | Significant downward trend after tuning |
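
The two numeric baselines in the table (50% headroom and p99 < 5s) can be expressed as simple checks. This is a sketch with hypothetical function names; the input values would come from your monitoring system, and the sample numbers below are made up for illustration.

```python
def headroom_ok(usage: float, limit: float, min_headroom: float = 0.5) -> bool:
    """Return True if usage leaves at least min_headroom of the limit unused."""
    if limit <= 0:
        raise ValueError("limit must be > 0")
    return usage <= limit * (1.0 - min_headroom)


def regeneration_ok(p99_seconds: float, max_seconds: float = 5.0) -> bool:
    """Return True if the endpoint regeneration p99 is below the baseline."""
    return p99_seconds < max_seconds


# Hypothetical observations: 900 MiB used of a 2048 MiB limit, p99 of 3.2 s.
print(headroom_ok(900, 2048), regeneration_ok(3.2))
```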