
Cilium Tuning for Large-Scale Clusters

When to Use This Guide

When your TKE cluster grows past any of the following thresholds, cilium's default settings can cause apiserver pressure, cilium-agent OOMs, slow policy compilation, or BPF map exhaustion. Use this guide to tune the cluster:

| Dimension | Approximate Threshold |
| --- | --- |
| Node count | ≥ 200 |
| Pod count | ≥ 10,000 |
| Service count | ≥ 1,000 |
| Identity count | ≥ 1,000 |
| NetworkPolicy count | ≥ 500 |

Thresholds are guidance only — judge with concrete signals from cilium-agent / cilium-operator / apiserver (resource usage, latency, throttle metrics).
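As a rough illustration of how the table could be applied, the sketch below compares a set of cluster counts against the thresholds listed above. The dictionary keys and the function name are illustrative assumptions, and the example counts are made up; in practice the counts would come from your own inventory or monitoring.

```python
# Thresholds copied from the table above; the key names are illustrative.
THRESHOLDS = {
    "nodes": 200,
    "pods": 10_000,
    "services": 1_000,
    "identities": 1_000,
    "network_policies": 500,
}


def dimensions_over_threshold(counts: dict[str, int]) -> list[str]:
    """Return the dimensions whose count meets or exceeds its threshold."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if counts.get(name, 0) >= limit
    ]


if __name__ == "__main__":
    # Hypothetical cluster counts, for illustration only.
    cluster = {
        "nodes": 250,
        "pods": 8_000,
        "services": 1_200,
        "identities": 400,
        "network_policies": 90,
    }
    print(dimensions_over_threshold(cluster))
```

A non-empty result only suggests that tuning is worth investigating; the component metrics remain the deciding signal.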

Tuning Checklist

The table below summarizes every tuning item, sorted by recommended priority and risk:

| Priority | Item | Risk / Cost | When to enable |
| --- | --- | --- | --- |
| ⭐ Strongly recommended | 1. Enable CiliumEndpointSlice | Beta on 1.19; track GA status | Node count ≥ 200 |
| ⭐ Strongly recommended | 2. Enable APF rate limiting | Almost none | Any scale (the install script enables it by default) |
| Recommended | 3. Tune K8s client QPS/Burst | Too high overloads apiserver | When you see cilium-agent sync latency spikes |
| Recommended | 4. Trim Security Identities | Label exclusion needs business design | Identity count ≥ 1000 or visible identity bloat |
| Recommended | 5. Raise Agent/Operator resources | Uses more node resources | Default limits hit OOM or CPU throttle |
| As needed | 6. Adjust BPF map size | Larger maps consume more kernel memory | BPF map writes fail or saturation alerts |

1. Enable CiliumEndpointSlice

Why: Aggregates many CiliumEndpoint objects into a single CiliumEndpointSlice resource, dramatically reducing apiserver watch/list pressure.

Background: By default, each Pod gets its own CiliumEndpoint (CEP) object. In a cluster with tens of thousands of Pods, that means tens of thousands of objects for cilium-agent to watch and apiserver to maintain. CiliumEndpointSlice borrows the EndpointSlice design and groups multiple CEPs into one slice object, reducing the total object count by up to roughly 100x.
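To make the object-count reduction concrete, here is a small arithmetic sketch. It assumes each slice holds 100 CEPs, a figure inferred from the "roughly 100x" estimate above rather than read from your cluster's configuration, and it assumes every slice is full, which real clusters will not achieve exactly.

```python
def slice_count(pod_count: int, ceps_per_slice: int = 100) -> int:
    """Best-case number of slice objects if every slice were completely full.

    ceps_per_slice=100 is an assumption taken from the ~100x estimate in the text.
    """
    if pod_count < 0 or ceps_per_slice <= 0:
        raise ValueError("pod_count must be >= 0 and ceps_per_slice must be > 0")
    # Integer ceiling division avoids floating-point rounding.
    return (pod_count + ceps_per_slice - 1) // ceps_per_slice


if __name__ == "__main__":
    pods = 30_000
    print(f"{pods} CEP objects -> at least {slice_count(pods)} slice objects")
```

Because slices are grouped in practice and are often only partly filled, the real object count will usually be higher than this lower bound.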

Configuration:

ciliumEndpointSlice:
  enabled: true

Beta Feature

This feature was introduced in cilium 1.11 and is still Beta in 1.19. Validate thoroughly in a test cluster before enabling in production. Stable tracking: cilium/cilium#31904.

There is no smooth rollback once enabled (CEPSlice and CEP are not dual-written) — plan your rollback strategy in advance.

2. Enable APF Rate Limiting

Why: Use Kubernetes API Priority and Fairness (APF) to give cilium a dedicated FlowSchema and PriorityLevelConfiguration, so that cilium-agent's heavy list traffic does not squeeze out other control-plane components.

Configuration: The one-click installer in Installing Cilium provisions cilium-specific APF objects by default (see the "Configure API Priority and Fairness (APF)" section). For manual helm installs, apply the same YAML separately.

Benefits:

  • cilium-agent restarts or cilium upgrades won't slow down kube-controller-manager, kube-scheduler, or other core components
  • Avoids "Too many requests" (HTTP 429) errors that stall cilium synchronization

3. Tune K8s Client QPS/Burst

Why: cilium-agent / cilium-operator use client-go to talk to apiserver. Defaults are conservative and can become a sync bottleneck at scale.

Defaults:

| Component | QPS | Burst |
| --- | --- | --- |
| cilium-agent | 10 | 20 |
| cilium-operator | 100 | 200 |

Tuned configuration (adjust based on cluster size):

k8sClientRateLimit:
  qps: 20
  burst: 40
  operator:
    qps: 200
    burst: 400

How to decide if you need this

Check cilium-agent's client rate limit metrics (a high throttle count means you're being limited):

kubectl -n kube-system exec ds/cilium -- cilium metrics list | grep client_rate_limiter

If throttle counts are constantly growing, raise QPS/Burst.
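One way to make "constantly growing" precise is to sample the cumulative throttle count at regular intervals and check that it rose in every interval. The sketch below assumes you have already collected such samples yourself; the function name and the sample values are hypothetical.

```python
def throttle_is_growing(samples: list[float]) -> bool:
    """True if a cumulative counter increased between every pair of consecutive samples."""
    if len(samples) < 2:
        # A single reading cannot show a trend.
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


if __name__ == "__main__":
    # Hypothetical readings taken a few minutes apart.
    print(throttle_is_growing([10, 25, 40, 90]))   # rose in every interval
    print(throttle_is_growing([10, 10, 10, 12]))   # mostly flat
```

A counter that rises only during a rollout or an agent restart is less concerning than one that rises in steady state, so sample over a period that covers normal operation.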

4. Trim Security Identities

Why: cilium allocates one Security Identity per unique label combination. Too many identities drive up cilium-agent memory and policy compilation cost, plus apiserver storage pressure for CiliumIdentity resources.

Typical sources of identity bloat:

| High-cardinality label | Source |
| --- | --- |
| pod-template-hash | Changes on every Deployment update |
| controller-revision-hash | StatefulSet/DaemonSet rollouts |
| job-name | Job instance names |
| batch.kubernetes.io/controller-uid | Job controller UID |

Configuration: Exclude these labels via extraConfig.labels so they don't participate in Identity calculation:

extraConfig:
  labels: "!pod-template-hash !controller-revision-hash !job-name !batch.kubernetes.io/controller-uid"

! means "exclude" (negation) — only the listed labels are excluded, all other labels still contribute to Identity.
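The effect of excluding a high-cardinality label can be illustrated with a deliberately simplified model: treat each distinct set of label key/value pairs as one identity and count the distinct sets before and after exclusion. This is only a sketch of the idea, not cilium's actual allocation logic, which considers more inputs than Pod labels; the function name and sample Pods are hypothetical.

```python
# Labels excluded in the configuration above.
EXCLUDED = frozenset({
    "pod-template-hash",
    "controller-revision-hash",
    "job-name",
    "batch.kubernetes.io/controller-uid",
})


def identity_count(pods: list[dict[str, str]],
                   excluded: frozenset[str] = frozenset()) -> int:
    """Count unique label combinations, ignoring any excluded label keys."""
    combos = {
        frozenset((key, value) for key, value in labels.items() if key not in excluded)
        for labels in pods
    }
    return len(combos)


if __name__ == "__main__":
    # Three generations of the same Deployment plus one StatefulSet Pod (made-up data).
    pods = [
        {"app": "web", "pod-template-hash": "a1"},
        {"app": "web", "pod-template-hash": "b2"},
        {"app": "web", "pod-template-hash": "c3"},
        {"app": "db", "controller-revision-hash": "r1"},
    ]
    print(identity_count(pods), "->", identity_count(pods, EXCLUDED))
```

In this toy example every rollout of the same workload creates a new label combination until the hash label is excluded, at which point all generations collapse into one.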

Verify the effect:

# Total Identity count (--no-headers keeps the header row out of the count)
kubectl get ciliumidentities --no-headers | wc -l

After tuning, the count should drop noticeably over time.

5. Raise Agent/Operator Resources

Why: Default cilium-agent / cilium-operator resource requests/limits are conservative. Large clusters may hit OOMs or CPU throttling, causing policy sync lag and slow Pod network setup.

Recommended configuration (adjust based on observation):

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi
operator:
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 1Gi

How to size the limits

Observe actual cilium-agent and cilium-operator usage:

kubectl -n kube-system top pod -l app.kubernetes.io/part-of=cilium

Set limit ≥ 2× the observed peak, leaving headroom for spikes.
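The 2× rule can be written as a small calculation. The sketch below assumes you have recorded peak memory readings in MiB from several observations; the function name and the sample readings are hypothetical.

```python
import math


def recommended_limit_mib(observed_peaks_mib: list[float],
                          headroom_factor: float = 2.0) -> int:
    """Return a memory limit in MiB: the highest observed peak times the headroom factor, rounded up."""
    if not observed_peaks_mib:
        raise ValueError("need at least one observed peak")
    return math.ceil(max(observed_peaks_mib) * headroom_factor)


if __name__ == "__main__":
    # Hypothetical peak readings gathered over several days.
    peaks = [610, 742, 980]
    print(f"suggested memory limit: {recommended_limit_mib(peaks)}Mi")
```

The same calculation applies to CPU if the readings are recorded in millicores instead of MiB.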

6. Adjust BPF Map Size

Why: cilium stores service / endpoint / policy data in BPF maps. Default sizes are calculated dynamically from node memory (mapDynamicSizeRatio=0.0025, i.e. about 0.25% of total memory). Once the Pod or Service entries on a node fill a map, further writes to it fail.
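To get a feel for what the ratio means on a given node, the arithmetic can be sketched as follows. It computes only the aggregate memory budget implied by "ratio × total node memory"; how cilium divides that budget among individual maps is not modeled here, and the function name and the 64 GiB node size are illustrative assumptions.

```python
def dynamic_map_budget_mib(node_memory_gib: float, ratio: float = 0.0025) -> float:
    """Aggregate budget in MiB implied by ratio x total node memory."""
    if node_memory_gib < 0 or ratio < 0:
        raise ValueError("node_memory_gib and ratio must be non-negative")
    return node_memory_gib * 1024 * ratio


if __name__ == "__main__":
    # Illustrative 64 GiB node at the default ratio and at a doubled ratio.
    for ratio in (0.0025, 0.005):
        print(f"ratio={ratio}: {dynamic_map_budget_mib(64, ratio):.2f} MiB")
```

Doubling the ratio doubles the budget, which is why the memory cost should be checked against the node's free memory before any change.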

When to adjust:

  • cilium-agent logs show Unable to update element for cilium_lb4_services_v2 or similar BPF map saturation errors
  • Hubble alerts on BPF map utilization approaching 100%

Tuned configuration:

bpf:
  mapDynamicSizeRatio: 0.005 # Use 0.5% of node memory (default 0.0025)

Or specify exact map sizes (not recommended unless you have specific needs):

bpf:
  lbMapMax: 131072 # LoadBalancer service map (default 65536)
  policyMapMax: 32768 # NetworkPolicy map (default 16384)

Memory Cost

Larger BPF maps consume more kernel memory (not counted against container memory limits — they come directly from node memory). Observe before adjusting to avoid OOM-killing the node.

Post-Tuning Observability

After tuning, monitor these signals to confirm impact:

| Metric | Healthy Baseline |
| --- | --- |
| cilium-agent CPU / memory usage | Well under limits (keep 50% headroom) |
| cilium_endpoint_regeneration_time_seconds | p99 < 5s |
| cilium_policy_l7_total / policy compile time | No visible backlog |
| apiserver apiserver_request_duration_seconds | cilium traffic doesn't degrade other components |
| Total CiliumIdentity count | Clear downward trend after tuning |
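Two of the baselines above, the p99 regeneration time and the 50% headroom, are simple enough to check in code. The sketch below works on raw duration samples using a nearest-rank percentile; a monitoring system would normally derive the percentile from the histogram metric instead. The function names and sample values are hypothetical.

```python
def p99(samples: list[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based nearest rank: ceil(0.99 * n), computed with integers to avoid float error.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def regeneration_healthy(samples: list[float], threshold_s: float = 5.0) -> bool:
    """True if the p99 regeneration time is below the baseline (5s in the table above)."""
    return p99(samples) < threshold_s


def has_headroom(usage: float, limit: float, min_headroom: float = 0.5) -> bool:
    """True if usage leaves at least min_headroom (as a fraction) of the limit unused."""
    return usage <= limit * (1 - min_headroom)


if __name__ == "__main__":
    durations = [0.2] * 99 + [7.5]        # one slow outlier in 100 samples
    print(regeneration_healthy(durations))
    print(has_headroom(usage=900, limit=2048))
```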