nting a production-grade Kubernetes network requires deliberate architecture selection and rigorous configuration. This section details the implementation of an eBPF-based data plane with strict network policies, an approach widely adopted for high-scale, secure clusters.
Architecture Decisions
- CNI Selection: Cilium is selected for its eBPF data plane, which replaces kube-proxy, provides native NetworkPolicy enforcement, and offers deep observability via Hubble.
- Routing Strategy: Native routing is preferred where cloud provider IP limits allow. If IP exhaustion is a risk, an overlay (VXLAN) is used, but with eBPF acceleration to minimize overhead.
- Security Model: Zero Trust. Default-deny policies are enforced, with explicit allow rules for required traffic flows.
- Observability: Hubble is deployed to provide flow visibility, DNS monitoring, and security events without packet sampling.
Step-by-Step Implementation
1. Prerequisites and Kernel Verification
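Cilium's eBPF features depend on the kernel version; this guide assumes Linux 5.10 or newer for full feature support. Verify the kernel version and the BPF JIT settings:
# Check the kernel version and BPF JIT settings
uname -r
cat /proc/sys/net/core/bpf_jit_enable
cat /proc/sys/net/core/bpf_jit_harden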
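The version check above can be scripted so a node bootstrap fails early on an old kernel. The following is a minimal sketch, assuming GNU `sort -V` is available; the function names `version_ge` and `kernel_ok` are hypothetical helpers, not part of any Cilium tooling.

```shell
# version_ge A B: succeed when version A is greater than or equal to B.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# kernel_ok MIN [RELEASE]: compare a kernel release string against MIN.
# RELEASE defaults to `uname -r`; anything after the first '-' is dropped,
# so "5.15.0-91-generic" is compared as "5.15.0".
kernel_ok() {
  min="$1"
  release="${2:-$(uname -r)}"
  version_ge "${release%%-*}" "$min"
}

# Example (not executed here):
#   kernel_ok 5.10 || echo "kernel older than 5.10" >&2
```

Passing the release string as an optional second argument keeps the helper testable on machines other than the target node.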
2. Install Cilium with eBPF Data Plane
Deploy Cilium using Helm, enabling kubeProxyReplacement to offload service routing to eBPF.
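# values.yaml
kubeProxyReplacement: true
k8sServiceHost: <control-plane-host>
k8sServicePort: 6443
bpf:
  masquerade: true
# LoadBalancer address ranges such as 192.168.100.0/24 are normally declared
# in a CiliumLoadBalancerIPPool resource; lbExternalIPPool is not a value in
# the upstream chart as far as we know, so it is left commented out here.
# lbExternalIPPool:
#   cidr: 192.168.100.0/24
# Enable Hubble for observability
hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true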
Apply the installation:
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium \
  --namespace kube-system \
  -f values.yaml
3. Verify Data Plane Transition
Confirm that kube-proxy is disabled and eBPF maps are populated.
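# Check overall Cilium status (cilium CLI)
cilium status
# Confirm no kube-proxy DaemonSet remains (expect "NotFound")
kubectl -n kube-system get daemonset kube-proxy
# List eBPF load-balancer entries from inside an agent pod
# (the in-pod binary is named cilium-dbg in newer releases)
kubectl -n kube-system exec ds/cilium -- cilium bpf lb list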
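For automated checks, the agent's status text can be scanned for the kube-proxy replacement line. This is a sketch under an assumption: that the agent prints a line beginning with `KubeProxyReplacement:` followed by `True` (or `Strict` on older releases). Confirm the exact wording against your Cilium version. `kpr_enabled` is a hypothetical helper name.

```shell
# kpr_enabled: read agent status text on stdin and succeed when the
# KubeProxyReplacement line reports True (or Strict on older releases).
# ASSUMPTION: the output format described above; verify for your release.
kpr_enabled() {
  grep -Eq '^[[:space:]]*KubeProxyReplacement:[[:space:]]+(True|Strict)'
}

# Example (not executed here):
#   kubectl -n kube-system exec ds/cilium -- cilium status | kpr_enabled \
#     && echo "kube-proxy replacement active"
```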
4. Implement Zero Trust Network Policies
Create a default-deny policy for the namespace, followed by specific allow rules.
# default-deny.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# allow-frontend-to-backend.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
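Because the default-deny policy also blocks egress, pods in the namespace lose DNS until an explicit rule allows it. The sketch below prints one possible allow rule. It assumes the cluster DNS pods run in kube-system with the label `k8s-app: kube-dns` and that namespaces carry the automatic `kubernetes.io/metadata.name` label. The policy name `allow-dns-egress` and the function `dns_egress_policy` are hypothetical.

```shell
# dns_egress_policy NAMESPACE: print a NetworkPolicy manifest that lets every
# pod in NAMESPACE reach the cluster DNS pods on port 53 (UDP and TCP).
# ASSUMPTIONS: DNS pods live in kube-system and are labeled k8s-app=kube-dns.
dns_egress_policy() {
  ns="${1:?namespace required}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}

# Example (not executed here):
#   dns_egress_policy production | kubectl apply -f -
```

Placing the namespaceSelector and podSelector in the same `to` entry means both must match, which keeps the rule limited to the DNS pods rather than all of kube-system.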
5. Enable L7 Policy Enforcement (eBPF Specific)
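Use a CiliumNetworkPolicy to enforce HTTP-level rules, restricting access by path and method. eBPF redirects matching traffic to Cilium's embedded Envoy proxy, which evaluates the HTTP rules.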
# l7-policy.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: l7-frontend-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/v1/data"
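One way to confirm the L7 rule is to compare status codes for an allowed and a disallowed request from a frontend pod. The helper below is a sketch: `http_status` is a hypothetical name, the workload `deploy/frontend` and the Service URL `http://backend:8080` are assumptions about how the example apps are deployed, and the container image is assumed to ship `curl`.

```shell
# http_status NAMESPACE WORKLOAD METHOD URL
# Print the HTTP status code observed from inside WORKLOAD.
http_status() {
  kubectl -n "$1" exec "$2" -- \
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 -X "$3" "$4"
}

# Examples (not executed here; names are assumptions):
#   http_status production deploy/frontend GET  http://backend:8080/api/v1/data
#   http_status production deploy/frontend POST http://backend:8080/api/v1/data
#   http_status production deploy/frontend GET  http://backend:8080/admin
```

With the policy in place, the first request should reach the backend, while the second and third should be rejected by Cilium's proxy with HTTP 403 rather than timing out.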
Pitfall Guide
Production networking failures often stem from subtle misconfigurations or misunderstandings of the underlying Linux primitives.
- MTU Mismatch and Fragmentation
  - Issue: Overlay networks add encapsulation headers (VXLAN adds 50 bytes over IPv4). If the physical MTU is 1500 and the pod MTU remains 1500, encapsulated packets exceed the link MTU and are fragmented or dropped, which degrades TCP performance.
  - Resolution: Calculate the effective MTU as Physical MTU - Encapsulation Header, and configure the CNI to set the pod MTU accordingly. For VXLAN on a 1500-byte MTU, set the pod MTU to 1450. Verify between pods with ping -M do -s 1422 <target> (1450 minus 28 bytes of IPv4 and ICMP headers).
- IPAM Exhaustion
  - Issue: A /24 pod CIDR per node caps each node at roughly 254 pod IPs, and an undersized cluster-wide pod CIDR caps the number of nodes. In high-density or fast-growing clusters, this causes scheduling failures.
  - Resolution: Align Pod CIDR size with node count and expected pod density. Use a /16 cluster CIDR for large clusters, or configure IPAM to allocate smaller blocks per node (e.g., /27) to maximize address utilization.
- Conntrack Table Saturation
  - Issue: In iptables or IPVS modes, high connection rates fill the nf_conntrack table, causing "nf_conntrack: table full, dropping packet" errors.
  - Resolution: Increase net.netfilter.nf_conntrack_max. The more durable fix is migrating service routing to eBPF, where Cilium tracks connections in its own eBPF maps instead of the netfilter conntrack table, which removes this bottleneck for service traffic.
- DNS Resolution Failures
  - Issue: Pods cannot resolve internal services due to CoreDNS resource limits, incorrect search domains, or network policies blocking port 53 (UDP, and TCP for large responses).
  - Resolution: Ensure NetworkPolicies allow egress to CoreDNS. Tune CoreDNS resources (resources.limits.memory). Verify the ndots configuration; high ndots values (the Kubernetes default is 5) cause excessive DNS queries and latency.
- Service IP Collision
  - Issue: Kubernetes Service CIDR overlaps with an external network reachable via the node, causing routing loops or unreachable services.
  - Resolution: Audit external routing tables. Ensure service-cluster-ip-range is disjoint from all external subnets. Use ip route to verify no overlaps exist on worker nodes.
- Assumption of Policy Enforcement
  - Issue: Applying NetworkPolicy resources without a CNI that supports them results in no enforcement. Flannel, for example, does not enforce policies.
  - Resolution: Verify CNI capabilities. kubectl get networkpolicy only confirms that the objects exist, so also test connectivity that a policy should block. Ensure the CNI is actively watching and translating policies into iptables/eBPF rules.
- NodePort vs. LoadBalancer Confusion
  - Issue: A NodePort service listens on every node, so if security groups are misconfigured it is reachable from the public internet without passing through an external load balancer.
  - Resolution: Use Ingress controllers or Cloud LoadBalancers for external traffic. Restrict NodePort access via cloud provider security groups/firewalls to trusted sources only.
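The MTU and IPAM pitfalls above come down to simple arithmetic, which the following sketch makes explicit. The function names are hypothetical; the 28-byte figure is the 20-byte IPv4 header plus the 8-byte ICMP header, and the overhead value must match the encapsulation actually in use.

```shell
# pod_mtu PHYSICAL_MTU ENCAP_OVERHEAD: MTU to configure on pod interfaces.
pod_mtu() {
  echo $(( $1 - $2 ))
}

# ping_payload MTU: largest ICMP payload (ping -s) that fits in a single
# unfragmented IPv4 packet of size MTU.
ping_payload() {
  echo $(( $1 - 28 ))
}

# block_size PREFIX_LEN: number of IPv4 addresses in a per-node block.
block_size() {
  echo $(( 1 << (32 - $1) ))
}

# Example (not executed here): VXLAN over a 1500-byte link
#   mtu="$(pod_mtu 1500 50)"
#   ping -M do -s "$(ping_payload "$mtu")" <target>
```

For a /27 per-node block, `block_size 27` gives 32 addresses, which is a quick sanity check against the expected pod density per node.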
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small Cluster (< 50 nodes), Simple Workloads | Calico or Flannel with iptables | Lower operational complexity; sufficient performance for low density. | Low |
| High Scale (> 500 services), Microservices | Cilium with eBPF | Eliminates kube-proxy overhead; high throughput; L7 policy support. | Medium (Learning curve) |
| Bare Metal, Low Latency Requirements | Calico with BGP | Native routing avoids overlay overhead; direct L3 connectivity. | Medium (BGP config) |
| Cloud Managed (EKS/AKS/GKE) | Cloud Native CNI | Optimized for cloud IPAM; integrates with cloud load balancers/security. | Variable (Cloud pricing) |
| Strict Compliance, Multi-Tenancy | Cilium with Identity-Based Policies | Granular L7 enforcement; identity-based security vs. IP-based. | Medium |
Configuration Template
Cilium Helm Values for Production Hardening:
# production-cilium-values.yaml
kubeProxyReplacement: true
k8sServiceHost: <control-plane-ip>
k8sServicePort: 6443
bpf:
  masquerade: true
# Optimize for high throughput
loadBalancer:
  algorithm: maglev
# Cilium tracks connections in its own eBPF maps, so there is no
# bpf.conntrack.enabled switch to turn off.
# Security: enforce policy on all endpoints, including those with no policy
policyEnforcementMode: "always"
# Observability
hubble:
  enabled: true
  relay:
    enabled: true
    replicas: 2
  ui:
    enabled: true
  metrics:
    enabled:
      - dns
      - drop
      - tcp
      - flow
      - port-distribution
      - icmp
      - http
# Resource limits for the Cilium agent
resources:
  limits:
    cpu: "1"
    memory: "1Gi"
  requests:
    cpu: "200m"
    memory: "256Mi"
# IPAM Configuration
ipam:
  mode: "kubernetes"
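The template still contains an angle-bracket placeholder, and applying it unchanged would point the agents at a non-existent API server. A small guard like the one below can catch that before Helm runs. It is only a sketch: `unresolved_placeholders` is a hypothetical helper, and the pattern assumes placeholders are written as `<name>`.

```shell
# unresolved_placeholders FILE: print (with line numbers) any lines that
# still contain a <placeholder> token; succeeds when at least one is found.
unresolved_placeholders() {
  grep -nE '<[A-Za-z0-9_-]+>' "$1"
}

# Example (not executed here):
#   if unresolved_placeholders production-cilium-values.yaml; then
#     echo "fill in the placeholders before running helm" >&2
#   fi
```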
Quick Start Guide
- Install Cilium CLI:
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-amd64.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-amd64.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
rm cilium-linux-amd64.tar.gz{,.sha256sum}
- Deploy Cilium:
cilium install --version v1.14.0
- Verify Installation:
cilium status --wait
# Expected output: All pods running, kube-proxy replaced.
- Apply Default Deny:
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
- Test Connectivity:
cilium connectivity test
# Verifies pod-to-pod, node-to-pod, and policy enforcement.
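The CLI download in the first step hard-codes the amd64 archive. On mixed fleets, the architecture suffix can be derived from `uname -m` instead. The sketch below uses a hypothetical helper named `cli_arch` and covers only the amd64 and arm64 Linux archives.

```shell
# cli_arch [MACHINE]: map `uname -m` output to the architecture suffix used
# in the cilium-cli release archive names. MACHINE defaults to `uname -m`.
cli_arch() {
  machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
}

# Example (not executed here):
#   ARCH="$(cli_arch)" || exit 1
#   curl -L --remote-name-all \
#     "https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-${ARCH}.tar.gz"{,.sha256sum}
```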