rchitecture, native NetworkPolicy support, and L7 visibility.
Step 1: CNI Installation and Configuration
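Replace the default CNI with Cilium using Helm. The values below enable Cilium's kube-proxy replacement, eBPF-based routing, Hubble observability, and identity-based policy enforcement. Setting kubeProxyReplacement does not remove an existing kube-proxy DaemonSet; if the cluster was bootstrapped with one, delete it separately.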
# values-cilium.yaml
kubeProxyReplacement: true
k8sServiceHost: <control-plane-host>
k8sServicePort: 6443
hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true
ipam:
  mode: kubernetes
bpf:
  masquerade: true
  tproxy: true
Apply with:
helm install cilium cilium/cilium --version 1.14.0 -f values-cilium.yaml -n kube-system
Step 2: Service Architecture and Routing
Kubernetes Services are virtual IPs backed by EndpointSlices. Never route traffic from outside the cluster directly to Service IPs; send it through an Ingress controller or a cloud load balancer instead. For internal microservice communication, use ClusterIP with explicit port naming. For stateful workloads requiring stable network identity, use Headless Services (clusterIP: None) to expose Pod IPs directly.
apiVersion: v1
kind: Service
metadata:
  name: api-backend
spec:
  selector:
    app: api-backend
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
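For the stateful case mentioned above, a headless Service might look like the following sketch. The name `postgres`, its label, and port 5432 are illustrative assumptions, not part of the original setup.

```yaml
# Hypothetical headless Service for a StatefulSet named "postgres"
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None          # headless: DNS returns Pod IPs instead of a virtual IP
  selector:
    app: postgres          # assumed Pod label
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
      protocol: TCP
```

If a StatefulSet sets `serviceName: postgres`, each Pod gets a stable DNS name of the form `postgres-0.postgres.<namespace>.svc.cluster.local`.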
Step 3: Ingress Routing
Decouple routing logic from service abstraction. Ingress controllers terminate TLS, apply path-based routing, and forward to backend Services. Use Gateway API for modern deployments, or NGINX Ingress for legacy compatibility.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1
            pathType: Prefix
            backend:
              service:
                name: api-backend
                port:
                  name: http
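For comparison, the same route expressed with Gateway API could be sketched as an HTTPRoute. This assumes the Gateway API CRDs are installed and that a Gateway named `example-gateway` (a hypothetical name) already terminates TLS for api.example.com. Older installations may serve this resource as `gateway.networking.k8s.io/v1beta1` instead of `v1`.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route                 # illustrative name
spec:
  parentRefs:
    - name: example-gateway       # assumed Gateway; TLS is configured on its listener
  hostnames:
    - api.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1
      backendRefs:
        - name: api-backend
          port: 80                # backendRefs take the Service port number, not its name
```

Unlike the Ingress above, TLS settings live on the Gateway rather than on the route, so application teams can own routes while platform teams own listeners and certificates.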
Step 4: NetworkPolicy Enforcement
Default-deny all ingress/egress traffic, then explicitly allow required flows. Cilium enforces NetworkPolicies at the eBPF layer, eliminating iptables rule bloat.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-backend-policy
spec:
  podSelector:
    matchLabels:
      app: api-backend
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - port: 8080
          protocol: TCP
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
          protocol: TCP
    # Allow DNS lookups against kube-dns in kube-system
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        # DNS falls back to TCP for large responses
        - port: 53
          protocol: TCP
Step 5: DNS and Service Discovery Tuning
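CoreDNS should be tuned for high query volumes. The Corefile below stays close to the common default and makes the forwarding behavior explicit: max_concurrent caps in-flight upstream queries, and policy sequential tries upstream resolvers in order instead of picking one at random. The cache plugin keeps repeated lookups cheap, and the loop plugin halts CoreDNS if it detects a forwarding loop, which helps prevent resolution stalls.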
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
            policy sequential
        }
        cache 30
        loop
        reload
        loadbalance
    }
Architecture Decisions and Rationale
- eBPF over iptables: Eliminates O(N) rule scanning, reduces CPU overhead by ~70%, and enables L7 policy enforcement without sidecars.
- Default-deny NetworkPolicy: Enforces zero-trust networking at the cluster level. Additive policies prevent accidental exposure.
- Ingress decoupling: Separates routing, TLS termination, and rate limiting from service logic, enabling independent scaling and security patching.
- CoreDNS tuning: Prevents resolution bottlenecks during scaling events. Sequential forwarding and connection limits avoid upstream DNS overload.
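To illustrate the L7 enforcement mentioned in the first decision, a CiliumNetworkPolicy can restrict traffic by HTTP method and path. This is a minimal sketch: the labels reuse the api-gateway and api-backend names from the NetworkPolicy example, while the policy name and the allowed path pattern are assumptions chosen for illustration.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-backend-l7            # illustrative name
spec:
  endpointSelector:
    matchLabels:
      app: api-backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: api-gateway
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              # Only GET requests under /v1/ are allowed; other requests are rejected
              - method: "GET"
                path: "/v1/.*"
```

Cilium applies rules like this through a per-node proxy rather than a sidecar in each Pod, which is what keeps the application manifests unchanged.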
Pitfall Guide
- Assuming default CNI covers security requirements: Many commonly deployed CNIs lack L7 visibility. Flannel does not enforce NetworkPolicy at all, and Calico with its standard data plane enforces policy through iptables. Without explicit NetworkPolicies, all pod-to-pod traffic flows unrestricted. Best practice: Deploy a CNI with eBPF enforcement and apply default-deny policies immediately after cluster bootstrap.
- Ignoring conntrack table exhaustion: Linux connection tracking maintains state for every TCP/UDP flow. The default nf_conntrack_max is often 65,536, which can be exhausted under high connection rates; once the table is full, packets for new connections are dropped. Best practice: Tune net.netfilter.nf_conntrack_max to 1,048,576+ on nodes, monitor /proc/sys/net/netfilter/nf_conntrack_count, and use eBPF to bypass netfilter conntrack for pod-to-pod traffic.
- Overusing NodePort and LoadBalancer internally: NodePort exposes services on every node IP, breaking network segmentation and causing IP conflicts in multi-cloud environments. LoadBalancer creates external cloud resources for internal traffic, increasing cost and latency. Best practice: Use ClusterIP for internal communication, Ingress for north-south routing, and cloud load balancers only for public endpoints.
- Treating DNS as infinite and stateless: CoreDNS query rates scale with pod count and service discovery patterns. Unbounded caching, missing TTL controls, and multi-entry search paths (search: default.svc.cluster.local svc.cluster.local cluster.local), combined with the default ndots:5 setting that sends most names through every search domain first, cause resolution delays and stale endpoints. Best practice: Limit search paths, set explicit TTLs, monitor coredns_dns_request_duration_seconds, and use headless services for stateful workloads requiring direct Pod IP resolution.
- Misunderstanding Service IP vs Pod IP routing: Service IPs are virtual and never assigned to network interfaces. Routing directly to a ClusterIP from outside the cluster fails because kube-proxy or the eBPF datapath only translates Service IPs on cluster nodes. Best practice: Always route through Ingress or cloud load balancers for external traffic. Use kubectl get endpoints to verify backend health before debugging routing.
- Ignoring east-west vs north-south traffic patterns: East-west traffic (pod-to-pod) requires low latency and high throughput. North-south traffic (client-to-cluster) requires TLS termination, rate limiting, and WAF capabilities. Applying the same routing logic to both causes asymmetric routing, certificate mismatch errors, and performance degradation. Best practice: Use eBPF for east-west optimization and dedicated Ingress controllers for north-south termination.
- Failing to validate traffic flows before deployment: NetworkPolicies and routing rules are declarative but not self-validating. Deploying without connectivity testing leads to silent failures. Best practice: Use kubectl run debug --image=nicolaka/netshoot -it --rm -- bash to simulate traffic, verify DNS resolution, test TCP handshakes, and validate policy enforcement with cilium policy trace.
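As one way to act on the DNS pitfall above, a Pod can lower `ndots` through `dnsConfig` so that external names such as api.example.com are resolved as absolute names first instead of being tried against every search domain. This is a sketch only: the Pod name, image, and the value `2` are illustrative assumptions, and the right value depends on how your workloads reference Services.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-tuned-example         # hypothetical Pod
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"                # Kubernetes defaults to 5; names with fewer dots go through the search list first
  containers:
    - name: app
      image: nginx:latest         # placeholder workload
```

Short names such as `api-backend` or `api-backend.default` contain fewer than two dots, so they still resolve through the cluster search domains.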
Production Bundle
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small cluster (<50 nodes), low traffic | Calico with iptables | Simpler configuration, lower operational overhead | Low (standard node sizing) |
| High-scale microservices (>10k endpoints) | Cilium with eBPF | Eliminates conntrack bottlenecks, reduces CPU by 70% | Medium (requires eBPF-capable kernels) |
| Multi-cloud hybrid deployment | Cilium + Gateway API | Consistent routing across providers, native L7 visibility | High (cross-cloud data transfer costs) |
| Compliance-heavy (PCI DSS) | Cilium + default-deny + Hubble | Audit-ready traffic logs, explicit policy enforcement | Medium (observability storage) |
| Legacy application with static IPs | Headless Service + external DNS | Preserves IP stability, avoids service abstraction mismatch | Low (minimal infrastructure change) |
Configuration Template
# default-deny.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
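# cilium-values.yaml (production baseline)
kubeProxyReplacement: true
# Required when kube-proxy is absent: point Cilium at the API server directly
k8sServiceHost: <control-plane-host>
k8sServicePort: 6443
bpf:
  masquerade: true
  tproxy: true
# NOTE: confirm this key against the chart values for your Cilium version
conntrack:
  enabled: true
hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true
ipam:
  mode: kubernetes
operator:
  replicas: 2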
---
# coredns-tuned.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
            policy sequential
        }
        cache 30
        loop
        reload
        loadbalance
    }
Quick Start Guide
- Install Cilium with eBPF and Hubble
helm repo add cilium https://helm.cilium.io/
helm repo update
# Replace <control-plane-host> with your API server address (same value as in Step 1)
helm install cilium cilium/cilium --version 1.14.0 \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost='<control-plane-host>' \
  --set k8sServicePort=6443 \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  -n kube-system
- Apply default-deny NetworkPolicy
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: default
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
EOF
- Deploy test workload and service
kubectl create deployment nginx --image=nginx:latest --replicas=2
kubectl expose deployment nginx --port=80 --target-port=80 --name=nginx-svc
- Validate connectivity and policy enforcement
kubectl run curl --image=nicolaka/netshoot -it --rm --restart=Never -- curl -s --max-time 5 http://nginx-svc.default.svc.cluster.local
# Expected while only default-deny is in place: the request fails, because DNS egress and ingress to nginx are both blocked
# Expected once allow rules for DNS and port 80 exist: HTML response
# Verify policy: kubectl get networkpolicy
# Check flow: cilium status && hubble observe
- Tune node networking parameters
sysctl -w net.netfilter.nf_conntrack_max=1048576
sysctl -w net.ipv4.tcp_tw_reuse=1
# Persist both settings across reboots
echo "net.netfilter.nf_conntrack_max=1048576" >> /etc/sysctl.conf
echo "net.ipv4.tcp_tw_reuse=1" >> /etc/sysctl.conf
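Because the default-deny policy from the second quick-start step blocks both DNS and HTTP in the default namespace, the curl check only succeeds once matching allow rules exist. The following is a minimal sketch of such rules. It relies on the labels that the quick-start commands produce (`app: nginx` from kubectl create deployment, `run: curl` from kubectl run) and assumes cluster DNS runs in kube-system with the label `k8s-app: kube-dns`; the policy names are hypothetical.

```yaml
# Allow the test Pod to reach nginx on port 80
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-curl-to-nginx       # illustrative name
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              run: curl
      ports:
        - port: 80
          protocol: TCP
---
# Allow the test Pod to resolve DNS and open connections to nginx
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-curl-egress         # illustrative name
  namespace: default
spec:
  podSelector:
    matchLabels:
      run: curl
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: nginx
      ports:
        - port: 80
          protocol: TCP
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
```

Both directions need a rule: the ingress policy on nginx alone is not enough, since the default-deny policy also drops the test Pod's outbound traffic.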
Kubernetes networking is not a configuration task; it is an architecture decision. Treat traffic flow as a first-class design constraint, enforce policies at the kernel layer, and validate routing before scaling. The overhead of upfront networking design pays exponential dividends in stability, observability, and incident response time.