Kubernetes Production Guide
Container orchestration — distributions, networking, storage, security & operations
Overview
Kubernetes (K8s) is an open-source container orchestration platform originally designed by Google and now maintained by the Cloud Native Computing Foundation (CNCF). It automates the deployment, scaling, and management of containerized applications across clusters of machines.
At its core, Kubernetes follows a declarative model: you describe the desired state of your workloads (how many replicas, what image, what resources, what networking), and Kubernetes continuously reconciles the actual state to match. This is fundamentally different from imperative scripting where you tell the system what to do step-by-step.
Architecture
Key concepts
- Control Plane — The brain of the cluster. The API Server is the single entry point for all operations. The Scheduler places pods on nodes. The Controller Manager runs reconciliation loops. etcd stores all cluster state.
- Worker Nodes — Machines that run your workloads. Each node runs a kubelet (agent that talks to the API server), kube-proxy (networking rules), and a container runtime (containerd, CRI-O).
- Pods — The smallest deployable unit. A pod contains one or more containers that share networking and storage. Pods are ephemeral by design.
- Deployments — Declarative way to manage ReplicaSets and pods. You specify the desired number of replicas and the update strategy, and the Deployment controller handles the rest.
- Services — Stable network endpoints that abstract away pod IPs. Services provide load balancing across pods that match a label selector.
- Namespaces — Virtual clusters within a physical cluster. Used for multi-tenancy, environment separation (dev/staging/prod), and resource quota boundaries.
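Namespaces pair naturally with ResourceQuota objects to enforce those boundaries. A minimal sketch, with illustrative names and limits:

```yaml
# A tenant namespace plus a quota capping what it may consume.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a            # illustrative tenant namespace
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"     # total CPU requests across all pods
    requests.memory: 8Gi  # total memory requests
    pods: "20"            # max pod count in the namespace
```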
Declarative vs imperative
Declarative
You write YAML manifests that describe the desired state. Kubernetes controllers continuously reconcile actual state to match. If a pod crashes, it gets recreated. If a node dies, pods get rescheduled. This is the correct approach for production.
kubectl apply -f deployment.yaml
Imperative
You issue one-off commands that directly modify cluster state. Useful for debugging and quick experiments, but not suitable for production because changes are not tracked or reproducible.
kubectl create deployment nginx --image=nginx
kubectl scale deployment nginx --replicas=3
Kubernetes does not run containers. It orchestrates them. The actual container execution is handled by the container runtime (containerd or CRI-O). Kubernetes manages the lifecycle, scheduling, networking, and storage for those containers. Think of Kubernetes as the operating system for your datacenter — it abstracts away individual machines and lets you treat a cluster as a single compute surface.
Distributions
Kubernetes is a set of components, not a single binary you install. Distributions package those components with opinionated defaults for networking, storage, ingress, and container runtime. The three most common lightweight/edge distributions are MicroK8s, K3s, and RKE2.
Comparison table
| Feature | MicroK8s | K3s | RKE2 |
|---|---|---|---|
| Maintainer | Canonical | Rancher Labs (SUSE) | Rancher Labs (SUSE) |
| Packaging | Snap package | Single binary | RPM / tarball |
| Default CNI | Calico | Flannel | Canal (Flannel + Calico) |
| Default Ingress | None (addon available) | Traefik | Nginx Ingress Controller |
| Default Storage | hostpath-storage (addon) | Local-path provisioner | None (manual setup) |
| Container Runtime | containerd | containerd | containerd |
| Datastore | Dqlite (default) / etcd | Embedded SQLite (single) / etcd (HA) | Embedded etcd |
| Security Hardening | Manual | Manual | CIS hardened by default |
| Best For | Dev, IoT, single-node, Ubuntu | Edge, IoT, resource-constrained | Production, gov, air-gapped |
| HA Support | Yes (3+ nodes) | Yes (embedded etcd or external DB) | Yes (embedded etcd) |
| Addon System | Yes (microk8s enable) | No (use Helm/manifests) | No (use Helm/manifests) |
When to use which
MicroK8s
- Developer workstations (especially Ubuntu / WSL)
- Single-node clusters for testing
- IoT and edge with snap-based infrastructure
- Quick enablement of common addons (dns, dashboard, registry, gpu, istio)
K3s
- Edge computing and resource-constrained environments
- CI/CD pipelines needing a quick cluster
- ARM devices (Raspberry Pi)
- When you need the smallest possible footprint (~2GB RAM minimum recommended; ~512MB technically possible but impractical for real workloads)
RKE2
- Production clusters where security compliance is required (FedRAMP, STIG, CIS)
- Government and defense environments
- Air-gapped deployments (designed for it)
- When you need FIPS-validated cryptography (currently FIPS 140-2; plan for 140-3 transition by Sept 2026)
- Rancher-managed multi-cluster environments
For production workloads that require security hardening, RKE2 is the default recommendation. It ships CIS-hardened out of the box, which saves weeks of manual hardening. For dev/test and edge, K3s is the go-to choice for its simplicity and minimal resource footprint. MicroK8s is best when the client is heavily invested in the Ubuntu/Canonical ecosystem and wants snap-based management.
kubectl & Kubeconfig
kubectl is the primary CLI for interacting with Kubernetes clusters. It communicates with the API server using configuration stored in a kubeconfig file (default: ~/.kube/config).
Kubeconfig structure
A kubeconfig file has three main sections (clusters, users, contexts) plus a current-context pointer:
apiVersion: v1
kind: Config
clusters:
- name: production
  cluster:
    server: https://10.0.1.100:6443
    certificate-authority-data: <base64-ca-cert>
users:
- name: admin
  user:
    client-certificate-data: <base64-client-cert>
    client-key-data: <base64-client-key>
contexts:
- name: prod-admin
  context:
    cluster: production
    user: admin
    namespace: default
current-context: prod-admin
- clusters — Define API server endpoints and CA certificates
- users — Define authentication credentials (certs, tokens, OIDC)
- contexts — Bind a cluster + user + optional namespace into a named context
- current-context — The active context that kubectl uses by default
Merging kubeconfigs
When managing multiple clusters, you can merge kubeconfigs using the KUBECONFIG environment variable:
# Merge multiple kubeconfig files
export KUBECONFIG=~/.kube/config:~/.kube/cluster2.yaml:~/.kube/cluster3.yaml
# Flatten into a single file
kubectl config view --flatten > ~/.kube/merged-config
export KUBECONFIG=~/.kube/merged-config
# Switch between contexts
kubectl config get-contexts
kubectl config use-context prod-admin
kubectl config use-context staging-dev
Common kubectl commands
| Command | Purpose |
|---|---|
| kubectl get pods -A | List all pods across all namespaces |
| kubectl describe pod <name> | Detailed info including events |
| kubectl logs <pod> -f | Stream logs from a pod |
| kubectl exec -it <pod> -- /bin/sh | Shell into a running container |
| kubectl apply -f manifest.yaml | Declaratively apply a resource |
| kubectl delete -f manifest.yaml | Delete resources defined in a file |
| kubectl get events --sort-by=.lastTimestamp | View recent cluster events |
| kubectl top pods | Resource usage (requires metrics-server) |
| kubectl port-forward svc/myapp 8080:80 | Forward local port to a service |
| kubectl drain <node> --ignore-daemonsets | Safely evict pods before node maintenance |
TLS SAN warnings
When connecting to a cluster, you may encounter a certificate error like:
Unable to connect to the server: x509: certificate is valid for 10.0.1.100,
127.0.0.1, not 192.168.1.50
Why this happens: The Kubernetes API server generates a TLS certificate during cluster initialization. That certificate includes a list of Subject Alternative Names (SANs) — the hostnames and IP addresses the certificate is valid for. If you connect to the API server using a hostname or IP that is not in the SAN list, TLS verification fails because the client cannot verify it is talking to the correct server.
This commonly occurs when:
- Accessing a cluster from outside the network (the external IP is not in the cert)
- Using a load balancer IP or DNS name that was not included at install time
- Connecting via a VPN or bastion host with a different IP
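Before changing cluster config, it helps to confirm which SANs a certificate actually carries. The sketch below generates a throwaway certificate with explicit SANs and reads the list back with the same x509 inspection you would run against the API server's certificate; the hostnames and IPs are illustrative:

```shell
# Generate a throwaway certificate with explicit SANs (OpenSSL >= 1.1.1).
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/san-demo.key -out /tmp/san-demo.crt -days 1 \
  -subj "/CN=kube-apiserver" \
  -addext "subjectAltName=DNS:k8s.example.com,IP:192.168.1.50,IP:127.0.0.1"

# Read the SAN list back from the certificate.
openssl x509 -in /tmp/san-demo.crt -noout -ext subjectAltName

# Against a live API server, the same check is:
#   echo | openssl s_client -connect <host>:6443 2>/dev/null \
#     | openssl x509 -noout -ext subjectAltName
```

If the address you connect with is missing from that output, kubectl will fail TLS verification until the SAN is added.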
Fixing SAN issues per distribution
RKE2 --tls-san flag
Add SANs at install time or in the config file:
# /etc/rancher/rke2/config.yaml
tls-san:
- "k8s.example.com"
- "192.168.1.50"
- "10.0.0.100"
Restart the RKE2 server after modifying. The API server certificate will be regenerated with the new SANs.
K3s --tls-san flag
Pass SANs during install or in the config:
# During install
curl -sfL https://get.k3s.io | sh -s - server \
  --tls-san k8s.example.com \
  --tls-san 192.168.1.50

# Or in /etc/rancher/k3s/config.yaml
tls-san:
  - "k8s.example.com"
  - "192.168.1.50"
MicroK8s CSR config modification
MicroK8s requires editing the CSR configuration template and refreshing certificates:
# Edit the CSR config
sudo nano /var/snap/microk8s/current/certs/csr.conf.template
# Add your SANs under [alt_names]
# IP.3 = 192.168.1.50
# DNS.4 = k8s.example.com
# Refresh the certificates
sudo microk8s refresh-certs --cert server.crt
Workaround: skip TLS verification
Skipping TLS verification should only be used for debugging, never in production. It disables certificate validation, which means you cannot verify the identity of the API server (man-in-the-middle risk).
# One-off command
kubectl --insecure-skip-tls-verify get nodes
# Set in kubeconfig context permanently
kubectl config set-cluster my-cluster \
--insecure-skip-tls-verify=true
Ingress & Load Balancing
Ingress is a Kubernetes API object that manages external access to services within a cluster, typically HTTP/HTTPS. It provides URL-based routing, TLS termination, and virtual hosting. An Ingress resource is useless without an Ingress Controller — a pod that reads Ingress objects and configures the underlying proxy (Nginx, Traefik, HAProxy, etc.).
The Gateway API is the official successor to the Ingress API, offering richer routing (header-based, multi-protocol), role-oriented RBAC, and better extensibility. The community Ingress NGINX controller is being retired (March 2026). While the Ingress API itself is not deprecated, new projects should evaluate Gateway API first. All major controllers (Traefik, Cilium, Envoy Gateway, Kong, Istio) support Gateway API.
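For comparison, a basic host-and-path route expressed in Gateway API terms looks roughly like this; the Gateway name and the service details are illustrative:

```yaml
# HTTPRoute attaches to a Gateway (typically managed by a platform team)
# and routes matching traffic to a backend Service.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: myapp-route
spec:
  parentRefs:
  - name: external-gateway   # illustrative Gateway object
  hostnames:
  - myapp.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: myapp-svc
      port: 80
```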
Ingress controllers
| Controller | Pros | Cons | Default In |
|---|---|---|---|
| Nginx Ingress | Mature, widely used, extensive annotations, good docs, supports gRPC via backend-protocol annotation | Config via annotations can get messy; community Ingress NGINX controller is being retired March 2026 — migrate to Gateway API or NGINX's own controller | RKE2 |
| Traefik | Auto-discovery, middlewares, IngressRoute CRD, built-in dashboard, Gateway API support | Less familiar to ops teams, v1 to v2 migration was painful | K3s |
| HAProxy Ingress | High performance, TCP/UDP support, enterprise support available | Smaller community, fewer examples online | — |
Ingress example
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-svc
            port:
              number: 80
Service type: LoadBalancer
In cloud environments, creating a Service of type LoadBalancer automatically provisions a cloud load balancer (AWS ELB, GCP LB, Azure LB). On bare-metal, there is no cloud API to call, so the Service stays in Pending state forever — unless you install MetalLB.
MetalLB for bare-metal
MetalLB provides LoadBalancer service support for bare-metal clusters. It operates in two modes:
Layer 2 Mode
MetalLB responds to ARP requests for the service IP on the local network. Simple to set up, no router configuration needed. The downside is that all traffic for a given service IP goes through a single node (no true load balancing at the network level).
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.200-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
BGP Mode
MetalLB peers with your network router via BGP and announces service IPs as routes. Provides true multi-path load balancing (ECMP). Requires BGP-capable routers and network team coordination.
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: router
  namespace: metallb-system
spec:
  myASN: 64500
  peerASN: 64501
  peerAddress: 10.0.0.1
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: default
  namespace: metallb-system
Most on-prem and homelab deployments use MetalLB in Layer 2 mode because it requires zero router configuration. The single-node bottleneck is rarely an issue for small-to-medium clusters. BGP mode is worth the effort when you have a proper network infrastructure with BGP-capable switches (e.g., Cisco, Arista, or even a FRRouting-based software router).
TLS & Certificate Management
cert-manager is the standard way to manage TLS certificates in Kubernetes. It automates the issuance, renewal, and rotation of certificates from various sources including Let's Encrypt, HashiCorp Vault, and self-signed CAs.
Issuer vs ClusterIssuer
Issuer
Namespace-scoped. Can only issue certificates for resources in the same namespace. Use when you want to isolate certificate management per team or environment.
ClusterIssuer
Cluster-scoped. Can issue certificates for any namespace. The most common choice for production because you typically have one certificate authority for the entire cluster.
Let's Encrypt with cert-manager
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:
        ingress:
          class: nginx
ACME challenge types
| Challenge | How it works | When to use |
|---|---|---|
| HTTP-01 | cert-manager creates a temporary pod/ingress that serves a token at /.well-known/acme-challenge/. Let's Encrypt hits that URL to verify domain ownership. | Standard web-facing services. Requires port 80 to be publicly reachable. |
| DNS-01 | cert-manager creates a TXT record in your DNS zone (e.g., _acme-challenge.example.com). Let's Encrypt queries DNS to verify ownership. | Wildcard certificates (*.example.com). Works even if the cluster is not publicly accessible. Requires DNS provider API integration (Route53, Cloudflare, etc.). |
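A DNS-01 solver block, sketched here for Cloudflare (the Secret name is illustrative; other providers follow the same pattern with their own credential fields):

```yaml
# Replaces the http01 solver in the ClusterIssuer spec.
solvers:
- dns01:
    cloudflare:
      apiTokenSecretRef:
        name: cloudflare-api-token   # illustrative Secret holding the API token
        key: api-token
  selector:
    dnsZones:
    - example.com                    # only solve challenges for this zone
```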
Using cert-manager with Ingress annotations
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls   # cert-manager creates this Secret
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
When this Ingress is created, cert-manager detects the cert-manager.io/cluster-issuer annotation, requests a certificate from Let's Encrypt, completes the ACME challenge, and stores the resulting certificate in the myapp-tls Secret. The Ingress controller then uses that Secret for TLS termination. Renewal happens automatically before expiry (default: 2/3 through the certificate's duration, which is ~30 days before expiry for standard 90-day Let's Encrypt certificates). You can customize this with spec.renewBefore or spec.renewBeforePercentage.
Self-signed CA
For internal services, air-gapped environments, or development, you can use a self-signed CA:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: internal-ca
  secretName: internal-ca-secret
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca-issuer
spec:
  ca:
    secretName: internal-ca-secret
Always use letsencrypt-staging for testing to avoid hitting rate limits. The staging server issues untrusted certificates but has much higher rate limits. Switch to letsencrypt-prod only when you have confirmed the flow works end-to-end.
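A staging ClusterIssuer is identical to the production one except for the ACME server URL, a sketch:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # Staging endpoint: untrusted certs, but far higher rate limits.
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
    - http01:
        ingress:
          class: nginx
```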
GitOps
GitOps is a paradigm where Git is the single source of truth for your infrastructure and application state. A GitOps operator watches a Git repository and automatically synchronizes the cluster state to match what is committed. Changes are made through pull requests, which provides an audit trail, code review, and easy rollback (just revert the commit).
How GitOps works
- Developer pushes a change to a Git repository (e.g., updates an image tag in a Deployment manifest)
- The GitOps operator detects the change (via polling or webhook)
- The operator compares the desired state (Git) with the actual state (cluster)
- If there is drift, the operator applies the changes to the cluster
- Health checks verify the deployment succeeded
ArgoCD vs FluxCD
| Feature | ArgoCD | FluxCD |
|---|---|---|
| UI | Rich web UI with app visualization, diff view, sync status | No built-in UI (use Weave GitOps or CLI) |
| Architecture | Centralized server with API | Decentralized controllers (source, kustomize, helm, notification) |
| CRDs | Application, ApplicationSet, AppProject | GitRepository, Kustomization, HelmRelease, etc. |
| Multi-cluster | Built-in (register external clusters) | Via Flux on each cluster or Cluster API |
| Helm support | Native (renders Helm charts as manifests) | Native (HelmRelease CRD) |
| Kustomize support | Native | Native (first-class citizen) |
| RBAC | Built-in with SSO integration | Kubernetes-native RBAC |
| Image automation | Argo CD Image Updater (separate component) | Built-in (image-reflector-controller + image-automation-controller) |
| Notifications | Built-in (Slack, webhook, etc.) | notification-controller (Slack, Teams, etc.) |
| Community | CNCF Graduated, very large community | CNCF Graduated, strong but smaller community |
When to use which
ArgoCD
- Teams that want a visual dashboard for deployments
- Multi-cluster management from a single pane of glass
- Organizations that need SSO-integrated RBAC for GitOps
- When you want to demo deployment state to stakeholders
FluxCD
- Teams that prefer CLI-first, no-UI workflows
- When you want tighter integration with Kustomize
- Automated image updates as a first-class feature
- When you want each cluster to be self-contained (no central server)
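A minimal FluxCD setup pairs a GitRepository source with a Kustomization that applies a path from it; names, intervals, and the repo URL below are illustrative:

```yaml
# Flux polls the repo, then reconciles the given path into the cluster.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: k8s-manifests
  namespace: flux-system
spec:
  interval: 1m                 # how often to check Git for changes
  url: https://github.com/org/k8s-manifests.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 10m                # how often to re-reconcile cluster state
  sourceRef:
    kind: GitRepository
    name: k8s-manifests
  path: ./apps/myapp/overlays/production
  prune: true                  # delete resources removed from Git
```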
ArgoCD Application example
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/k8s-manifests.git
    targetRevision: main
    path: apps/myapp/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # Delete resources removed from Git
      selfHeal: true   # Revert manual changes in cluster
    syncOptions:
    - CreateNamespace=true
For most clients, ArgoCD is the default recommendation because the web UI is a massive operational advantage. Being able to see at a glance which apps are synced, out-of-sync, degraded, or healthy is invaluable. FluxCD is the better choice when the team is deeply CLI-native and does not want to manage the ArgoCD server component.
Helm vs Kustomize
Helm and Kustomize are the two primary tools for managing Kubernetes manifests at scale. They solve overlapping but different problems, and many teams use them together.
Comparison
| Aspect | Helm | Kustomize |
|---|---|---|
| Approach | Templating (Go templates) | Patching (overlay-based) |
| Package format | Charts (packaged, versioned, shareable) | Directories of plain YAML |
| Value injection | values.yaml + --set flags | Patches, JSON merge patches, strategic merge patches |
| Repository | Helm chart repositories (Artifact Hub) | Git repositories or local directories |
| Release management | Built-in (helm install/upgrade/rollback) | None (uses kubectl apply) |
| Learning curve | Higher (Go templates, chart structure, hooks) | Lower (just YAML patching) |
| 3rd-party software | Standard distribution format for OSS | Rarely used by upstream projects |
| Built into kubectl | No (separate binary) | Yes (kubectl apply -k) |
Helm basics
# Add a chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# Install a chart
helm install my-postgres bitnami/postgresql \
--namespace databases --create-namespace \
--values custom-values.yaml
# Upgrade a release
helm upgrade my-postgres bitnami/postgresql \
--values custom-values.yaml
# List releases
helm list -A
# Rollback
helm rollback my-postgres 1
Kustomize basics
# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namespace: production
patches:
- target:
    kind: Deployment
    name: myapp
  patch: |-
    - op: replace
      path: /spec/replicas
      value: 5
images:
- name: myapp
  newTag: v2.1.0
# Build and apply
kubectl apply -k overlays/production/
# Preview rendered output
kubectl kustomize overlays/production/
Using them together
A common pattern is to use Helm for third-party software (databases, monitoring, ingress controllers) and Kustomize for your own applications. You can also render Helm charts into plain YAML and manage them with Kustomize:
# Render a Helm chart to plain YAML
helm template my-release bitnami/postgresql \
--values values.yaml > base/postgresql.yaml
# Then manage with Kustomize overlays for env-specific tweaks
Do not fight the ecosystem. Install third-party charts with Helm — it is how they are designed to be consumed. For your own application manifests, Kustomize is often simpler because you avoid the complexity of Go templates and can keep manifests as valid, readable YAML. If using ArgoCD or FluxCD, both support Helm and Kustomize natively.
KubeVirt
KubeVirt is a Kubernetes add-on that allows you to run traditional virtual machines alongside containers on the same cluster. It extends Kubernetes with custom resource definitions (CRDs) for managing VM lifecycle using the same kubectl tooling.
Why it matters
- Converged infrastructure — Run VMs and containers side-by-side. No need for separate VMware/Proxmox infrastructure and a separate Kubernetes cluster.
- Migration path — Move legacy workloads that cannot be containerized (Windows apps, kernel-dependent software, legacy databases) into the Kubernetes platform without rewriting them.
- Unified tooling — Use the same CI/CD pipelines, monitoring, networking, and storage for both VMs and containers.
- Harvester — Rancher's Harvester HCI platform is built on KubeVirt, providing a complete hyperconverged infrastructure solution on top of Kubernetes.
Key CRDs
| CRD | Purpose |
|---|---|
| VirtualMachine | Persistent VM definition. Survives restarts. Analogous to a Deployment for containers. |
| VirtualMachineInstance | A running VM instance. Analogous to a Pod. Created by the VirtualMachine controller. |
| DataVolume | Declarative way to import VM disk images (from URL, registry, or PVC clone) using CDI (Containerized Data Importer). |
Basic VM example
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ubuntu-vm
spec:
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: ubuntu-vm
    spec:
      domain:
        cpu:
          cores: 2
        memory:
          guest: 4Gi
        devices:
          disks:
          - name: rootdisk
            disk:
              bus: virtio
          - name: cloudinit
            disk:
              bus: virtio
          interfaces:
          - name: default
            masquerade: {}
      networks:
      - name: default
        pod: {}
      volumes:
      - name: rootdisk
        dataVolume:
          name: ubuntu-dv
      - name: cloudinit
        cloudInitNoCloud:
          userData: |
            #cloud-config
            password: changeme
            chpasswd: { expire: false }
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu-dv
spec:
  source:
    http:
      url: "https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img"
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
KubeVirt vs traditional virtualization
| Aspect | KubeVirt | VMware / Proxmox |
|---|---|---|
| Platform | Runs on Kubernetes | Standalone hypervisor |
| Management | kubectl, GitOps, Kubernetes APIs | vCenter, Proxmox UI, proprietary APIs |
| Networking | CNI plugins (Calico, Cilium, etc.) | vSphere networking, OVS |
| Storage | CSI drivers (Longhorn, Ceph, etc.) | VMFS, NFS, vSAN |
| Container co-location | Native — VMs and containers on same nodes | Separate platform |
| Maturity | CNCF Incubating, growing rapidly | Decades of production use |
| Licensing | Apache 2.0 (free) | vSphere is expensive; Proxmox is AGPL (free + paid support) |
KubeVirt is not a VMware replacement for enterprise clients with thousands of VMs and deep VMware integration. It is ideal for organizations that are Kubernetes-first and need to run a handful of VMs alongside their containerized workloads. The sweet spot is running legacy apps, Windows servers, or network appliances as VMs within the same platform that runs the container workloads. Harvester (built on KubeVirt + Longhorn) is worth evaluating for clients who want a full HCI solution without the VMware licensing cost.
Storage
Kubernetes storage is built around three key abstractions: StorageClasses define how storage is provisioned, PersistentVolumes (PVs) represent actual storage resources, and PersistentVolumeClaims (PVCs) are requests for storage by pods. The Container Storage Interface (CSI) is the standard plugin API that connects Kubernetes to storage backends.
Storage flow
Dynamic provisioning
With dynamic provisioning, you do not need to pre-create PVs. When a PVC is created that references a StorageClass, the CSI driver automatically provisions the underlying storage and creates the PV:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume
spec:
  storageClassName: longhorn
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
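A pod consumes the claim by name; the CSI driver provisions the backing volume when the claim is bound. Pod and image names here are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-consumer
spec:
  containers:
  - name: app
    image: nginx:1.27          # illustrative image
    volumeMounts:
    - name: data
      mountPath: /var/lib/data # where the volume appears in the container
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-volume   # references the PVC above
```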
Storage solutions comparison
| Solution | Type | Replication | Best For |
|---|---|---|---|
| Local-path | Local disk (node-bound) | None | Development, single-node, CI/CD. Default in K3s. |
| Longhorn | Distributed block storage | Configurable (2-3 replicas) | Production bare-metal clusters. Easy to deploy, built-in backup/restore, Rancher integration. |
| Ceph / Rook | Distributed (block, file, object) | Configurable | Large-scale production. High performance, mature, but complex to operate. |
| NFS | Network file system | Depends on backend | Shared storage (ReadWriteMany). Simple but not performant. |
| Cloud CSI | Cloud disks (EBS, PD, Azure Disk) | Provider-managed | Cloud-hosted clusters. Automatic provisioning. |
Access modes
- ReadWriteOnce (RWO) — Mounted as read-write by a single node. Most common for databases and stateful apps.
- ReadOnlyMany (ROX) — Mounted as read-only by many nodes. Good for shared configuration or static content.
- ReadWriteMany (RWX) — Mounted as read-write by many nodes. Required for shared storage across pods. NFS, CephFS, and Longhorn (via built-in NFSv4 share-manager since v1.1) support this.
For on-prem bare-metal clusters, Longhorn is the recommended starting point. It is simple to install (single Helm chart), provides replicated storage with automatic failover, has a built-in UI, supports backups to S3-compatible targets, and integrates natively with Rancher. Rook/Ceph is more powerful but significantly more complex to operate — only use it when you need the scale (100+ TB) or need object storage (S3 API).
Networking
Kubernetes networking follows a flat model: every pod gets its own IP address, and all pods can communicate with each other without NAT. This is implemented by Container Network Interface (CNI) plugins. The choice of CNI affects performance, security policy support, and operational complexity.
CNI plugins
| CNI | Mode | Network Policy | Notes |
|---|---|---|---|
| Calico | BGP, VXLAN, IPIP | Full support | Most popular CNI. Excellent Network Policy support. Default in MicroK8s. Can run in eBPF mode for performance. |
| Flannel | VXLAN, host-gw | None | Simplest CNI. Default in K3s. No Network Policy support — pair with Calico (Canal) if needed. |
| Canal | Flannel networking + Calico policy | Full support | Combines Flannel's simplicity with Calico's policy engine. Default in RKE2. |
| Cilium | eBPF-based | Full + L7 policies | Most advanced CNI. eBPF-based dataplane bypasses iptables. L7 visibility and policy (HTTP, gRPC, Kafka). Hubble for observability. |
Network Policies
Network Policies are Kubernetes-native firewall rules that control pod-to-pod traffic. By default, all pods can talk to all other pods. Network Policies restrict this based on labels, namespaces, and ports.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  podSelector: {}   # Applies to all pods in namespace
  policyTypes:
  - Ingress
  ingress: []       # Empty = deny all ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - port: 8080
      protocol: TCP
Service types
| Type | Scope | Use Case |
|---|---|---|
| ClusterIP | Internal only | Default. Internal service discovery. Pods within the cluster can reach the service via its DNS name (svc-name.namespace.svc.cluster.local). |
| NodePort | External (via node IP:port) | Exposes the service on a static port (30000-32767) on every node. Simple but not production-grade for web traffic. |
| LoadBalancer | External (via LB IP) | Provisions an external load balancer (cloud LB or MetalLB on bare-metal). The standard way to expose services externally. |
| ExternalName | DNS alias | Maps a service to an external DNS name (CNAME). No proxying. Used to reference external services from within the cluster. |
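The Service types above share the same core shape; only spec.type changes. A minimal ClusterIP Service selecting pods by label (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
spec:
  type: ClusterIP     # swap for NodePort or LoadBalancer to expose externally
  selector:
    app: myapp        # routes to pods carrying this label
  ports:
  - port: 80          # port the Service listens on
    targetPort: 8080  # port the container actually serves
    protocol: TCP
```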
DNS (CoreDNS)
CoreDNS runs as a Deployment in the kube-system namespace and provides DNS-based service discovery for the cluster. Every Service gets a DNS entry:
- my-service.my-namespace.svc.cluster.local — Fully qualified domain name
- my-service.my-namespace — Short form (from any namespace)
- my-service — Shortest form (from same namespace only)
If the client needs Network Policies (and they should for any production cluster), ensure the CNI supports them. Flannel alone does not. The easiest path is Canal (Flannel + Calico policy), which is why RKE2 defaults to it. For advanced use cases (L7 policies, observability, service mesh replacement), Cilium is the future, but it requires kernel 5.10+ (as of Cilium 1.19; v1.20 will require 6.1+) and has a steeper learning curve.
Security
Kubernetes security is a broad topic that spans authentication, authorization, workload isolation, secrets management, and supply chain security. The fundamental principle is defense in depth — no single mechanism is sufficient; you need layers.
RBAC (Role-Based Access Control)
RBAC controls who can do what in the cluster. It uses four resource types:
- Role — Namespace-scoped permissions (e.g., "can read pods in namespace X")
- ClusterRole — Cluster-scoped permissions (e.g., "can read nodes", "can create namespaces")
- RoleBinding — Binds a Role to a user/group/ServiceAccount within a namespace
- ClusterRoleBinding — Binds a ClusterRole to a user/group/ServiceAccount cluster-wide
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: production
  name: read-pods
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
ServiceAccounts
Every pod runs as a ServiceAccount. If not specified, it uses the default ServiceAccount in its namespace. Best practice: create dedicated ServiceAccounts for each workload with only the permissions it needs.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-sa
  namespace: production
automountServiceAccountToken: false  # Don't mount token unless needed
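A workload then opts into its dedicated ServiceAccount explicitly. The following Deployment is a sketch assuming the `myapp-sa` account shown here; the name and image are illustrative:

```yaml
# Pin the workload to its dedicated ServiceAccount instead of "default".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      serviceAccountName: myapp-sa         # dedicated SA, least privilege
      automountServiceAccountToken: false  # this workload never calls the API
      containers:
      - name: myapp
        image: registry.example.com/myapp:1.2.3  # illustrative image
```

If the pod does need to talk to the API server, set `automountServiceAccountToken: true` on the pod and grant `myapp-sa` only the verbs it requires via a RoleBinding.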
Pod Security Standards
Pod Security Standards (PSS) replaced the deprecated PodSecurityPolicy (PSP). They are enforced via the built-in Pod Security Admission controller using namespace labels:
| Level | Description |
|---|---|
| Privileged | No restrictions. For system-level workloads (CNI, storage drivers). |
| Baseline | Prevents known privilege escalations. Allows most workloads. Good starting point. |
| Restricted | Strict security. Requires running as non-root, dropping all capabilities (except NET_BIND_SERVICE), a seccomp profile, and disallowing privilege escalation. A read-only root filesystem is a recommended best practice but not enforced by PSS. The target for production workloads. |
# Apply restricted security to a namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
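For a pod to be admitted into such a namespace, its spec must satisfy the Restricted requirements listed in the table. A minimal sketch (pod name and image are illustrative):

```yaml
# Pod spec that satisfies the Restricted profile: non-root, no privilege
# escalation, all capabilities dropped, RuntimeDefault seccomp profile.
# The read-only root filesystem is a best practice, not a PSS requirement.
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: myapp
    image: registry.example.com/myapp:1.2.3  # illustrative
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true           # recommended, not enforced by PSS
      capabilities:
        drop: ["ALL"]
```

With `warn` and `audit` labels set as above, non-compliant pods generate warnings and audit log entries before you flip `enforce` on, which makes a staged rollout practical.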
Secrets management
Kubernetes Secrets are base64-encoded (not encrypted) by default. For production:
- Enable encryption at rest — Configure the API server to encrypt Secrets in etcd using AES-GCM (preferred) or AES-CBC. AES-GCM is faster and provides authenticated encryption, but requires regular key rotation; a KMS provider offloads key management entirely
- External secrets management — Use the External Secrets Operator to sync secrets from HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager
- Sealed Secrets — Bitnami's Sealed Secrets controller allows you to store encrypted secrets in Git. Only the controller in the cluster can decrypt them
- SOPS + age/GPG — Encrypt secret values in YAML files using Mozilla SOPS. Works well with FluxCD's native SOPS decryption
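The encryption-at-rest option is configured via a file passed to the API server with `--encryption-provider-config`. A sketch, with a placeholder key (generate a real one with something like `head -c 32 /dev/urandom | base64`):

```yaml
# EncryptionConfiguration consumed by kube-apiserver.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aesgcm:
      keys:
      - name: key1
        secret: <base64-encoded 32-byte key>  # placeholder, never commit real keys
  - identity: {}  # fallback: reads Secrets written before encryption was enabled
```

Provider order matters: the first provider encrypts new writes, while the rest are only tried for decryption. After enabling it, rewrite existing Secrets (e.g. `kubectl get secrets -A -o json | kubectl replace -f -`) so they are stored encrypted.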
Image scanning and supply chain
- Scan images in CI — Use Trivy, Grype, or Snyk to scan container images during the build pipeline, before they reach the cluster
- Admission control — Use a policy engine to block deployment of unscanned or vulnerable images
- Image signing — Sign images with Cosign and verify signatures at admission time
OPA / Gatekeeper
Open Policy Agent (OPA) Gatekeeper is an admission controller that enforces custom policies on Kubernetes resources. It uses Rego (a policy language) to define constraints:
- Require all images to come from an approved registry
- Block containers running as root
- Require resource limits on all pods
- Enforce label standards across all resources
- Prevent use of the `latest` image tag
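As a sketch of how Gatekeeper expresses such rules, here is the required-labels example in abbreviated form: a ConstraintTemplate defines the Rego logic and a Constraint applies it to resources. The template follows Gatekeeper's canonical `K8sRequiredLabels` example; the constraint name and `team` label are illustrative assumptions:

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredlabels
      violation[{"msg": msg}] {
        provided := {label | input.review.object.metadata.labels[label]}
        required := {label | label := input.parameters.labels[_]}
        missing := required - provided
        count(missing) > 0
        msg := sprintf("missing required labels: %v", [missing])
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label   # illustrative
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Namespace"]
  parameters:
    labels: ["team"]         # illustrative required label
```

The same template/constraint split applies to the other policies listed: write the Rego once, then instantiate it with different parameters and match rules per environment.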
At minimum, every production cluster must have: (1) RBAC enabled and configured (no wildcard ClusterRoleBindings), (2) Network Policies to restrict pod-to-pod traffic, (3) Pod Security Standards at baseline or restricted level, (4) Secrets encrypted at rest, (5) Container images scanned for vulnerabilities. Everything else is defense in depth.
Consultant's Checklist
Use this checklist when assessing, deploying, or auditing a Kubernetes cluster.
Cluster Foundation
- Distribution selected (K3s/RKE2/MicroK8s/managed)
- HA control plane (3+ control plane nodes)
- etcd backup strategy configured and tested
- Node OS hardened and patched
- Container runtime configured (containerd)
- Kubeconfig access controlled and distributed securely
- TLS SANs configured for all access paths
Networking
- CNI plugin selected and deployed
- Network Policies enforced (default deny + allow rules)
- Ingress controller deployed and configured
- LoadBalancer solution in place (MetalLB for bare-metal)
- DNS resolution working (CoreDNS health)
- TLS certificates automated (cert-manager)
- External DNS configured if needed
Storage
- StorageClass configured with dynamic provisioning
- Storage backend deployed (Longhorn/Ceph/cloud CSI)
- Backup solution for persistent data
- Volume snapshot support if needed
- Storage capacity monitoring and alerting
- Reclaim policy set appropriately (Retain for production)
Security
- RBAC configured (no default admin bindings)
- Pod Security Standards enforced
- Secrets encrypted at rest
- External secrets management in place
- Image scanning in CI pipeline
- Admission controller for policy enforcement
- Audit logging enabled
- ServiceAccount tokens not auto-mounted
GitOps & Deployment
- GitOps operator deployed (ArgoCD or FluxCD)
- Git repository structure defined (monorepo vs multi-repo)
- Helm charts or Kustomize overlays for all environments
- Image update automation configured
- Rollback procedure documented and tested
- Sync policies configured (auto-sync, prune, self-heal)
Operations
- Monitoring stack deployed (Prometheus + Grafana)
- Alerting rules configured for critical conditions
- Logging aggregation (Loki, EFK, or cloud logging)
- Resource requests and limits set on all workloads
- Horizontal Pod Autoscaler configured where appropriate
- Node upgrade procedure documented (drain, upgrade, uncordon)
- Disaster recovery plan documented and tested
When building a new cluster from scratch, work through these areas in order: (1) Cluster foundation + HA, (2) Networking + Ingress + TLS, (3) Storage, (4) Security hardening, (5) GitOps setup, (6) Monitoring + alerting. Do not skip ahead — each layer depends on the one before it.