Kubernetes Production Guide

Container orchestration — distributions, networking, storage, security & operations

01

Overview

Kubernetes (K8s) is an open-source container orchestration platform originally designed by Google and now maintained by the Cloud Native Computing Foundation (CNCF). It automates the deployment, scaling, and management of containerized applications across clusters of machines.

At its core, Kubernetes follows a declarative model: you describe the desired state of your workloads (how many replicas, what image, what resources, what networking), and Kubernetes continuously reconciles the actual state to match. This is fundamentally different from imperative scripting where you tell the system what to do step-by-step.

Architecture

+---------------------------------------------------------------+
|                         Control Plane                         |
|  +----------+  +-----------+  +------------+  +-----------+   |
|  |   API    |  | Scheduler |  | Controller |  |   etcd    |   |
|  |  Server  |  |           |  |  Manager   |  | (key-val) |   |
|  +----+-----+  +-----+-----+  +-----+------+  +-----+-----+   |
+-------+--------------+--------------+---------------+---------+
        |
        v  (kubelet communicates with API server)
+---------------------------------------------------------------+
|                         Worker Nodes                          |
|  +-----------+  +-----------+  +-----------+  +-----------+   |
|  | kubelet   |  | kubelet   |  | kubelet   |  | kubelet   |   |
|  | kube-     |  | kube-     |  | kube-     |  | kube-     |   |
|  | proxy     |  | proxy     |  | proxy     |  | proxy     |   |
|  | container |  | container |  | container |  | container |   |
|  | runtime   |  | runtime   |  | runtime   |  | runtime   |   |
|  +-----------+  +-----------+  +-----------+  +-----------+   |
+---------------------------------------------------------------+

Key concepts

  • Control Plane — The brain of the cluster. The API Server is the single entry point for all operations. The Scheduler places pods on nodes. The Controller Manager runs reconciliation loops. etcd stores all cluster state.
  • Worker Nodes — Machines that run your workloads. Each node runs a kubelet (agent that talks to the API server), kube-proxy (networking rules), and a container runtime (containerd, CRI-O).
  • Pods — The smallest deployable unit. A pod contains one or more containers that share networking and storage. Pods are ephemeral by design.
  • Deployments — Declarative way to manage ReplicaSets and pods. You specify the desired number of replicas and the update strategy, and the Deployment controller handles the rest.
  • Services — Stable network endpoints that abstract away pod IPs. Services provide load balancing across pods that match a label selector.
  • Namespaces — Virtual clusters within a physical cluster. Used for multi-tenancy, environment separation (dev/staging/prod), and resource quota boundaries.

Declarative vs imperative

Declarative

You write YAML manifests that describe the desired state. Kubernetes controllers continuously reconcile actual state to match. If a pod crashes, it gets recreated. If a node dies, pods get rescheduled. This is the production-correct approach.

kubectl apply -f deployment.yaml
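
The deployment.yaml being applied might look like the following minimal sketch (the name, image, and resource numbers are illustrative, not prescribed by Kubernetes):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3              # desired state: the controller keeps 3 pods running
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
          ports:
            - containerPort: 80
```

Re-running kubectl apply with a modified manifest updates only what changed; deleting a pod by hand simply causes the controller to recreate it.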

Imperative

You issue one-off commands that directly modify cluster state. Useful for debugging and quick experiments, but not suitable for production because changes are not tracked or reproducible.

kubectl create deployment nginx --image=nginx
kubectl scale deployment nginx --replicas=3

Key insight

Kubernetes does not run containers. It orchestrates them. The actual container execution is handled by the container runtime (containerd or CRI-O). Kubernetes manages the lifecycle, scheduling, networking, and storage for those containers. Think of Kubernetes as the operating system for your datacenter — it abstracts away individual machines and lets you treat a cluster as a single compute surface.

02

Distributions

Kubernetes is a set of components, not a single binary you install. Distributions package those components with opinionated defaults for networking, storage, ingress, and container runtime. Three of the most widely used lightweight and production-focused distributions are MicroK8s, K3s, and RKE2.

Comparison table

| Feature | MicroK8s | K3s | RKE2 |
| --- | --- | --- | --- |
| Maintainer | Canonical | Rancher Labs (SUSE) | Rancher Labs (SUSE) |
| Packaging | Snap package | Single binary | RPM / tarball |
| Default CNI | Calico | Flannel | Canal (Flannel + Calico) |
| Default Ingress | None (addon available) | Traefik | Nginx Ingress Controller |
| Default Storage | hostpath-storage (addon) | Local-path provisioner | None (manual setup) |
| Container Runtime | containerd | containerd | containerd |
| Datastore | Dqlite (default) / etcd | Embedded SQLite (single) / etcd (HA) | Embedded etcd |
| Security Hardening | Manual | Manual | CIS hardened by default |
| Best For | Dev, IoT, single-node, Ubuntu | Edge, IoT, resource-constrained | Production, gov, air-gapped |
| HA Support | Yes (3+ nodes) | Yes (embedded etcd or external DB) | Yes (embedded etcd) |
| Addon System | Yes (microk8s enable) | No (use Helm/manifests) | No (use Helm/manifests) |

When to use which

MicroK8s

  • Developer workstations (especially Ubuntu / WSL)
  • Single-node clusters for testing
  • IoT and edge with snap-based infrastructure
  • Quick enablement of common addons (dns, dashboard, registry, gpu, istio)

K3s

  • Edge computing and resource-constrained environments
  • CI/CD pipelines needing a quick cluster
  • ARM devices (Raspberry Pi)
  • When you need the smallest possible footprint (~2GB RAM minimum recommended; ~512MB technically possible but impractical for real workloads)

RKE2

  • Production clusters where security compliance is required (FedRAMP, STIG, CIS)
  • Government and defense environments
  • Air-gapped deployments (designed for it)
  • When you need FIPS-validated cryptography (currently FIPS 140-2; plan for 140-3 transition by Sept 2026)
  • Rancher-managed multi-cluster environments

Consultant tip

For production workloads that require security hardening, RKE2 is the default recommendation. It ships CIS-hardened out of the box, which saves weeks of manual hardening. For dev/test and edge, K3s is the go-to choice for its simplicity and minimal resource footprint. MicroK8s is best when the client is heavily invested in the Ubuntu/Canonical ecosystem and wants snap-based management.

03

kubectl & Kubeconfig

kubectl is the primary CLI for interacting with Kubernetes clusters. It communicates with the API server using configuration stored in a kubeconfig file (default: ~/.kube/config).

Kubeconfig structure

A kubeconfig file has three main sections (clusters, users, and contexts) plus a current-context field that selects the active context:

apiVersion: v1
kind: Config
clusters:
  - name: production
    cluster:
      server: https://10.0.1.100:6443
      certificate-authority-data: <base64-ca-cert>
users:
  - name: admin
    user:
      client-certificate-data: <base64-client-cert>
      client-key-data: <base64-client-key>
contexts:
  - name: prod-admin
    context:
      cluster: production
      user: admin
      namespace: default
current-context: prod-admin

  • clusters — Define API server endpoints and CA certificates
  • users — Define authentication credentials (certs, tokens, OIDC)
  • contexts — Bind a cluster + user + optional namespace into a named context
  • current-context — The active context that kubectl uses by default

Merging kubeconfigs

When managing multiple clusters, you can merge kubeconfigs using the KUBECONFIG environment variable:

# Merge multiple kubeconfig files
export KUBECONFIG=~/.kube/config:~/.kube/cluster2.yaml:~/.kube/cluster3.yaml

# Flatten into a single file
kubectl config view --flatten > ~/.kube/merged-config
export KUBECONFIG=~/.kube/merged-config

# Switch between contexts
kubectl config get-contexts
kubectl config use-context prod-admin
kubectl config use-context staging-dev

Common kubectl commands

| Command | Purpose |
| --- | --- |
| kubectl get pods -A | List all pods across all namespaces |
| kubectl describe pod <name> | Detailed info including events |
| kubectl logs <pod> -f | Stream logs from a pod |
| kubectl exec -it <pod> -- /bin/sh | Shell into a running container |
| kubectl apply -f manifest.yaml | Declaratively apply a resource |
| kubectl delete -f manifest.yaml | Delete resources defined in a file |
| kubectl get events --sort-by=.lastTimestamp | View recent cluster events |
| kubectl top pods | Resource usage (requires metrics-server) |
| kubectl port-forward svc/myapp 8080:80 | Forward local port to a service |
| kubectl drain <node> --ignore-daemonsets | Safely evict pods before node maintenance |

TLS SAN warnings

When connecting to a cluster, you may encounter a certificate error like:

Unable to connect to the server: x509: certificate is valid for 10.0.1.100,
127.0.0.1, not 192.168.1.50

Why this happens: The Kubernetes API server generates a TLS certificate during cluster initialization. That certificate includes a list of Subject Alternative Names (SANs) — the hostnames and IP addresses the certificate is valid for. If you connect to the API server using a hostname or IP that is not in the SAN list, TLS verification fails because the client cannot verify it is talking to the correct server.

This commonly occurs when:

  • Accessing a cluster from outside the network (the external IP is not in the cert)
  • Using a load balancer IP or DNS name that was not included at install time
  • Connecting via a VPN or bastion host with a different IP
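
Before changing anything, it helps to see which SANs a certificate actually contains. The sketch below generates a throwaway self-signed certificate with two SANs and inspects it with openssl (the hostname and IP are illustrative; requires OpenSSL 1.1.1+ for -addext). The same x509 inspection works on a live API server certificate.

```shell
# Generate a throwaway cert with explicit SANs (stand-in for an API server cert)
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/san-demo.key -out /tmp/san-demo.crt \
  -days 1 -subj "/CN=kubernetes" \
  -addext "subjectAltName=DNS:k8s.example.com,IP:192.168.1.50"

# Print the SAN list -- this is exactly what TLS verification checks against
openssl x509 -in /tmp/san-demo.crt -noout -ext subjectAltName

# Against a live cluster, the same inspection looks like:
#   openssl s_client -connect <api-server>:6443 </dev/null 2>/dev/null \
#     | openssl x509 -noout -ext subjectAltName
```

If the address you connect with is missing from that output, kubectl will fail with the x509 error above.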

Fixing SAN issues per distribution

RKE2 --tls-san flag

Add SANs at install time or in the config file:

# /etc/rancher/rke2/config.yaml
tls-san:
  - "k8s.example.com"
  - "192.168.1.50"
  - "10.0.0.100"

Restart the RKE2 server after modifying the config; the API server certificate is regenerated with the new SANs.

K3s --tls-san flag

Pass SANs during install or in the config:

# During install
curl -sfL https://get.k3s.io | \
  sh -s - server \
  --tls-san k8s.example.com \
  --tls-san 192.168.1.50

# Or in /etc/rancher/k3s/config.yaml
tls-san:
  - "k8s.example.com"
  - "192.168.1.50"

MicroK8s CSR config modification

MicroK8s requires editing the CSR configuration template and refreshing certificates:

# Edit the CSR config
sudo nano /var/snap/microk8s/current/certs/csr.conf.template

# Add your SANs under [alt_names]
# IP.3 = 192.168.1.50
# DNS.4 = k8s.example.com

# Refresh the certificates
sudo microk8s refresh-certs --cert server.crt

Workaround: skip TLS verification

Warning

Skipping TLS verification should only be used for debugging, never in production. It disables certificate validation, which means you cannot verify the identity of the API server (man-in-the-middle risk).

# One-off command
kubectl --insecure-skip-tls-verify get nodes

# Set in kubeconfig context permanently
kubectl config set-cluster my-cluster \
  --insecure-skip-tls-verify=true

04

Ingress & Load Balancing

Ingress is a Kubernetes API object that manages external access to services within a cluster, typically HTTP/HTTPS. It provides URL-based routing, TLS termination, and virtual hosting. An Ingress resource is useless without an Ingress Controller — a pod that reads Ingress objects and configures the underlying proxy (Nginx, Traefik, HAProxy, etc.).

Industry shift

The Gateway API is the official successor to the Ingress API, offering richer routing (header-based, multi-protocol), role-oriented RBAC, and better extensibility. The community Ingress NGINX controller is being retired (March 2026). While the Ingress API itself is not deprecated, new projects should evaluate Gateway API first. All major controllers (Traefik, Cilium, Envoy Gateway, Kong, Istio) support Gateway API.

Ingress controllers

| Controller | Pros | Cons | Default In |
| --- | --- | --- | --- |
| Nginx Ingress | Mature, widely used, extensive annotations, good docs, supports gRPC via backend-protocol annotation | Config via annotations can get messy; community Ingress NGINX controller is being retired March 2026, so migrate to Gateway API or NGINX's own controller | RKE2 |
| Traefik | Auto-discovery, middlewares, IngressRoute CRD, built-in dashboard, Gateway API support | Less familiar to ops teams; v1 to v2 migration was painful | K3s |
| HAProxy Ingress | High performance, TCP/UDP support, enterprise support available | Smaller community, fewer examples online | (none) |

Ingress example

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.example.com
      secretName: myapp-tls
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-svc
                port:
                  number: 80

Service type: LoadBalancer

In cloud environments, creating a Service of type LoadBalancer automatically provisions a cloud load balancer (AWS ELB, GCP LB, Azure LB). On bare-metal, there is no cloud API to call, so the Service stays in Pending state forever — unless you install MetalLB.

MetalLB for bare-metal

MetalLB provides LoadBalancer service support for bare-metal clusters. It operates in two modes:

Layer 2 Mode

MetalLB responds to ARP requests for the service IP on the local network. Simple to set up, no router configuration needed. The downside is that all traffic for a given service IP goes through a single node (no true load balancing at the network level).

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system

BGP Mode

MetalLB peers with your network router via BGP and announces service IPs as routes. Provides true multi-path load balancing (ECMP). Requires BGP-capable routers and network team coordination.

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: router
  namespace: metallb-system
spec:
  myASN: 64500
  peerASN: 64501
  peerAddress: 10.0.0.1
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: default
  namespace: metallb-system

Practical note

Most on-prem and homelab deployments use MetalLB in Layer 2 mode because it requires zero router configuration. The single-node bottleneck is rarely an issue for small-to-medium clusters. BGP mode is worth the effort when you have a proper network infrastructure with BGP-capable switches (e.g., Cisco, Arista, or even a FRRouting-based software router).

05

TLS & Certificate Management

cert-manager is the standard way to manage TLS certificates in Kubernetes. It automates the issuance, renewal, and rotation of certificates from various sources including Let's Encrypt, HashiCorp Vault, and self-signed CAs.

Issuer vs ClusterIssuer

Issuer

Namespace-scoped. Can only issue certificates for resources in the same namespace. Use when you want to isolate certificate management per team or environment.

ClusterIssuer

Cluster-scoped. Can issue certificates for any namespace. The most common choice for production because you typically have one certificate authority for the entire cluster.

Let's Encrypt with cert-manager

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx

ACME challenge types

| Challenge | How it works | When to use |
| --- | --- | --- |
| HTTP-01 | cert-manager creates a temporary pod/ingress that serves a token at /.well-known/acme-challenge/. Let's Encrypt hits that URL to verify domain ownership. | Standard web-facing services. Requires port 80 to be publicly reachable. |
| DNS-01 | cert-manager creates a TXT record in your DNS zone (e.g., _acme-challenge.example.com). Let's Encrypt queries DNS to verify ownership. | Wildcard certificates (*.example.com). Works even if the cluster is not publicly accessible. Requires DNS provider API integration (Route53, Cloudflare, etc.). |
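
A DNS-01 solver looks like the following sketch, here using Cloudflare (the Secret name and key are assumptions; you must create that Secret with a valid API token first, and other providers use analogous solver blocks):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-dns-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token   # assumed pre-created Secret
              key: api-token
```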

Using cert-manager with Ingress annotations

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
    - hosts:
        - myapp.example.com
      secretName: myapp-tls   # cert-manager creates this Secret
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80

When this Ingress is created, cert-manager detects the cert-manager.io/cluster-issuer annotation, requests a certificate from Let's Encrypt, completes the ACME challenge, and stores the resulting certificate in the myapp-tls Secret. The Ingress controller then uses that Secret for TLS termination. Renewal happens automatically before expiry (default: 2/3 through the certificate's duration, which is ~30 days before expiry for standard 90-day Let's Encrypt certificates). You can customize this with spec.renewBefore or spec.renewBeforePercentage.
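
The renewal arithmetic can be checked directly. This is an illustrative calculation of the default behavior, not cert-manager code:

```python
from datetime import timedelta

def default_renew_before(duration: timedelta) -> timedelta:
    """cert-manager's default: renew once 2/3 of the certificate's
    lifetime has elapsed, i.e. renewBefore is the remaining third."""
    return duration / 3

# Standard Let's Encrypt certificate: 90-day lifetime
lifetime = timedelta(days=90)
print(default_renew_before(lifetime))  # 30 days, 0:00:00
```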

Self-signed CA

For internal services, air-gapped environments, or development, you can use a self-signed CA:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: internal-ca
  secretName: internal-ca-secret
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca-issuer
spec:
  ca:
    secretName: internal-ca-secret

Best practice

Always use letsencrypt-staging for testing to avoid hitting rate limits. The staging server issues untrusted certificates but has much higher rate limits. Switch to letsencrypt-prod only when you have confirmed the flow works end-to-end.

06

GitOps

GitOps is a paradigm where Git is the single source of truth for your infrastructure and application state. A GitOps operator watches a Git repository and automatically synchronizes the cluster state to match what is committed. Changes are made through pull requests, which provides an audit trail, code review, and easy rollback (just revert the commit).

How GitOps works

  1. Developer pushes a change to a Git repository (e.g., updates an image tag in a Deployment manifest)
  2. The GitOps operator detects the change (via polling or webhook)
  3. The operator compares the desired state (Git) with the actual state (cluster)
  4. If there is drift, the operator applies the changes to the cluster
  5. Health checks verify the deployment succeeded
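
The compare-and-apply step (3 and 4 above) can be sketched as a diff between desired and actual state. The function and field names here are illustrative, not any operator's real API:

```python
def reconcile(desired: dict, actual: dict) -> dict:
    """Compare desired state (from Git) with actual state (from the
    cluster) and return the changes to apply. Illustrative sketch only."""
    to_create = {k: v for k, v in desired.items() if k not in actual}
    to_update = {k: v for k, v in desired.items()
                 if k in actual and actual[k] != v}
    to_prune = [k for k in actual if k not in desired]
    return {"create": to_create, "update": to_update, "prune": to_prune}

desired = {"myapp": {"image": "myapp:v2", "replicas": 3}}
actual = {"myapp": {"image": "myapp:v1", "replicas": 3},
          "oldapp": {"image": "oldapp:v9", "replicas": 1}}
print(reconcile(desired, actual))
```

Real operators run this loop continuously, which is what makes manual cluster changes revert (self-heal) and deleted manifests get pruned.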

ArgoCD vs FluxCD

| Feature | ArgoCD | FluxCD |
| --- | --- | --- |
| UI | Rich web UI with app visualization, diff view, sync status | No built-in UI (CLI-first; third-party dashboards available) |
| Architecture | Centralized server with API | Decentralized controllers (source, kustomize, helm, notification) |
| CRDs | Application, ApplicationSet, AppProject | GitRepository, Kustomization, HelmRelease, etc. |
| Multi-cluster | Built-in (register external clusters) | Via Flux on each cluster or Cluster API |
| Helm support | Native (renders Helm charts as manifests) | Native (HelmRelease CRD) |
| Kustomize support | Native | Native (first-class citizen) |
| RBAC | Built-in with SSO integration | Kubernetes-native RBAC |
| Image automation | Argo CD Image Updater (separate component) | Built-in (image-reflector-controller + image-automation-controller) |
| Notifications | Built-in (Slack, webhook, etc.) | notification-controller (Slack, Teams, etc.) |
| Community | CNCF Graduated, very large community | CNCF Graduated, strong but smaller community |

When to use which

ArgoCD

  • Teams that want a visual dashboard for deployments
  • Multi-cluster management from a single pane of glass
  • Organizations that need SSO-integrated RBAC for GitOps
  • When you want to demo deployment state to stakeholders

FluxCD

  • Teams that prefer CLI-first, no-UI workflows
  • When you want tighter integration with Kustomize
  • Automated image updates as a first-class feature
  • When you want each cluster to be self-contained (no central server)

ArgoCD Application example

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/k8s-manifests.git
    targetRevision: main
    path: apps/myapp/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true        # Delete resources removed from Git
      selfHeal: true      # Revert manual changes in cluster
    syncOptions:
      - CreateNamespace=true
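
For comparison, a roughly equivalent FluxCD setup is a GitRepository source plus a Kustomization that applies a path from it (a sketch; intervals and names are illustrative):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: k8s-manifests
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/org/k8s-manifests.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: k8s-manifests
  path: ./apps/myapp/overlays/production
  prune: true              # delete resources removed from Git
  targetNamespace: myapp
```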

Consultant tip

For most clients, ArgoCD is the default recommendation because the web UI is a massive operational advantage. Being able to see at a glance which apps are synced, out-of-sync, degraded, or healthy is invaluable. FluxCD is the better choice when the team is deeply CLI-native and does not want to manage the ArgoCD server component.

07

Helm vs Kustomize

Helm and Kustomize are the two primary tools for managing Kubernetes manifests at scale. They solve overlapping but different problems, and many teams use them together.

Comparison

| Aspect | Helm | Kustomize |
| --- | --- | --- |
| Approach | Templating (Go templates) | Patching (overlay-based) |
| Package format | Charts (packaged, versioned, shareable) | Directories of plain YAML |
| Value injection | values.yaml + --set flags | Patches, JSON merge patches, strategic merge patches |
| Repository | Helm chart repositories (Artifact Hub) | Git repositories or local directories |
| Release management | Built-in (helm install/upgrade/rollback) | None (uses kubectl apply) |
| Learning curve | Higher (Go templates, chart structure, hooks) | Lower (just YAML patching) |
| 3rd-party software | Standard distribution format for OSS | Rarely used by upstream projects |
| Built into kubectl | No (separate binary) | Yes (kubectl apply -k) |

Helm basics

# Add a chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Install a chart
helm install my-postgres bitnami/postgresql \
  --namespace databases --create-namespace \
  --values custom-values.yaml

# Upgrade a release
helm upgrade my-postgres bitnami/postgresql \
  --values custom-values.yaml

# List releases
helm list -A

# Rollback
helm rollback my-postgres 1
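
The custom-values.yaml referenced above overrides chart defaults. A hedged sketch for this chart (exact keys vary by chart and version; always verify against helm show values bitnami/postgresql):

```yaml
auth:
  postgresPassword: changeme   # illustrative; use a Secret manager in production
  database: appdb
primary:
  persistence:
    size: 20Gi
```

Anything not set here falls back to the chart's defaults.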

Kustomize basics

# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namespace: production
patches:
  - target:
      kind: Deployment
      name: myapp
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 5
images:
  - name: myapp
    newTag: v2.1.0

# Build and apply
kubectl apply -k overlays/production/

# Preview rendered output
kubectl kustomize overlays/production/

Using them together

A common pattern is to use Helm for third-party software (databases, monitoring, ingress controllers) and Kustomize for your own applications. You can also render Helm charts into plain YAML and manage them with Kustomize:

# Render a Helm chart to plain YAML
helm template my-release bitnami/postgresql \
  --values values.yaml > base/postgresql.yaml

# Then manage with Kustomize overlays for env-specific tweaks

Recommendation

Do not fight the ecosystem. Install third-party charts with Helm — it is how they are designed to be consumed. For your own application manifests, Kustomize is often simpler because you avoid the complexity of Go templates and can keep manifests as valid, readable YAML. If using ArgoCD or FluxCD, both support Helm and Kustomize natively.

08

KubeVirt

KubeVirt is a Kubernetes add-on that allows you to run traditional virtual machines alongside containers on the same cluster. It extends Kubernetes with custom resource definitions (CRDs) for managing VM lifecycle using the same kubectl tooling.

Why it matters

  • Converged infrastructure — Run VMs and containers side-by-side. No need for separate VMware/Proxmox infrastructure and a separate Kubernetes cluster.
  • Migration path — Move legacy workloads that cannot be containerized (Windows apps, kernel-dependent software, legacy databases) into the Kubernetes platform without rewriting them.
  • Unified tooling — Use the same CI/CD pipelines, monitoring, networking, and storage for both VMs and containers.
  • Harvester — Rancher's Harvester HCI platform is built on KubeVirt, providing a complete hyperconverged infrastructure solution on top of Kubernetes.

Key CRDs

| CRD | Purpose |
| --- | --- |
| VirtualMachine | Persistent VM definition. Survives restarts. Analogous to a Deployment for containers. |
| VirtualMachineInstance | A running VM instance. Analogous to a Pod. Created by the VirtualMachine controller. |
| DataVolume | Declarative way to import VM disk images (from URL, registry, or PVC clone) using CDI (Containerized Data Importer). |

Basic VM example

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ubuntu-vm
spec:
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: ubuntu-vm
    spec:
      domain:
        cpu:
          cores: 2
        memory:
          guest: 4Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
            - name: cloudinit
              disk:
                bus: virtio
          interfaces:
            - name: default
              masquerade: {}
      networks:
        - name: default
          pod: {}
      volumes:
        - name: rootdisk
          dataVolume:
            name: ubuntu-dv
        - name: cloudinit
          cloudInitNoCloud:
            userData: |
              #cloud-config
              password: changeme
              chpasswd: { expire: false }
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu-dv
spec:
  source:
    http:
      url: "https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img"
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi

KubeVirt vs traditional virtualization

| Aspect | KubeVirt | VMware / Proxmox |
| --- | --- | --- |
| Platform | Runs on Kubernetes | Standalone hypervisor |
| Management | kubectl, GitOps, Kubernetes APIs | vCenter, Proxmox UI, proprietary APIs |
| Networking | CNI plugins (Calico, Cilium, etc.) | vSphere networking, OVS |
| Storage | CSI drivers (Longhorn, Ceph, etc.) | VMFS, NFS, vSAN |
| Container co-location | Native: VMs and containers on same nodes | Separate platform |
| Maturity | CNCF Incubating, growing rapidly | Decades of production use |
| Licensing | Apache 2.0 (free) | vSphere is expensive; Proxmox is AGPL (free + paid support) |

Consultant tip

KubeVirt is not a VMware replacement for enterprise clients with thousands of VMs and deep VMware integration. It is ideal for organizations that are Kubernetes-first and need to run a handful of VMs alongside their containerized workloads. The sweet spot is running legacy apps, Windows servers, or network appliances as VMs within the same platform that runs the container workloads. Harvester (built on KubeVirt + Longhorn) is worth evaluating for clients who want a full HCI solution without the VMware licensing cost.

09

Storage

Kubernetes storage is built around three key abstractions: StorageClasses define how storage is provisioned, PersistentVolumes (PVs) represent actual storage resources, and PersistentVolumeClaims (PVCs) are requests for storage by pods. The Container Storage Interface (CSI) is the standard plugin API that connects Kubernetes to storage backends.

Storage flow

Pod
 |
 v  (references PVC in volumes)
PersistentVolumeClaim (PVC)
 |
 v  (bound to)
PersistentVolume (PV)
 |
 v  (provisioned by)
StorageClass --> CSI Driver --> Storage Backend
                                (Longhorn, Ceph, NFS, local-path, cloud disks)

Dynamic provisioning

With dynamic provisioning, you do not need to pre-create PVs. When a PVC is created that references a StorageClass, the CSI driver automatically provisions the underlying storage and creates the PV:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume
spec:
  storageClassName: longhorn
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
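
The longhorn StorageClass referenced by that PVC is normally created by the Longhorn installation itself; a hand-written sketch looks like this (the replica and timeout parameters are illustrative Longhorn options):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"        # Longhorn-specific parameter
  staleReplicaTimeout: "2880"  # minutes before a stale replica is cleaned up
```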

Storage solutions comparison

| Solution | Type | Replication | Best For |
| --- | --- | --- | --- |
| Local-path | Local disk (node-bound) | None | Development, single-node, CI/CD. Default in K3s. |
| Longhorn | Distributed block storage | Configurable (2-3 replicas) | Production bare-metal clusters. Easy to deploy, built-in backup/restore, Rancher integration. |
| Ceph / Rook | Distributed (block, file, object) | Configurable | Large-scale production. High performance, mature, but complex to operate. |
| NFS | Network file system | Depends on backend | Shared storage (ReadWriteMany). Simple but not performant. |
| Cloud CSI | Cloud disks (EBS, PD, Azure Disk) | Provider-managed | Cloud-hosted clusters. Automatic provisioning. |

Access modes

  • ReadWriteOnce (RWO) — Mounted as read-write by a single node. Most common for databases and stateful apps.
  • ReadOnlyMany (ROX) — Mounted as read-only by many nodes. Good for shared configuration or static content.
  • ReadWriteMany (RWX) — Mounted as read-write by many nodes. Required for shared storage across pods. NFS, CephFS, and Longhorn (via built-in NFSv4 share-manager since v1.1) support this.

Recommendation

For on-prem bare-metal clusters, Longhorn is the recommended starting point. It is simple to install (single Helm chart), provides replicated storage with automatic failover, has a built-in UI, supports backups to S3-compatible targets, and integrates natively with Rancher. Rook/Ceph is more powerful but significantly more complex to operate — only use it when you need the scale (100+ TB) or need object storage (S3 API).

10

Networking

Kubernetes networking follows a flat model: every pod gets its own IP address, and all pods can communicate with each other without NAT. This is implemented by Container Network Interface (CNI) plugins. The choice of CNI affects performance, security policy support, and operational complexity.

CNI plugins

| CNI | Mode | Network Policy | Notes |
| --- | --- | --- | --- |
| Calico | BGP, VXLAN, IPIP | Full support | Most popular CNI. Excellent Network Policy support. Default in MicroK8s. Can run in eBPF mode for performance. |
| Flannel | VXLAN, host-gw | None | Simplest CNI. Default in K3s. No Network Policy support; pair with Calico (Canal) if needed. |
| Canal | Flannel networking + Calico policy | Full support | Combines Flannel's simplicity with Calico's policy engine. Default in RKE2. |
| Cilium | eBPF-based | Full + L7 policies | Most advanced CNI. eBPF-based dataplane bypasses iptables. L7 visibility and policy (HTTP, gRPC, Kafka). Hubble for observability. |

Network Policies

Network Policies are Kubernetes-native firewall rules that control pod-to-pod traffic. By default, all pods can talk to all other pods. Network Policies restrict this based on labels, namespaces, and ports.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  podSelector: {}    # Applies to all pods in namespace
  policyTypes:
    - Ingress
  ingress: []        # Empty = deny all ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080
          protocol: TCP

Service types

| Type | Scope | Use Case |
| --- | --- | --- |
| ClusterIP | Internal only | Default. Internal service discovery. Pods within the cluster can reach the service via its DNS name (svc-name.namespace.svc.cluster.local). |
| NodePort | External (via node IP:port) | Exposes the service on a static port (30000-32767) on every node. Simple but not production-grade for web traffic. |
| LoadBalancer | External (via LB IP) | Provisions an external load balancer (cloud LB or MetalLB on bare-metal). The standard way to expose services externally. |
| ExternalName | DNS alias | Maps a service to an external DNS name (CNAME). No proxying. Used to reference external services from within the cluster. |
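
Minimal manifests for the two most common types (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-internal
spec:
  type: ClusterIP          # default; internal-only virtual IP
  selector:
    app: myapp
  ports:
    - port: 80             # service port
      targetPort: 8080     # container port
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-public
spec:
  type: LoadBalancer       # cloud LB or MetalLB assigns the external IP
  selector:
    app: myapp
  ports:
    - port: 443
      targetPort: 8443
```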

DNS (CoreDNS)

CoreDNS runs as a Deployment in the kube-system namespace and provides DNS-based service discovery for the cluster. Every Service gets a DNS entry:

  • my-service.my-namespace.svc.cluster.local — Fully qualified domain name
  • my-service.my-namespace — Short form (from any namespace)
  • my-service — Shortest form (from same namespace only)
Consultant tip

If the client needs Network Policies (and they should for any production cluster), ensure the CNI supports them. Flannel alone does not. The easiest path is Canal (Flannel + Calico policy), which is why RKE2 defaults to it. For advanced use cases (L7 policies, observability, service mesh replacement), Cilium is the future, but it requires kernel 5.10+ (as of Cilium 1.19; v1.20 will require 6.1+) and has a steeper learning curve.

11

Security

Kubernetes security is a broad topic that spans authentication, authorization, workload isolation, secrets management, and supply chain security. The fundamental principle is defense in depth — no single mechanism is sufficient; you need layers.

RBAC (Role-Based Access Control)

RBAC controls who can do what in the cluster. It uses four resource types:

  • Role — Namespace-scoped permissions (e.g., "can read pods in namespace X")
  • ClusterRole — Cluster-scoped permissions (e.g., "can read nodes", "can create namespaces")
  • RoleBinding — Binds a Role to a user/group/ServiceAccount within a namespace
  • ClusterRoleBinding — Binds a ClusterRole to a user/group/ServiceAccount cluster-wide
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: production
  name: read-pods
subjects:
  - kind: User
    name: jane
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
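Cluster-scoped permissions follow the same shape. A hedged sketch granting read-only access to nodes cluster-wide (the group name is illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader        # ClusterRoles have no namespace
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-nodes
subjects:
  - kind: Group
    name: ops-team         # Illustrative group name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io
```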

ServiceAccounts

Every pod runs as a ServiceAccount. If not specified, it uses the default ServiceAccount in its namespace. Best practice: create dedicated ServiceAccounts for each workload with only the permissions it needs.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-sa
  namespace: production
automountServiceAccountToken: false  # Don't mount token unless needed
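A workload opts into the dedicated ServiceAccount via serviceAccountName; a minimal sketch (names and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      serviceAccountName: myapp-sa               # Dedicated SA instead of "default"
      containers:
        - name: myapp
          image: registry.example.com/myapp:1.0.0  # Illustrative image
```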

Pod Security Standards

Pod Security Standards (PSS) replaced the deprecated PodSecurityPolicy (PSP). They are enforced via the built-in Pod Security Admission controller using namespace labels:

  • Privileged — No restrictions. For system-level workloads (CNI, storage drivers).
  • Baseline — Prevents known privilege escalations. Allows most workloads. A good starting point.
  • Restricted — Strict security. Requires running as non-root, all capabilities dropped (except NET_BIND_SERVICE), a seccomp profile, and no privilege escalation. A read-only root filesystem is a recommended best practice but not enforced by PSS. The target for production workloads.
# Apply restricted security to a namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
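For reference, a pod spec that passes the restricted level looks roughly like this (the image is illustrative; the securityContext fields follow the Pod Security Standards):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restricted-ok
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true               # Required by restricted
    seccompProfile:
      type: RuntimeDefault           # Required seccomp profile
  containers:
    - name: app
      image: registry.example.com/myapp:1.0.0  # Illustrative image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]              # Drop everything; re-add NET_BIND_SERVICE only if needed
```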

Secrets management

Kubernetes Secrets are base64-encoded (not encrypted) by default. For production:

  • Enable encryption at rest — Configure the API server to encrypt Secrets in etcd using AES-GCM or AES-CBC. AES-GCM is faster and provides authenticated encryption, but requires regular key rotation; where available, a KMS provider is the strongest option
  • External secrets management — Use the External Secrets Operator to sync secrets from HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager
  • Sealed Secrets — Bitnami's Sealed Secrets controller allows you to store encrypted secrets in Git. Only the controller in the cluster can decrypt them
  • SOPS + age/GPG — Encrypt secret values in YAML files using Mozilla SOPS. Works well with FluxCD's native SOPS decryption
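Encryption at rest is configured via an EncryptionConfiguration file passed to the API server with --encryption-provider-config. A minimal sketch (the key must be a base64-encoded 32-byte value; a placeholder is shown):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aesgcm:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>   # Placeholder, not a real key
      - identity: {}   # Fallback so secrets written before encryption stay readable
```

After enabling it, existing Secrets are only re-encrypted when rewritten, so a bulk `kubectl get secrets -A -o json | kubectl replace -f -` pass is commonly run once.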

Image scanning and supply chain

  • Scan images in CI — Use Trivy, Grype, or Snyk to scan container images during the build pipeline, before they reach the cluster
  • Admission control — Use a policy engine to block deployment of unscanned or vulnerable images
  • Image signing — Sign images with Cosign and verify signatures at admission time

OPA / Gatekeeper

Open Policy Agent (OPA) Gatekeeper is an admission controller that enforces custom policies on Kubernetes resources. It uses Rego (a policy language) to define constraints:

  • Require all images to come from an approved registry
  • Block containers running as root
  • Require resource limits on all pods
  • Enforce label standards across all resources
  • Prevent use of latest image tag
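As an illustration of the last rule, a hedged sketch of a Gatekeeper ConstraintTemplate plus Constraint that rejects pods using the latest tag (names and Rego are illustrative):

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdisallowlatest
spec:
  crd:
    spec:
      names:
        kind: K8sDisallowLatest
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdisallowlatest
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          # Catches an explicit :latest tag; untagged images also default to latest
          endswith(container.image, ":latest")
          msg := sprintf("container image %v uses the latest tag", [container.image])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowLatest
metadata:
  name: disallow-latest-tag
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
```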
Non-negotiable

At minimum, every production cluster must have: (1) RBAC enabled and configured (no wildcard ClusterRoleBindings), (2) Network Policies to restrict pod-to-pod traffic, (3) Pod Security Standards at baseline or restricted level, (4) Secrets encrypted at rest, (5) Container images scanned for vulnerabilities. Everything else is defense in depth.

12

Consultant's Checklist

Use this checklist when assessing, deploying, or auditing a Kubernetes cluster.

Cluster Foundation

  • Distribution selected (K3s/RKE2/MicroK8s/managed)
  • HA control plane (3+ control plane nodes)
  • etcd backup strategy configured and tested
  • Node OS hardened and patched
  • Container runtime configured (containerd)
  • Kubeconfig access controlled and distributed securely
  • TLS SANs configured for all access paths

Networking

  • CNI plugin selected and deployed
  • Network Policies enforced (default deny + allow rules)
  • Ingress controller deployed and configured
  • LoadBalancer solution in place (MetalLB for bare-metal)
  • DNS resolution working (CoreDNS health)
  • TLS certificates automated (cert-manager)
  • External DNS configured if needed

Storage

  • StorageClass configured with dynamic provisioning
  • Storage backend deployed (Longhorn/Ceph/cloud CSI)
  • Backup solution for persistent data
  • Volume snapshot support if needed
  • Storage capacity monitoring and alerting
  • Reclaim policy set appropriately (Retain for production)

Security

  • RBAC configured (no default admin bindings)
  • Pod Security Standards enforced
  • Secrets encrypted at rest
  • External secrets management in place
  • Image scanning in CI pipeline
  • Admission controller for policy enforcement
  • Audit logging enabled
  • ServiceAccount tokens not auto-mounted

GitOps & Deployment

  • GitOps operator deployed (ArgoCD or FluxCD)
  • Git repository structure defined (monorepo vs multi-repo)
  • Helm charts or Kustomize overlays for all environments
  • Image update automation configured
  • Rollback procedure documented and tested
  • Sync policies configured (auto-sync, prune, self-heal)

Operations

  • Monitoring stack deployed (Prometheus + Grafana)
  • Alerting rules configured for critical conditions
  • Logging aggregation (Loki, EFK, or cloud logging)
  • Resource requests and limits set on all workloads
  • Horizontal Pod Autoscaler configured where appropriate
  • Node upgrade procedure documented (drain, upgrade, uncordon)
  • Disaster recovery plan documented and tested
Priority order

When building a new cluster from scratch, work through these areas in order: (1) Cluster foundation + HA, (2) Networking + Ingress + TLS, (3) Storage, (4) Security hardening, (5) GitOps setup, (6) Monitoring + alerting. Do not skip ahead — each layer depends on the one before it.