Proxmox VE Production Architecture

Customer deployment reference — clustering, storage, networking, HA, backups & operations

01

Overview

Proxmox Virtual Environment (PVE) is an open-source server virtualization platform built on Debian Linux. It combines KVM for full virtualization and LXC for lightweight containers, managed through a web UI and REST API. It competes with VMware vSphere, Microsoft Hyper-V, and Nutanix AHV.

Since VMware's acquisition by Broadcom (2023) and subsequent licensing changes, Proxmox has become a serious contender for customers looking to exit VMware. The migration wave is real — many engagements now are VMware-to-Proxmox transitions.

Strengths

  • Truly open source (AGPL v3) — no feature gating
  • Integrated clustering, HA, live migration, backup
  • Ceph storage integration built-in
  • Web UI that covers 95% of operations
  • REST API for automation
  • No per-CPU or per-VM licensing

Weaknesses

  • No equivalent to VMware DRS (automatic VM load balancing across nodes) — live migration exists, but rebalancing is manual
  • Ecosystem is smaller — fewer third-party integrations
  • Enterprise support is good but not VMware/Microsoft tier
  • No native NSX-equivalent SDN (basic SDN exists)
  • GPU passthrough works but is less polished than VMware
  • Windows guest tooling less mature than VMware Tools
Positioning

Proxmox is not a 1:1 VMware replacement. It's a different philosophy — Linux-native, CLI-friendly, built on standard open-source components (KVM, LXC, Ceph, ZFS, Corosync). For customers who are comfortable with Linux, it's arguably better. For customers who expect a Windows-centric, GUI-everything experience, set expectations early.

02

Architecture

Each Proxmox node is a standalone Debian server that can join a cluster. Understanding the component stack matters for troubleshooting and capacity planning.

+------------------------------------------------------+
|                  Proxmox Web UI                      |
|               (port 8006, HTTPS)                     |
+-----------------------+------------------------------+
                        |
+-----------------------v------------------------------+
|              Proxmox API (pveproxy)                  |
|         REST API + authentication + ACLs             |
+--+-------------+--------------+--------------+-------+
   |             |              |              |
+--v---+    +----v----+    +----v----+   +-----v-----+
| QEMU |    |   LXC   |    |  Ceph   |   | Corosync  |
| /KVM |    |         |    | Client  |   | + pmxcfs  |
|      |    |         |    |         |   | (cluster) |
+--+---+    +----+----+    +----+----+   +-----+-----+
   |             |              |              |
+--v-------------v--------------v--------------v------+
|                   Debian Linux                      |
|        (kernel, networking, storage, ZFS)           |
+-----------------------------------------------------+
  • QEMU/KVM: Full virtualization. Hardware-accelerated VMs. Supports live migration, snapshots, CPU pinning.
  • LXC: OS-level containers. Lightweight, shared kernel. Not Docker — full OS containers. Great for services that don't need a full VM.
  • Corosync: Cluster communication. Handles cluster membership, quorum, and node heartbeats. Totem protocol over UDP.
  • pmxcfs: Cluster filesystem. FUSE filesystem backed by a SQLite DB replicated via Corosync. Stores cluster config (VMs, storage, users, ACLs).
  • pveproxy: API & web UI. HTTPS reverse proxy on port 8006. Handles authentication, serves the web UI, exposes the REST API.
  • pvedaemon: Node management. Local daemon for VM/container operations, storage management, task execution.
  • Ceph: Distributed storage. Optional. Built-in Ceph deployment for hyper-converged storage. OSD, MON, MDS, MGR.
  • ZFS: Local storage. Optional. Advanced filesystem with snapshots, compression, checksums, replication.
  • Open vSwitch: Virtual networking. Optional. Software-defined networking with VLANs, bonds, and SDN zones.
03

Clustering

A Proxmox cluster is a group of nodes managed as a single entity. Clustering enables live migration, HA, shared configuration, and centralized management. Minimum 3 nodes for production.

Creating a cluster

# On the first node:
pvecm create my-cluster

# On subsequent nodes:
pvecm add 10.0.0.1    # IP of an existing cluster node

# Verify cluster status
pvecm status
pvecm nodes

Quorum

Proxmox uses Corosync's voting system for quorum. A cluster needs a majority of votes to operate:

Nodes   Quorum Requires        Tolerates Failures
2       2 (both must be up)    0 — never do this without a QDevice
3       2                      1
4       3                      1
5       3                      2
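The majority rule behind this table is simple integer arithmetic. A quick sketch for planning cluster sizes (quorum_info is a hypothetical helper, not a Proxmox tool):

```shell
#!/bin/sh
# Votes needed for quorum and failures tolerated, for N voting nodes.
# majority = floor(N/2) + 1; tolerated failures = N - majority
quorum_info() {
    nodes=$1
    majority=$(( nodes / 2 + 1 ))
    tolerated=$(( nodes - majority ))
    echo "$nodes nodes: quorum=$majority, tolerates=$tolerated failures"
}

quorum_info 2    # 2 nodes: quorum=2, tolerates=0 failures
quorum_info 3    # 3 nodes: quorum=2, tolerates=1 failures
quorum_info 5    # 5 nodes: quorum=3, tolerates=2 failures
```

Note that even node counts buy nothing: 4 nodes tolerate the same single failure as 3.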
Two-Node Clusters

A 2-node cluster has no fault tolerance by default — losing one node loses quorum, and the surviving node won't start HA services. Fix this with a QDevice (Corosync Quorum Device) — a lightweight third-party witness running on a small VM or Raspberry Pi that provides the tiebreaker vote.

# Set up QDevice (on a separate machine):
apt install corosync-qdevice corosync-qnetd

# On a cluster node:
pvecm qdevice setup 10.0.0.100    # IP of the QDevice host

# Verify
pvecm status

Corosync network

  • Since PVE 6.0+, Corosync 3 uses Kronosnet (knet) for transport, which is unicast only. Multicast was used in Corosync 2.x (PVE 5.x and earlier) and is no longer supported.
  • Dedicate a separate NIC/VLAN for cluster traffic (Corosync + Ceph). Don't share with VM traffic.
  • Configure redundant links (knet supports up to 8 separate network links) for Corosync. If the cluster network fails, you lose quorum and all HA stops.
  • Latency between nodes must stay low (Proxmox recommends under 5 ms, i.e. LAN-grade). Proxmox clusters cannot span WANs or high-latency links.
# /etc/pve/corosync.conf (managed by pvecm, don't edit directly)
# Verify link status:
pvecm status
# Check for link errors:
corosync-cfgtool -s
Cluster Breakup

Removing a node from a cluster is destructive. All VMs/CTs on that node must be migrated first. The node is wiped of cluster config and must be reinstalled to join a different cluster. Plan cluster membership carefully — it's not something you casually change.
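The removal procedure itself is short; a sketch of the steps (VM/CT IDs and node names are illustrative):

```shell
# 1. Migrate everything off the node being removed:
qm migrate 100 node1     # repeat per VM
pct migrate 200 node1    # repeat per container

# 2. Power the node off, then — from a REMAINING node — remove it:
pvecm delnode node3

# 3. The removed node must be reinstalled before it can join
#    this or any other cluster again.
```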

04

Storage

Storage architecture is the most consequential decision in a Proxmox deployment. It affects performance, HA capability, backup speed, and operational complexity.

Hyper-Converged Ceph

Distributed storage built into Proxmox. Each node contributes disks to a shared pool. VMs can run on any node and access their storage over the network. Enables live migration and HA.

  • Pros: No external storage needed, scales linearly, self-healing
  • Cons: Needs 3+ nodes, dedicated network, CPU/RAM overhead, complex to tune
  • Best for: 3+ node clusters needing shared storage without a SAN

Local ZFS

Advanced local filesystem. Snapshots, compression, checksums, send/receive replication. Best local storage option for Proxmox.

  • Pros: Excellent data integrity, fast snapshots, built-in compression
  • Cons: Local only (no live migration without Ceph/NFS), RAM-hungry (1 GB ARC per 1 TB of storage is a common rule of thumb; must limit ARC on VM hosts)
  • Best for: Single nodes, or combined with Ceph (ZFS for local, Ceph for shared)

External NFS / iSCSI / FC

Traditional shared storage from a NAS/SAN. NFS is simplest. iSCSI and Fibre Channel for higher performance.

  • Pros: Well-understood, existing investment, enables live migration
  • Cons: Single point of failure (unless HA SAN), separate infrastructure to manage
  • Best for: Customers with existing SAN/NAS infrastructure
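Attaching existing NFS storage is a one-liner; a sketch with illustrative storage ID, server, and export path:

```shell
# Add an NFS share as cluster-wide storage (server/export are examples):
pvesm add nfs san-nfs \
    --server 10.0.0.50 \
    --export /export/proxmox \
    --content images,backup,iso

# Verify it's active on all nodes:
pvesm status
```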

Simple LVM / LVM-Thin / Directory

Basic local storage. LVM-Thin supports thin provisioning and snapshots. Directory storage uses the filesystem directly (ext4/xfs).

  • Pros: Zero overhead, simple, fast
  • Cons: No checksums, limited snapshots, no replication
  • Best for: Dev/test, ephemeral workloads, boot drives

Ceph deployment

Proxmox has a built-in Ceph installer — you don't need to deploy Ceph separately:

# Install Ceph on each node (from the Proxmox UI or CLI):
pveceph install

# Create monitors (one per node, need 3+ for quorum):
pveceph mon create

# Create managers:
pveceph mgr create

# Create OSDs (one per disk):
pveceph osd create /dev/sdb
pveceph osd create /dev/sdc

# Create a storage pool:
pveceph pool create vm-storage --pg_autoscale_mode on

# Pool is now available as a storage backend in Proxmox
Ceph Networking

Ceph needs a dedicated network with at least 10 Gbps between nodes. 1 Gbps will work for small deployments but becomes a bottleneck quickly. For production, use 25 Gbps. Separate the Ceph public network (client access) from the Ceph cluster network (OSD replication) for best performance.
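The public/cluster split is set at Ceph initialization time; a sketch with example subnets (match them to your storage VLANs):

```shell
# Initialize Ceph with separate public and cluster networks
# (subnets are illustrative):
pveceph init --network 10.10.0.0/24 --cluster-network 10.10.1.0/24
```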

Ceph sizing rules of thumb

  • Minimum 3 nodes with at least 2 OSDs each
  • Don't fill beyond 70-80% — Ceph performance degrades and recovery becomes dangerous above 80%
  • RAM: BlueStore's default osd_memory_target is 4 GB per OSD, plus roughly 1 GB per monitor and the OS baseline. A node with 8 NVMe OSDs needs ~32 GB for the OSDs alone.
  • CPU: 1 core per OSD for HDD, 2+ cores per OSD for NVMe (NVMe saturates CPU faster)
  • Journal/WAL: Use a fast NVMe for the OSD WAL/DB if your OSDs are SATA SSDs or HDDs. This dramatically improves write latency.
  • Replication: Default is 3x (3 copies). For NVMe-only clusters, consider erasure coding for better space efficiency on cold data.
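Usable capacity falls directly out of the replication factor and the fill ceiling. A quick sketch (usable_tb is a hypothetical helper; the raw-capacity figure is an example):

```shell
#!/bin/sh
# Usable Ceph capacity (TB, integer) from raw TB, replica count,
# and fill ceiling: usable = raw * fill% / replicas / 100
usable_tb() {
    raw=$1; replicas=$2; fill_pct=$3
    echo $(( raw * fill_pct / replicas / 100 ))
}

# 3 nodes x 8 x 4 TB NVMe = 96 TB raw, 3x replication, 80% ceiling:
usable_tb 96 3 80    # -> 25 TB usable
```

The result surprises customers: 96 TB of raw NVMe yields about 25 TB of safely usable space at 3x replication.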

ZFS configuration

# Create a mirrored ZFS pool (recommended over RAIDZ for VMs):
zpool create -f rpool mirror /dev/sda /dev/sdb

# Enable compression (always):
zfs set compression=lz4 rpool

# Set ARC (cache) limits to leave RAM for VMs:
# In /etc/modprobe.d/zfs.conf:
options zfs zfs_arc_max=8589934592    # 8GB max ARC

# Add as Proxmox storage:
pvesm add zfspool local-zfs -pool rpool/data -content images,rootdir
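The zfs_arc_max value is plain bytes; a small sketch for computing it instead of hand-typing the number (arc_bytes is a hypothetical helper):

```shell
#!/bin/sh
# zfs_arc_max is specified in bytes: GiB * 1024^3
arc_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

arc_bytes 8    # -> 8589934592

# Then write it out, e.g.:
# echo "options zfs zfs_arc_max=$(arc_bytes 8)" > /etc/modprobe.d/zfs.conf
```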
05

Networking

Proxmox networking is Linux networking. If you understand bridges, bonds, VLANs, and routing on Linux, you understand Proxmox networking. There's no proprietary abstraction layer.

Network architecture for production

A production node should have at minimum 3 network segments:

Management: Proxmox UI / API / SSH

The management network carries web UI, API, and SSH traffic. Corosync often shares it in small deployments, but production clusters should give Corosync its own dedicated link (see Clustering). Dedicated NIC or VLAN. This is your control plane: if it goes down, you can't manage the cluster.

VM Traffic: Guest Networks

VLANs for VM/CT traffic. Trunk the VLANs to the Proxmox bridge and assign VLAN tags per VM NIC. Use LACP bonds for bandwidth and redundancy.

Storage: Ceph / iSCSI / NFS

Dedicated high-bandwidth network for storage traffic. 10/25 Gbps minimum. Jumbo frames (MTU 9000) recommended for Ceph. This must be low-latency and reliable.

Optional: Live Migration

Separate network for VM memory transfer during live migration. Shares with storage network in smaller deployments. Dedicated in large ones to avoid migration storms saturating storage I/O.

Bridge and bond configuration

# /etc/network/interfaces (typical production node)

# Management bond (LACP)
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4

# Management bridge
auto vmbr0
iface vmbr0 inet static
    address 10.0.0.10/24
    gateway 10.0.0.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0

# Storage/Ceph bond (LACP, jumbo frames)
auto bond1
iface bond1 inet manual
    bond-slaves ens1f0 ens1f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    mtu 9000

# Storage bridge (no gateway - isolated network)
auto vmbr1
iface vmbr1 inet static
    address 10.10.0.10/24
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    mtu 9000

# VM traffic bridge (VLAN-aware)
auto vmbr2
iface vmbr2 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 100-200
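Proxmox ships ifupdown2, so changes to /etc/network/interfaces can be applied live rather than via reboot:

```shell
# Apply interface changes without a reboot (ifupdown2):
ifreload -a

# Verify bond and bridge state:
cat /proc/net/bonding/bond0
bridge link show
ip -br addr
```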

SDN (Software-Defined Networking)

Proxmox includes an SDN module for managing VNets, zones, and subnets across the cluster. It supports VLAN, VXLAN, and EVPN zones with BGP-based routing and fabric automation. PVE 8+ improved SDN significantly with DHCP integration and subnet management. It's functional for multi-tenancy but not as feature-rich as NSX or Cilium.

  • Use SDN if you need to define networks centrally and have them auto-configured on all nodes
  • Skip SDN if you're comfortable managing bridges/VLANs in /etc/network/interfaces directly — it's more transparent and easier to debug
06

VMs & Containers

KVM virtual machines

Full hardware virtualization. Each VM gets its own kernel, full OS, emulated or paravirtualized hardware. Use for:

  • Windows guests
  • Workloads that need kernel modules or specific kernel versions
  • Security isolation (separate kernel per workload)
  • Anything that needs GPU passthrough, USB passthrough, or specific hardware emulation

VM best practices

# Create a VM with virtio devices (best performance):
qm create 100 \
  --name my-vm \
  --memory 4096 \
  --cores 4 \
  --scsihw virtio-scsi-single \
  --scsi0 local-zfs:32,iothread=1 \
  --net0 virtio,bridge=vmbr0,tag=100 \
  --ostype l26 \
  --boot order=scsi0 \
  --agent enabled=1
  • Always use VirtIO for disk and network — dramatically faster than IDE/E1000 emulation
  • Enable QEMU Guest Agent (--agent enabled=1) — required for proper shutdown, freeze/thaw for backups, IP reporting
  • Use virtio-scsi-single with iothread=1 per disk for best I/O performance
  • CPU type: Use host for maximum performance (exposes real CPU features). Use x86-64-v2-AES or similar if you need live migration between different CPU generations.
  • Ballooning: Enabled by default. Allows the VM to return unused RAM to the host. Disable for latency-sensitive workloads (databases, real-time).
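The tuning bullets above map to qm set flags on an existing VM (VM ID 100 is illustrative):

```shell
# Pin the CPU model for maximum performance (migration then limited
# to identical CPUs):
qm set 100 --cpu host

# Or a portable model for mixed-generation clusters:
qm set 100 --cpu x86-64-v2-AES

# Disable ballooning for latency-sensitive workloads:
qm set 100 --balloon 0
```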

LXC containers

OS-level containers sharing the host kernel. Not Docker — these are full OS containers (think systemd, SSH, the full userspace). Use for:

  • Linux-only services that don't need a custom kernel
  • Lightweight infrastructure services (DNS, monitoring agents, web servers)
  • Dev/test environments
  • Anything where VM overhead is unnecessary

Privileged vs. Unprivileged

Unprivileged (default, recommended): Container UIDs are mapped to a high range on the host. Root inside the container is not root on the host. Much safer.

Privileged: Container root = host root (mapped 1:1). Required for some operations (NFS mounts, certain device access). Use sparingly and only when unprivileged doesn't work.
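Creating an unprivileged container looks like this (CT ID, hostname, and the template filename are illustrative — list available templates with pveam):

```shell
# Unprivileged is the default, but being explicit doesn't hurt:
pct create 200 local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
    --hostname svc01 \
    --memory 2048 --cores 2 \
    --rootfs local-zfs:8 \
    --net0 name=eth0,bridge=vmbr0,ip=dhcp \
    --unprivileged 1
```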

Resource Limits

Set CPU, RAM, and I/O limits per container. Unlike VMs, containers share the host kernel and scheduler — one runaway container can affect others without proper limits.

pct set 200 -memory 2048
pct set 200 -cores 2
pct set 200 -swap 512
LXC Limitations

LXC containers can run Docker with features: nesting=1,keyctl=1 on unprivileged containers, and this works well for many workloads. However, not all Docker images or complex stacks are guaranteed to work due to kernel namespace and AppArmor constraints. For maximum compatibility and isolation, a VM with Docker inside remains the safest choice for production. PVE 9.1+ also added native OCI container support, allowing you to pull and run OCI images directly without Docker or a full VM.

07

High Availability

Proxmox HA automatically restarts VMs/CTs on another node if a node fails. It requires a cluster with quorum and shared storage (Ceph, NFS, iSCSI).

How HA works

  1. The HA manager (pve-ha-lrm + pve-ha-crm) runs on each node
  2. Nodes are monitored via Corosync heartbeats
  3. If a node is fenced (declared dead after missing heartbeats), the cluster requests HA resources be restarted elsewhere
  4. The CRM (Cluster Resource Manager) picks a target node and starts the VM/CT
  5. The VM boots fresh on the new node — this is not live migration, it's a cold restart
HA is Not Live Migration

HA restarts VMs after a node failure. The VM is down during the failover (typically 1-5 minutes). Live migration (zero-downtime) is a manual or planned operation, not part of HA. Don't promise customers "zero downtime HA" with Proxmox — that's not what it does.

Fencing

Fencing is how the cluster ensures a failed node is truly dead before restarting its VMs elsewhere. Without proper fencing, you risk split-brain — two copies of the same VM running simultaneously, corrupting data.

  • Watchdog (default): Linux software watchdog (softdog). If the HA manager loses contact with the cluster, the watchdog reboots the local node. This self-fencing is what the current PVE HA stack actually implements, and it works for most deployments.
  • Hardware watchdogs (IPMI/iLO/iDRAC BMCs): More reliable than softdog. Enable by loading the watchdog module for your hardware via /etc/default/pve-ha-manager.
  • STONITH: "Shoot The Other Node In The Head" is the generic cluster term for the same concept. Proxmox does not ship external fence agents that power off nodes out-of-band; watchdog self-fencing fills that role.
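Hardware watchdogs are selected in /etc/default/pve-ha-manager. The module name depends on your platform; iTCO_wdt (Intel chipsets) is shown purely as an example:

```shell
# /etc/default/pve-ha-manager
# Load a hardware watchdog module instead of the softdog default:
WATCHDOG_MODULE=iTCO_wdt

# After a reboot, verify the watchdog is active:
# journalctl -u watchdog-mux
```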

HA groups & resource configuration

# Add a VM to HA management:
ha-manager add vm:100

# Assign the VM to an HA group:
ha-manager set vm:100 --group my-group

# Create an HA group (restrict which nodes can run this VM;
# node priority syntax node:prio, higher = preferred):
ha-manager groupadd my-group --nodes node1:2,node2:1 --nofailback 1

# List HA resources:
ha-manager status
  • nofailback: When the original node comes back, don't automatically migrate the VM back. Set this to avoid unnecessary migrations and potential disruption.
  • max_restart: Maximum restart attempts before giving up (default: 1). Increase for flaky workloads, keep at 1 for workloads where repeated restarts could cause data corruption.
  • max_relocate: Maximum times to try a different node (default: 1).
08

Backups

Built-in backup (vzdump)

Proxmox includes vzdump for VM and container backups. Three modes:

  • Snapshot: no downtime (live). Crash-consistent (application-consistent with the QEMU agent). The default choice for production VMs.
  • Suspend: brief downtime (seconds to minutes). Memory state saved. Use when snapshot mode doesn't work.
  • Stop: full downtime (VM is stopped). Clean shutdown, fully consistent. For maintenance windows and critical databases.
Recommendation

Use snapshot mode with the QEMU Guest Agent enabled. The agent triggers fsfreeze inside the guest before the snapshot, making it application-consistent for most workloads (equivalent to taking a snapshot of a cleanly-paused filesystem). Without the agent, you get crash-consistent backups — fine for most Linux workloads, risky for databases.
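Whether the agent is actually running inside a given guest is easy to verify (VM ID is illustrative):

```shell
# Agent enabled in the VM config?
qm config 100 | grep agent

# Agent responding inside the guest?
qm agent 100 ping && echo "agent OK"
```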

Proxmox Backup Server (PBS)

Dedicated backup appliance from Proxmox. Strongly recommended over storing backups on local/NFS storage:

  • Deduplication: Client-side dedup with fixed-size chunks (for VM disk images) and variable-size chunks (for file archives, using a rolling hash for better dedup ratios). Second backup of a 100 GB VM that changed 1 GB only transfers ~1 GB.
  • Incremental forever: Every backup after the first is incremental. No periodic full backups needed.
  • Encryption: Client-side AES-256-GCM. The PBS server never sees plaintext data.
  • Verification: Scheduled verify jobs that check backup integrity without restoring.
  • Garbage collection: Automatic cleanup of unreferenced chunks.
  • Sync & offsite: Native sync to a remote PBS for offsite copies.
# Schedule backups in Proxmox UI: Datacenter → Backup → Add
# Or via CLI:
vzdump 100 --storage pbs-backup --mode snapshot --compress zstd

# Backup all VMs on a node:
vzdump --all --storage pbs-backup --mode snapshot --compress zstd

# Note: --mailnotification and --mailto are deprecated in PVE 8+.
# Use the notification system instead: Datacenter → Notifications
# to configure targets, matchers, and notification policies.

Backup strategy

  • Daily backups of all VMs/CTs to PBS (snapshot mode, off-hours)
  • Retention: 7 daily, 4 weekly, 3 monthly minimum. PBS handles retention policies natively.
  • Offsite: Sync PBS to a remote PBS or push to S3-compatible storage. The 3-2-1 rule applies: 3 copies, 2 media types, 1 offsite.
  • Test restores quarterly — restore a VM to a temporary name and verify it boots and works.
  • Backup the Proxmox config itself: /etc/pve/ contains cluster config. It's small — back it up separately.
Don't Forget

Back up /etc/pve/ (cluster config, VM configs, user database, ACLs, storage definitions). It's not included in VM backups. Losing this means you can recreate VMs from backups but not the cluster configuration, users, permissions, or HA settings.
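A minimal sketch of a host-config backup job (paths per the note above; the destination and extra paths are illustrative — adapt to your environment):

```shell
#!/bin/sh
# Nightly host-config backup. /etc/pve is the live pmxcfs mountpoint,
# so tar reads the current cluster configuration directly.
DEST=/root/pve-config-$(hostname)-$(date +%F).tar.gz
tar czf "$DEST" \
    /etc/pve \
    /etc/network/interfaces \
    /etc/modprobe.d \
    2>/dev/null
echo "wrote $DEST"
```

Ship the archive off the node (to PBS, a fileserver, anywhere) — a config backup stored only on the host it describes is not a backup.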

09

Upgrades

Proxmox follows Debian releases. Major version upgrades coincide with the underlying Debian upgrade (PVE 7/Bullseye → PVE 8/Bookworm → PVE 9/Trixie). PVE 9.0 was released August 2025 on Debian 13 "Trixie" with kernel 6.14, QEMU 10.0, Ceph Squid 19.2, and ZFS 2.3. Minor updates are regular apt upgrades.

Minor updates (within a version)

# Standard apt upgrade, one node at a time:
apt update
apt dist-upgrade

# Reboot if kernel was updated:
# Check: running kernel vs. installed kernel
uname -r
ls /boot/vmlinuz-* | tail -1
  • Upgrade one node at a time in a cluster
  • Migrate or shut down VMs on the node before rebooting (or rely on HA for automatic failover)
  • Verify the node rejoins the cluster after reboot: pvecm status
  • Wait for Ceph to rebalance (if using Ceph) before upgrading the next node: ceph status should show HEALTH_OK

Major version upgrades

Major upgrades are in-place Debian upgrades. Proxmox provides a checklist tool:

# Run the pre-upgrade checklist:
pve8to9 --full    # (or pve7to8 for older upgrades)

# This checks for:
# - Unsupported packages
# - Deprecated configurations
# - Ceph version compatibility
# - Kernel version
# - Repository configuration
Major Upgrade Strategy

Major upgrades are not reversible (Debian doesn't support downgrades). Take a full backup of the node (ideally a bare-metal backup or at minimum /etc/ and /var/lib/pve-cluster/) before starting. Upgrade one node at a time. If it fails catastrophically, reinstall from scratch and rejoin the cluster. VMs on shared storage are unaffected.

Ceph upgrades

If running Ceph, it has its own upgrade path that must be coordinated with the PVE upgrade:

  • Ceph upgrades are version-locked to the PVE major version (PVE 9 ships Ceph Squid 19.x, PVE 8 shipped Ceph Quincy/Reef/Squid, PVE 7 shipped Ceph Pacific/Quincy)
  • Upgrade Ceph monitors first, then OSDs, then MDS (if using CephFS)
  • Set noout flag before rebooting OSD nodes to prevent unnecessary rebalancing: ceph osd set noout
  • Unset after upgrade: ceph osd unset noout
10

Monitoring

Proxmox has basic built-in monitoring (web UI graphs) but production deployments need external monitoring.

What to monitor

Metric            Alert Threshold            Why
Cluster quorum    votes < expected           Quorum loss = HA stops, no management operations
Node CPU          > 85% sustained            VMs compete for cycles, latency increases
Node RAM          > 90% (incl. ZFS ARC)      OOM killer will start killing VMs
Storage usage     > 80% (Ceph: > 70%)        Ceph degrades severely above 80%; near-full OSD = cluster emergency
Ceph health       != HEALTH_OK               Reduced redundancy; one more failure could lose data
Ceph OSD latency  commit_latency_ms > 20     Slow disk or overloaded OSD
ZFS pool health   != ONLINE                  Pool running on reduced redundancy
Disk SMART        any reallocated sectors    Early warning for disk failure
Network bond      degraded (lost a link)     Running without redundancy
Backup status     failed or stale            No backup = no recovery

Monitoring stack

  • Prometheus + PVE Exporter: The prometheus-pve-exporter scrapes the Proxmox API and exposes metrics. Community-maintained, works well.
  • Ceph built-in: ceph status, ceph health detail, Ceph Manager's Prometheus module (ceph mgr module enable prometheus)
  • SMART monitoring: smartmontools + smartd on every node. Alert on any SMART errors.
  • Node Exporter: Standard Prometheus node_exporter for OS-level metrics (CPU, RAM, disk I/O, network)
# Enable Ceph's Prometheus module:
ceph mgr module enable prometheus
# Scrape at http://ceph-mgr-node:9283/metrics

# Install PVE exporter (on a monitoring host):
pip install prometheus-pve-exporter
# Config: point at https://pve-node:8006 with API token
11

Security Hardening

Proxmox runs as root on bare metal. The hypervisor is the highest-privilege layer in the stack — if it's compromised, every VM is compromised.

Web UI & API Access

  • Restrict port 8006 to management network only (firewall or bind address)
  • Use API tokens instead of username/password for automation
  • Enable 2FA (TOTP) for all admin accounts
  • Disable root login; create named admin accounts with appropriate roles

SSH Hardening

  • Key-only authentication (disable password auth)
  • Restrict SSH to management network
  • Use fail2ban for brute-force protection
  • Disable root SSH if using sudo-capable admin accounts

Network Isolation

  • Management, storage, and VM traffic on separate networks/VLANs
  • Corosync traffic never on an untrusted network
  • Proxmox built-in firewall for VM-level rules
  • No VMs should be able to reach the management network

Updates & Patching

  • Subscribe to Proxmox security advisories
  • Patch monthly at minimum, critical CVEs immediately
  • Kernel updates require reboot — schedule maintenance windows
  • Don't skip Debian security updates (it's a full Debian system)

RBAC & permissions

Proxmox has a granular permission system with users, groups, roles, and path-based ACLs:

# Create a user (Proxmox realm):
pveum user add admin@pve --comment "Node admin"

# Create a role with specific privileges:
pveum role add VMOperator -privs "VM.Audit,VM.Console,VM.PowerMgmt"

# Assign role on a path:
pveum acl modify /vms/100 --users admin@pve --roles VMOperator

# API tokens (for automation):
pveum user token add admin@pve automation --privsep 1
# privsep=1 means the token gets its own permissions, not the user's
  • Use LDAP/AD integration for user authentication in enterprise environments
  • Map AD groups to Proxmox groups, then assign roles to groups
  • Use API tokens with privsep for Terraform, Ansible, and other automation
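API tokens authenticate with a single header, no ticket/cookie dance. A sketch (host, token ID, and the secret UUID are placeholders):

```shell
# List cluster nodes via the REST API using a token:
curl -k \
    -H "Authorization: PVEAPIToken=admin@pve!automation=<SECRET-UUID>" \
    https://10.0.0.10:8006/api2/json/nodes
```

The same header works for Terraform's and Ansible's Proxmox integrations, so one token scheme covers all automation.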
12

Licensing & Support

Proxmox VE is fully open source (AGPL v3). Every feature works without a subscription. The subscription buys you access to the enterprise repository and support. Pricing is per physical CPU socket per year (not per core).

  • No subscription (free): full software, no-subscription repo (slightly less tested packages), community forum support only.
  • Community (€115): enterprise repo access, community-based support (no professional tickets).
  • Basic (€355): enterprise repo, 3 support tickets/year, next-business-day response.
  • Standard (€530): enterprise repo, 10 support tickets/year, 4-hour response during business hours.
  • Premium (€1,060): enterprise repo, unlimited tickets, 2-hour response within a business day.

Enterprise repo vs. no-subscription repo

  • The enterprise repo (pve-enterprise) requires a valid subscription key. Packages are held back slightly for extra testing.
  • The no-subscription repo (pve-no-subscription) is free. Same packages, slightly less testing. Completely usable for production — many companies run it without issues.
  • The test repo (pvetest) has bleeding-edge packages. Never use in production.
# Switch to no-subscription repo (if no subscription):
# Remove enterprise repo:
rm /etc/apt/sources.list.d/pve-enterprise.list

# Add no-subscription repo (use your Debian codename: trixie for PVE 9, bookworm for PVE 8):
echo "deb http://download.proxmox.com/debian/pve trixie pve-no-subscription" \
  > /etc/apt/sources.list.d/pve-no-subscription.list

apt update
Recommendation

For production customer deployments, buy at least the Community subscription (€115/socket/year) for enterprise repo access, or Basic (€355/socket/year) if you want professional support tickets. The enterprise repo is more stable, and having vendor support as a safety net matters for customer confidence. The cost is negligible compared to VMware licensing — often 10-50x cheaper. For internal/lab use, the no-subscription repo is perfectly fine.

VMware comparison (for customer conversations)

  • License model: Proxmox is free or per-socket/year (€115-€1,060); vSphere is a per-core subscription (post-Broadcom).
  • Hypervisor: KVM (Type 1, Linux-based) vs. ESXi (Type 1, proprietary).
  • Live migration: both yes; Proxmox is manual or API-driven, vSphere adds DRS for automatic placement on top of vMotion.
  • HA: both cold-restart VMs on node failure; vSphere adds DRS rebalancing.
  • Distributed storage: Ceph (built-in) vs. vSAN (licensed separately).
  • Containers: LXC (native) vs. none (requires VMs).
  • SDN: basic (VLAN, VXLAN, EVPN) vs. NSX (advanced, very expensive).
  • Automation: REST API, Terraform, Ansible vs. vSphere API, Terraform, PowerCLI.
  • GPU passthrough: works (vfio-pci) vs. works (better vGPU support with NVIDIA).
13

Consultant's Checklist

Before proposing a Proxmox deployment:

  1. How many hosts? — Determines cluster size and quorum strategy (2-node needs QDevice)
  2. Storage strategy? — Ceph (hyper-converged), ZFS (local), NFS/iSCSI (external SAN), or a mix
  3. Network infrastructure? — How many NICs, 10G/25G availability, VLAN support, jumbo frames
  4. Workload types? — VMs vs. LXC, Windows vs. Linux, GPU needs, real-time requirements
  5. HA requirements? — Needs shared storage. What's the acceptable failover time? (Proxmox HA = cold restart, 1-5 min)
  6. Backup strategy? — PBS recommended. Offsite target? Retention requirements? RTO for restore?
  7. Migration from VMware? — How many VMs? OVA export possible? VMDK conversion plan? V2V tooling?
  8. Linux competency? — Proxmox is Linux. If the team isn't comfortable with CLI, networking config, and apt, budget for training.
  9. Subscription? — Enterprise repo access and support level. Even Basic is worth it for production.
  10. Automation plans? — Terraform (proxmox provider), Ansible (community modules), Packer for templates