OpenBao Production Architecture

Customer deployment reference — secrets management, architecture, HA, operations & Vault comparison

01

Overview

OpenBao is an open-source fork of HashiCorp Vault, maintained by the Linux Foundation. It was created in December 2023 after HashiCorp relicensed Vault (and other products) from MPL 2.0 to the Business Source License (BSL) in August 2023. OpenBao continues under the MPL 2.0 license.

It provides the same core capabilities: secrets management, encryption as a service, identity-based access, and dynamic credentials. For customers who want Vault's capabilities without BSL licensing concerns, OpenBao is the answer.

Positioning

OpenBao is what you recommend when a customer needs Vault but (a) can't accept BSL licensing, (b) doesn't want vendor lock-in to HashiCorp/IBM, or (c) wants to avoid per-cluster Enterprise licensing costs. The trade-off is that you lose some Enterprise-only features and formal vendor support.

02

Architecture

OpenBao inherits Vault's architecture because it is a fork. The two things to understand are the seal/unseal model and how secrets flow through the system.

+-------------------------------------------------------+
|                     Load Balancer                     |
|                (HTTPS, health checks)                 |
+----------+----------------+----------------+----------+
           |                |                |
     +-----v-----+    +-----v-----+    +-----v-----+
     | Bao Node  |    | Bao Node  |    | Bao Node  |
     | (Active)  |    | (Standby) |    | (Standby) |
     |           |    | forwards  |    | forwards  |
     | +-------+ |    | to leader |    | to leader |
     | |Secrets| |    +-----+-----+    +-----+-----+
     | |Engines| |          |                |
     | |Auth   | |          | Raft Consensus |
     | |Audit  | |          |  (port 8201)   |
     | +-------+ |          |                |
     +-----+-----+----------+----------------+
           |
     +-----v--------------------------------------------+
     |             Integrated Raft Storage              |
     |   (encrypted at rest, replicated to all nodes)   |
     +-----+--------------------------------------------+
           |
     +-----v--------------------------------------------+
     |             Auto-Unseal (KMS / HSM)              |
     |    AWS KMS / Azure KV / GCP KMS / PKCS#11 HSM    |
     +--------------------------------------------------+

| Component | Role | Notes |
|---|---|---|
| Bao Server | Core process | Handles API requests, manages secrets engines, performs encryption |
| Storage Backend | Persistent storage | Encrypted data at rest. Raft (integrated), PostgreSQL, file, or in-memory. |
| Seal/Unseal | Master key management | Shamir's secret sharing or auto-unseal via cloud KMS |
| Auth Methods | Identity verification | LDAP, OIDC, AppRole, Kubernetes, TLS certs, etc. |
| Secrets Engines | Secret generation/storage | KV, PKI, database, AWS, Azure, SSH, Transit, etc. |
| Audit Devices | Audit logging | File, syslog, socket. Every request/response logged. |

03

Seal / Unseal Model

This is the most important concept to explain to customers. It's what makes Bao/Vault unique compared to other secret stores.

How the seal works

  1. All data in the storage backend is encrypted with an encryption key
  2. The encryption key is itself encrypted by the master key
  3. The master key is split into key shares (Shamir's Secret Sharing) distributed to key holders
  4. On startup, Bao is sealed — it cannot read its own data
  5. Key holders provide their shares to unseal Bao (threshold, e.g., 3 of 5)
  6. Once unsealed, the master key is held in memory only — never written to disk

Shamir's Secret Sharing

Default initialization creates 5 key shares with a threshold of 3. This means:

  • 5 different people each receive one key share
  • Any 3 of the 5 must provide their shares to unseal
  • No single person (or 2 people) can unseal alone
  • You can lose up to 2 key shares and still unseal
# Initialize with custom key shares
bao operator init -key-shares=5 -key-threshold=3

# Unseal (run 3 times with different keys)
bao operator unseal    # enter key share 1
bao operator unseal    # enter key share 2
bao operator unseal    # enter key share 3
# => Sealed: false
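
To make the 3-of-5 threshold concrete, the following is a toy sketch of Shamir's Secret Sharing in Python. It is an illustration only, not OpenBao's implementation: it treats the key as a single integer over a prime field picked for this example, and the names split_secret and recover_secret are hypothetical.

```python
import secrets

# A Mersenne prime larger than any 256-bit secret; chosen for this sketch only.
PRIME = 2**521 - 1

def split_secret(secret: int, shares: int = 5, threshold: int = 3) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points on a random polynomial of degree threshold-1."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # coefficients[0] is the secret; the remaining coefficients are random.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        # Horner's rule, highest-degree coefficient first.
        for c in reversed(coefficients):
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points

def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                numerator = (numerator * -xj) % PRIME
                denominator = (denominator * (xi - xj)) % PRIME
        # pow(d, -1, p) computes the modular inverse (Python 3.8+).
        secret = (secret + yi * numerator * pow(denominator, -1, PRIME)) % PRIME
    return secret

key = secrets.randbits(256)
shares = split_secret(key, shares=5, threshold=3)
print(recover_secret(shares[:3]) == key)   # any three shares are enough
```

Any three of the five points determine the degree-2 polynomial and therefore the secret; two points are consistent with every possible secret, which is why one or two key holders cannot unseal alone.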

Auto-unseal

Production Recommendation

Auto-unseal is almost always what you want in production. Manual unseal with Shamir keys means someone has to be available to unseal after every restart, crash, or upgrade. Configure auto-unseal via a cloud KMS.

Auto-unseal configuration in bao.hcl. Supported providers: AWS KMS, Azure Key Vault, GCP Cloud KMS, OCI KMS, AliCloud KMS, PKCS#11 HSM, Transit, and KMIP:

# AWS KMS auto-unseal
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/bao-unseal"
}

# Azure Key Vault auto-unseal
seal "azurekeyvault" {
  tenant_id  = "00000000-0000-0000-0000-000000000000"
  vault_name = "bao-unseal-vault"
  key_name   = "bao-unseal-key"
}

# Transit auto-unseal (another Bao/Vault instance)
seal "transit" {
  address         = "https://other-bao.example.com:8200"
  token           = "hvs.XXXXXXXXX"
  key_name        = "autounseal"
  mount_path      = "transit/"
}

# PKCS#11 HSM auto-unseal (added in OpenBao 2.2.0)
seal "pkcs11" {
  lib         = "/usr/lib/softhsm/libsofthsm2.so"
  slot        = "0"
  pin         = "1234"
  key_label   = "bao-unseal-key"
  mechanism   = "0x1087"  # CKM_AES_GCM
}

With auto-unseal, Shamir key shares become recovery keys instead. They're used for operations like generating a new root token, but not for unsealing. The KMS key handles unseal automatically on startup.

Critical

If you lose access to the KMS key (deleted, permissions revoked, account locked), Bao cannot unseal. The KMS key is as critical as the unseal keys. Ensure it has deletion protection enabled and that multiple people have access to the cloud account.

04

Deployment Models

Simple: Single Server

One Bao server with integrated Raft storage. Simple to deploy and operate.

  • Pros: Minimal infrastructure, fast to deploy
  • Cons: Single point of failure, no redundancy
  • Best for: Dev, small teams, non-critical secrets

Recommended: Raft Cluster

3 or 5 Bao nodes using integrated Raft storage for consensus.

  • Pros: HA, no external dependencies, built-in replication
  • Cons: Needs low-latency network between nodes
  • Best for: Most production deployments

External: PostgreSQL-backed

Bao servers using PostgreSQL as the storage backend with HA support.

  • Pros: Leverages existing DB infrastructure, familiar ops tooling
  • Cons: External dependency, requires PostgreSQL 9.5+
  • Best for: Orgs with strong PostgreSQL expertise and existing clusters

Cloud-Native: Kubernetes

Deploy via Helm chart or operator on K8s. Bao runs as a StatefulSet.

  • Pros: Fits K8s-native workflows, easy scaling
  • Cons: PV management, K8s adds failure modes
  • Best for: Teams with mature K8s platforms
Recommendation

Default to Raft for most deployments. It has been production-stable for years and eliminates external dependencies. Consider PostgreSQL if the customer has strong existing PostgreSQL infrastructure and wants to use familiar backup/monitoring tooling.

Minimal production config

# /etc/bao.d/bao.hcl
storage "raft" {
  path    = "/opt/bao/data"
  node_id = "bao-1"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_cert_file = "/opt/bao/tls/cert.pem"
  tls_key_file  = "/opt/bao/tls/key.pem"
}

api_addr     = "https://bao-1.example.com:8200"
cluster_addr = "https://bao-1.example.com:8201"

seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/bao-unseal"
}

ui = true
# Note: disable_mlock was removed in OpenBao 2.0.
# Setting it to false will cause an error. Omit it entirely.
05

High Availability

How HA works

In an OpenBao cluster:

  • One node is the active leader and handles all writes (and, unless standby reads are in use, all reads)
  • Other nodes are standbys that forward requests to the leader
  • If the leader fails, Raft elects a new leader (typically within seconds)
  • With standby read support (available since OpenBao 2.5.0), standby nodes can serve read requests locally without forwarding to the leader. Disable with disable_standby_reads=true if needed.

Cluster sizing

| Nodes | Fault Tolerance | Use Case |
|---|---|---|
| 3 | 1 node failure | Standard production |
| 5 | 2 node failures | High-criticality, multi-AZ |

Odd Numbers Only

Never run 2 or 4 nodes. Raft requires a majority quorum. With 2 nodes, losing 1 loses quorum. With 4 nodes, you still only tolerate 1 failure (same as 3), but pay for an extra node. Always 3 or 5.
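
The arithmetic behind the odd-numbers rule can be sketched in a few lines of Python. The function names quorum and fault_tolerance are illustrative, not part of any OpenBao tooling.

```python
def quorum(nodes: int) -> int:
    """Smallest majority of a cluster with `nodes` voters."""
    return nodes // 2 + 1

def fault_tolerance(nodes: int) -> int:
    """How many voters can fail while a majority remains."""
    return nodes - quorum(nodes)

for n in range(1, 6):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 without raising the number of tolerated failures, which is why the even sizes cost more and buy nothing.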

Network requirements

  • Cluster nodes need low-latency connectivity (< 10ms RTT)
  • Raft uses TCP port 8201 for cluster communication
  • API listens on TCP port 8200
  • Cross-region clusters are not recommended — Raft performance degrades with latency
  • For multi-region, the usual pattern is separate clusters linked by replication, but replication is a Vault Enterprise feature that OpenBao does not yet provide

Joining a new node to the cluster

# On the new node, after starting bao:
bao operator raft join https://bao-1.example.com:8200

# Verify cluster membership
bao operator raft list-peers

# Expected output:
# Node     Address                  State       Voter
# ----     -------                  -----       -----
# bao-1    bao-1.example.com:8201   leader      true
# bao-2    bao-2.example.com:8201   follower    true
# bao-3    bao-3.example.com:8201   follower    true
06

Storage

Integrated Raft storage

  • Data stored in /opt/bao/data (or wherever you configure it)
  • All data is encrypted at rest — the raw Raft data is useless without the unseal keys
  • Use SSDs. Raft is write-heavy (every operation is a replicated log entry)
  • Typical storage: 1-10 GB for most deployments. PKI with millions of certs can grow larger.
  • Raft snapshots happen automatically; you should also take explicit snapshots for backup

PostgreSQL storage

OpenBao also supports PostgreSQL as an external storage backend (the only supported external database). It provides HA support and is production-ready:

storage "postgresql" {
  connection_url = "postgres://bao:password@pg.example.com:5432/bao?sslmode=verify-full"
  ha_enabled     = true
  table          = "bao_kv_store"
  ha_table       = "bao_ha_locks"
}
  • Requires PostgreSQL 9.5+; SSL connection attempted by default
  • Supports paginated lists and transactional storage
  • Good option if the customer already has mature PostgreSQL operations (backup, monitoring, HA via Patroni/repmgr)
  • Unlike Raft, data is not replicated by Bao — rely on PostgreSQL replication for redundancy

Other backends

  • File — stores data on local filesystem. No HA support. Suitable for development/testing only.
  • In-memory — all data lost on restart. Development and experimentation only.

Storage considerations

  • With Raft, storage is replicated across all cluster nodes automatically
  • Monitor disk I/O latency — slow disks cause Raft leader elections and instability
  • Autopilot (built-in) handles dead server cleanup and stable server promotion
No Consul

Unlike HashiCorp Vault, OpenBao does not support Consul as a storage backend. If migrating from a Consul-backed Vault deployment, plan to move to Raft or PostgreSQL storage.

Sizing

Bao is lightweight. A production cluster node typically needs 2-4 vCPU, 4-8 GB RAM, 20-50 GB SSD. The main resource bottleneck is I/O latency, not capacity. Over-provisioning on fast storage is cheap insurance.

07

Secrets Engines

Secrets engines are the core of what Bao does. Each engine is mounted at a path and handles a specific type of secret.

Most Common: KV (Key-Value)

Static secret storage. V2 provides versioning, soft-delete, and metadata. The simplest engine and usually where customers start.

bao kv put secret/myapp \
  db_password=hunter2 \
  api_key=sk_live_xxx

Dynamic: Database

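Generates short-lived database credentials on demand. The plugin model covers databases such as PostgreSQL, MySQL, MongoDB, MSSQL, and Oracle, but confirm that the plugin you need ships with (or can be added to) your OpenBao release, since not every Vault plugin is bundled. Credentials expire automatically, which removes shared, long-lived DB passwords.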

bao read database/creds/readonly
# => username: v-app-readonly-xxxx
# => password: A1B2C3D4-random
# => ttl: 1h

Encryption: Transit

Encryption as a service. Applications send plaintext, get ciphertext back. The encryption key never leaves Bao. Supports AES-GCM, ChaCha20-Poly1305, RSA, ECDSA, Ed25519, and HMAC. Key versioning and rotation built in.
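
One detail that trips up first-time Transit users is that the HTTP API inherited from Vault exchanges plaintext as base64, not raw strings. The sketch below only builds and decodes payloads; it makes no network calls, and the helper names are hypothetical.

```python
import base64
import json

def encrypt_payload(plaintext: bytes) -> str:
    """JSON body for a Transit encrypt request; the plaintext is base64-encoded."""
    return json.dumps({"plaintext": base64.b64encode(plaintext).decode("ascii")})

def decode_plaintext(returned_plaintext: str) -> bytes:
    """A Transit decrypt response also carries base64; decode it back to raw bytes."""
    return base64.b64decode(returned_plaintext)

print(encrypt_payload(b"hello"))
```

Keeping the encoding in one helper avoids the common mistake of encrypting the literal string instead of its base64 form.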

Infrastructure: PKI

Full certificate authority. Issues X.509 certs with configurable TTLs. Intermediate CA model recommended. Can replace expensive commercial CA for internal services.

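Cloud: AWS / Azure / GCP

Generates dynamic cloud credentials (IAM users, STS tokens, service principals). Short-lived, automatically revoked. Eliminates static cloud keys in config files. Verify plugin availability for your OpenBao release before proposing these; cloud engines may need to be installed as external plugins.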

Access: SSH

Signed SSH certificates or one-time passwords (OTP). Eliminates authorized_keys management. Signed certs are the recommended approach: hosts trust the CA once, and no per-user server-side configuration is needed.

Consultant Tip

Start with KV to get secrets out of config files and environment variables. Then move to dynamic database credentials — this is where the real security value is. PKI and Transit come later when the team is comfortable with the workflow.

08

Auth Methods

Auth methods verify identity and map it to policies. Every request to Bao must be authenticated.

| Method | Use Case | Notes |
|---|---|---|
| AppRole | Machine-to-machine | Role ID + Secret ID. Most common for applications. Secret ID can be rotated. |
| Kubernetes | K8s workloads | Pod service account tokens. Seamless for K8s-native apps. Use with the Bao Agent sidecar or CSI provider. |
| LDAP | Human users via AD | Bind to existing directory. Map LDAP groups to Bao policies. |
| OIDC | Human users via SSO | Keycloak, Azure AD, Okta, etc. Browser-based redirect flow. |
| TLS Certificates | Mutual TLS auth | Client presents a certificate. Good for services with existing PKI. |
| Token | Direct token auth | Always enabled. Root token is for initial setup only; revoke it after configuring other auth methods. |
| Userpass | Simple username/password | Dev/test only. Never use in production without MFA. |

Root Token

The root token generated during initialization has unlimited privileges. Use it only for initial setup (enabling auth methods, configuring policies), then revoke it. Generate a new root token via bao operator generate-root only when needed for emergency operations.

Policy basics

Policies define what a token can access. They follow the principle of least privilege:

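# policy: app-readonly.hcl
# KV v2 serves secret values under data/ and key listings under metadata/
path "secret/data/myapp/*" {
  capabilities = ["read"]
}

path "secret/metadata/myapp/*" {
  capabilities = ["list"]
}

path "database/creds/myapp-readonly" {
  capabilities = ["read"]
}

# Apply policy
bao policy write app-readonly app-readonly.hcl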
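
To show how default deny and path matching play out, here is a deliberately simplified model in Python. It assumes only exact paths and a trailing `*` glob, with the most specific match winning; real policy evaluation has more rules, and the function name allowed is hypothetical.

```python
def allowed(policy: dict[str, set[str]], path: str, capability: str) -> bool:
    """Return True if the most specific matching rule grants `capability`.

    Simplified model: exact paths and a trailing `*` glob only.
    A request that matches no rule is denied (default deny).
    """
    best_rule = None
    best_len = -1
    for pattern, capabilities in policy.items():
        if pattern.endswith("*"):
            prefix = pattern[:-1]
            matches = path.startswith(prefix)
            specificity = len(prefix)
        else:
            matches = path == pattern
            # An exact match always outranks any glob.
            specificity = len(pattern) + 1
        if matches and specificity > best_len:
            best_rule, best_len = capabilities, specificity
    if best_rule is None or "deny" in best_rule:
        return False
    return capability in best_rule

# A reduced version of the policy above, expressed as a dictionary.
app_readonly = {
    "secret/data/myapp/*": {"read"},
    "database/creds/myapp-readonly": {"read"},
}

print(allowed(app_readonly, "secret/data/myapp/db", "read"))     # True
print(allowed(app_readonly, "secret/data/otherapp/db", "read"))  # False
```

Note that the glob `secret/data/myapp/*` covers keys below myapp/ but not a secret stored at `secret/data/myapp` itself, a frequent source of unexpected permission errors.
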
09

Backups

Raft snapshots

The primary backup mechanism. A Raft snapshot captures the entire state of the cluster:

# Manual snapshot
bao operator raft snapshot save \
  /backup/bao-$(date +%Y%m%d-%H%M).snap

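# Automated via cron (every 6 hours); a crontab entry must stay on one line,
# and BAO_ADDR plus a token must be available in the cron environment
0 */6 * * * /usr/local/bin/bao operator raft snapshot save /backup/bao-$(date +\%Y\%m\%d-\%H\%M).snap 2>&1 | logger -t bao-backup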

What to back up

Primary: Raft Snapshots

Contains all secrets, policies, auth configs, mounted engines — the entire state. Encrypted with the master key, so useless without unseal keys.

Critical: Unseal / Recovery Keys

Without these, snapshots are useless. Store in a physically separate, secure location. Some customers use safe deposit boxes or hardware security modules.

Also Back Up: Configuration

  • Config file (/etc/bao.d/bao.hcl) — not in snapshots
  • TLS certificates for API and cluster
  • Auto-unseal KMS key access
  • Systemd unit file customizations

Strategy: Retention

  • Snapshots every 6 hours minimum
  • Ship off-box (S3, separate server)
  • Retain 7 days minimum
  • Test restore quarterly
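
A retention rule like the one above is easy to get subtly wrong, so here is a small Python sketch of the pruning decision. It assumes the bao-YYYYMMDD-HHMM.snap naming used by the cron job earlier in this section; the function name snapshots_to_delete is hypothetical, and the sketch only returns names without deleting anything.

```python
from datetime import datetime, timedelta

def snapshots_to_delete(filenames: list[str], now: datetime, keep_days: int = 7) -> list[str]:
    """Return snapshot files older than `keep_days`, never including the newest one."""
    stamped = []
    for name in filenames:
        # Expect names like bao-20250101-0600.snap.
        try:
            taken = datetime.strptime(name, "bao-%Y%m%d-%H%M.snap")
        except ValueError:
            continue  # ignore files that do not follow the naming pattern
        stamped.append((taken, name))
    stamped.sort()
    cutoff = now - timedelta(days=keep_days)
    # Always keep the most recent snapshot, even if every backup is stale.
    return [name for taken, name in stamped[:-1] if taken < cutoff]

files = ["bao-20250101-0600.snap", "bao-20250105-0600.snap", "bao-20250110-0600.snap"]
print(snapshots_to_delete(files, now=datetime(2025, 1, 10, 12, 0)))
```

Keeping the newest snapshot unconditionally guards against a stalled backup job silently pruning the last good copy.
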
Keys to the Kingdom

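The unseal keys (or auto-unseal KMS access) are the most critical thing to protect. With Shamir unseal, losing so many key shares that fewer than the threshold remain makes all data permanently unrecoverable. With auto-unseal, the same is true if the KMS key is lost, because recovery keys cannot unseal Bao. There is no back door: the encryption is real.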

Restore procedure

# Restore replaces ALL data in the cluster
bao operator raft snapshot restore backup.snap

# For a fresh cluster restore:
# 1. Start a single Bao node
# 2. Initialize (or auto-unseal)
# 3. Restore snapshot
bao operator raft snapshot restore -force backup.snap
# 4. Join other nodes to the cluster

Snapshot restore replaces ALL data — any secrets written after the snapshot are lost. This is an all-or-nothing operation.

10

Upgrades & Rollbacks

OpenBao upgrades follow the same pattern as Vault. The process is straightforward but must be done carefully.

Upgrade process (Raft cluster)

  1. Read the changelog — check for breaking changes, deprecations, and required migration steps
  2. Take a Raft snapshot: bao operator raft snapshot save backup.snap
  3. Upgrade standby nodes first — one at a time, verify each joins the cluster
  4. Upgrade the leader last — this triggers a leader election
  5. Verify: check seal status, cluster members, run a read/write test
# Take snapshot before starting
bao operator raft snapshot save pre-upgrade-$(date +%Y%m%d).snap

# On each standby node (one at a time):
sudo systemctl stop bao
sudo dpkg -i bao_x.y.z_amd64.deb    # or rpm, or replace binary
sudo systemctl start bao
bao status                            # verify unsealed and raft peer

# After all standbys are upgraded, step down the leader:
bao operator step-down
# The upgraded standbys will elect a new leader

# Upgrade the old leader (now a standby):
sudo systemctl stop bao
sudo dpkg -i bao_x.y.z_amd64.deb
sudo systemctl start bao
bao operator raft list-peers          # verify all nodes healthy
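
The "standbys first, leader last" ordering can be derived from the peer list instead of being tracked by hand. The Python sketch below assumes the tabular list-peers layout shown earlier in this document; real output may differ between versions, and upgrade_order is a hypothetical helper.

```python
def upgrade_order(list_peers_output: str) -> list[str]:
    """Return node IDs in upgrade order: followers first, the leader last."""
    followers, leaders = [], []
    for line in list_peers_output.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and blank lines.
        if len(fields) < 4 or fields[0] in ("Node", "----"):
            continue
        node, state = fields[0], fields[2]
        (leaders if state == "leader" else followers).append(node)
    if len(leaders) != 1:
        raise ValueError("expected exactly one leader; check cluster health first")
    return followers + leaders

sample = """\
Node     Address                  State       Voter
----     -------                  -----       -----
bao-1    bao-1.example.com:8201   leader      true
bao-2    bao-2.example.com:8201   follower    true
bao-3    bao-3.example.com:8201   follower    true
"""
print(upgrade_order(sample))  # ['bao-2', 'bao-3', 'bao-1']
```

Refusing to proceed when there is no single leader is intentional: an upgrade should not start on a cluster that is already mid-election.
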
Version Skipping

Large version jumps are supported (e.g., 2.0.0 → 2.5.0), but you must review the upgrade notes for all intervening versions. They may describe additional steps or configuration changes required before, during, or after the upgrade.

Rollbacks

Rollbacks are possible and better-supported than in GitLab or Keycloak:

Patch Versions: Binary Swap

If no data schema changes occurred: stop Bao, replace binary with old version, start. This usually works for patch versions. Fast and simple.

Major/Minor: Snapshot Restore

If schema changes occurred: restore from the Raft snapshot taken before the upgrade. Replaces ALL data — anything written after the snapshot is lost.

Always Take a Snapshot

Unlike GitLab, Bao rollbacks via snapshot are well-supported and fast. But only if you actually took the snapshot. Make it the first step of every upgrade.

11

Monitoring

OpenBao exposes metrics via a Prometheus endpoint at /v1/sys/metrics?format=prometheus (requires a token with appropriate permissions). Note: metric names default to the vault.* prefix (inherited from the Vault codebase); in Prometheus output the dots appear as underscores, for example vault_core_unsealed. The prefix can be changed via the metrics_prefix setting in the telemetry stanza.

Key metrics & alerting

| Metric | Alert Threshold | Why |
|---|---|---|
| vault.core.unsealed | = 0 | Node is sealed; can't serve requests |
| vault.raft.leader.lastContact | > 500ms | Cluster communication issues; risk of leader election |
| vault.raft.commitTime | > 25ms (p99) | Storage is slow; operations will queue |
| vault.expire.num_leases | > 100,000 | Lease explosion; often a misconfigured app requesting new creds every request |
| vault.runtime.alloc_bytes | Trending up | Memory leak or excessive lease count |
| vault.audit.log_response | Errors > 0 | Audit device failure; if every audit device fails, Bao stops serving requests |
| vault.core.leadership_setup | Frequent changes | Leadership instability; investigate network or disk issues |
| vault.token.count | > 50,000 | Token sprawl; apps may not be revoking tokens properly |

Critical Behavior

If all audit devices fail, Bao will stop processing all requests. This is a security feature — it prevents unaudited access. Configure multiple audit devices for redundancy (e.g., file + syslog). Make sure audit log destinations are reliable.
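
The numeric thresholds from the metrics table can be expressed as simple predicates. The Python sketch below is only an illustration of the logic; in practice these would normally live in Prometheus alerting rules, and the millisecond units and the evaluate helper are assumptions of this example.

```python
THRESHOLDS = {
    # metric name: (alert description, predicate over the sampled value)
    "vault.core.unsealed": ("node is sealed", lambda v: v == 0),
    "vault.raft.leader.lastContact": ("leader contact > 500 ms", lambda v: v > 500),
    "vault.raft.commitTime": ("commit time p99 > 25 ms", lambda v: v > 25),
    "vault.expire.num_leases": ("more than 100,000 leases", lambda v: v > 100_000),
    "vault.token.count": ("more than 50,000 tokens", lambda v: v > 50_000),
}

def evaluate(samples: dict[str, float]) -> list[str]:
    """Return an alert message for every sampled metric that crosses its threshold."""
    alerts = []
    for name, (message, breached) in THRESHOLDS.items():
        if name in samples and breached(samples[name]):
            alerts.append(f"{name}: {message}")
    return alerts

print(evaluate({"vault.core.unsealed": 1, "vault.raft.commitTime": 40}))
```

Trend-based rows such as vault.runtime.alloc_bytes are left out on purpose, since they need a time series rather than a single sample.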

Enabling metrics

# In bao.hcl
telemetry {
  prometheus_retention_time = "30s"
  disable_hostname = true
}

# Scrape with Prometheus (scheme must be https because the listener uses TLS):
# - job_name: 'bao'
#   scheme: https
#   metrics_path: '/v1/sys/metrics'
#   params:
#     format: ['prometheus']
#   bearer_token: 'hvs.METRICS_TOKEN'
#   static_configs:
#     - targets: ['bao-1:8200', 'bao-2:8200', 'bao-3:8200']

Health checks

  • /v1/sys/health — returns 200 if initialized, unsealed, and active. Returns 429 for standby, 501 for not initialized, 503 for sealed. Status codes are customizable via query parameters (standbycode, activecode, etc.).
  • Use ?standbyok=true for load balancer health checks that should include standby nodes
  • /v1/sys/seal-status — detailed seal status without authentication
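
For scripts or custom probes, the default status codes listed above can be translated into node states. This Python sketch builds the URL and interprets the code without performing the HTTP request itself; health_url and describe are hypothetical helper names.

```python
from urllib.parse import urlencode

# Default status codes returned by /v1/sys/health, as listed above.
HEALTH_STATES = {
    200: "active",
    429: "standby",
    501: "not initialized",
    503: "sealed",
}

def health_url(base: str, standby_ok: bool = False) -> str:
    """Build the health-check URL, optionally treating standbys as healthy."""
    url = f"{base.rstrip('/')}/v1/sys/health"
    if standby_ok:
        url += "?" + urlencode({"standbyok": "true"})
    return url

def describe(status_code: int) -> str:
    """Translate a default health status code into a node state."""
    return HEALTH_STATES.get(status_code, f"unexpected status {status_code}")

print(health_url("https://bao-1.example.com:8200", standby_ok=True))
print(describe(429))
```

The mapping only holds when the status codes have not been overridden through query parameters.
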
12

Security Hardening

Bao holds your most sensitive data. Default configuration is not production-ready.

TLS: Encrypt Everything

API listener: TLS required (never disable in production). Cluster traffic: encrypted by default with Raft. Client→Bao: TLS 1.2+ only. Use certificates from a trusted CA, not self-signed.

Audit: Enable Immediately

Enable audit devices before anything else. Every request and response is logged with HMAC'd sensitive values. Configure at least two audit backends for redundancy.

bao audit enable file \
  file_path=/var/log/bao/audit.log
bao audit enable syslog

Root Token: Revoke After Setup

Use root token only for initial configuration. Then revoke it. Generate a new one via bao operator generate-root when needed for emergency operations. Never store the root token in a file.

Policies: Least Privilege

Default deny. Grant only the specific paths and capabilities needed. Use path templating ({{identity.entity.id}}) for per-entity access. Review policies quarterly.

Operational hardening

  • mlock: Removed in OpenBao 2.0. The disable_mlock setting is no longer functional — setting it to false will cause a startup error. OpenBao relies on OS-level memory protections instead. Use encrypted swap or disable swap entirely on Bao nodes.
  • Firewall: Only expose port 8200 to clients. Port 8201 (cluster) should only be accessible between Bao nodes.
  • Lease TTLs: Set aggressive default and max TTLs. Short-lived credentials limit blast radius. Default TTL of 1h, max of 24h is a good starting point.
  • Token TTLs: Same principle. Use periodic tokens for long-running services that can renew, not long-lived tokens.
  • UI: Disable in production if not needed (ui = false). If enabled, restrict to internal networks via the load balancer.
  • Response wrapping: Use -wrap-ttl for secret zero delivery. The wrapped token can only be unwrapped once.
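
The interaction between a default TTL and a max TTL, as recommended in the Lease TTLs item above, can be modelled in a few lines of Python. This is a simplified sketch: actual TTL resolution also involves mount, role, and system settings, and effective_ttl is a hypothetical name.

```python
from datetime import timedelta
from typing import Optional

DEFAULT_TTL = timedelta(hours=1)   # starting point suggested above
MAX_TTL = timedelta(hours=24)

def effective_ttl(requested: Optional[timedelta] = None) -> timedelta:
    """TTL a lease would get: the default when none is requested, never above the max."""
    ttl = DEFAULT_TTL if requested is None else requested
    return min(ttl, MAX_TTL)

print(effective_ttl())                      # 1:00:00
print(effective_ttl(timedelta(hours=48)))   # 1 day, 0:00:00
```

A client asking for 48 hours is capped at the 24-hour max, which is what bounds the blast radius of a leaked credential.
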
13

Licensing & Support

| Aspect | OpenBao | HashiCorp Vault |
|---|---|---|
| License | MPL 2.0 (truly open source) | BSL 1.1 (source-available, restrictions) |
| Cost | Free | Free (Community) / $$$$ (Enterprise, per-cluster) |
| Vendor Support | Community only (GitHub, mailing lists) | Paid support from HashiCorp/IBM |
| Enterprise Features | Namespaces (GA since 2.3.1), standby reads (2.5.0), PKCS#11 HSM (2.2.0) | Namespaces, Sentinel, replication, HSM |
| Governance | Linux Foundation, community-driven | HashiCorp/IBM |

What OpenBao is missing (vs. Vault Enterprise)

Shipped: Recently Delivered

  • Namespaces — multi-tenancy isolation (GA since v2.3.1, June 2025)
  • Standby Read Support — standby nodes serve read requests locally (v2.5.0)
  • PKCS#11 HSM Auto-unseal — hardware security module support via PKCS#11 (v2.2.0)

Not Yet: Still Missing

  • Performance Replication — cross-region read replicas
  • Disaster Recovery Replication — cross-region DR
  • Sentinel Policies — fine-grained policy-as-code
  • Control Groups — multi-party approval for secret access
Sales Positioning

For most customers, the remaining missing Enterprise features don't matter. KV secrets, PKI, database dynamic credentials, Kubernetes auth, Transit encryption, namespaces, and HSM unsealing — all work in OpenBao. The main gap is cross-region replication (performance and DR), which only matters for global deployments.

Migration from Vault

If the customer is migrating from HashiCorp Vault:

  • API is compatible — clients using the Vault HTTP API work with OpenBao (change the address)
  • CLI is bao instead of vault but accepts the same commands
  • Configuration files use the same HCL syntax
  • Data migration: take a Vault Raft snapshot, restore into OpenBao (version compatibility matters)
  • Plugins/auth methods: most community plugins work. Enterprise-only features won't.
14

Consultant's Checklist

Before proposing an OpenBao deployment, get answers to these:

  1. What secrets need managing? — Static KV? Dynamic DB creds? PKI? Encryption?
  2. How many applications/services will authenticate? — Determines auth method strategy
  3. What's the deployment platform? — VMs, Kubernetes, cloud? Determines auth methods and deployment model
  4. RTO/RPO requirements? — Determines cluster size and backup frequency
  5. Multi-region requirements? — If yes, OpenBao may not be sufficient today. Consider Vault Enterprise or architect around the limitation.
  6. Compliance requirements? — Audit logging, HSM requirements, secret rotation policies
  7. Existing HashiCorp Vault deployment? — Migration path needed?
  8. Who operates it? — Determines automation and runbook depth
  9. Auto-unseal strategy? — Which cloud KMS, or transit unseal from another instance?
  10. Secret zero problem? — How do applications get their initial Bao credentials? (The hardest question in secrets management)