OpenBao Production Architecture
Customer deployment reference — secrets management, architecture, HA, operations & Vault comparison
Overview
OpenBao is an open-source fork of HashiCorp Vault, maintained by the Linux Foundation. It was created in December 2023 after HashiCorp switched Vault (and other products) from MPL to the BSL (Business Source License). OpenBao continues under the MPL 2.0 license.
It provides the same core capabilities: secrets management, encryption as a service, identity-based access, and dynamic credentials. For customers who want Vault's capabilities without BSL licensing concerns, OpenBao is the answer.
OpenBao is what you recommend when a customer needs Vault but (a) can't accept BSL licensing, (b) doesn't want vendor lock-in to HashiCorp/IBM, or (c) wants to avoid per-cluster Enterprise licensing costs. The trade-off is that you lose some Enterprise-only features and formal vendor support.
Architecture
OpenBao's architecture is identical to Vault's (it's a fork). Understanding it is about understanding the seal/unseal model and how secrets flow.
| Component | Role | Notes |
|---|---|---|
| Bao Server | Core process | Handles API requests, manages secrets engines, performs encryption |
| Storage Backend | Persistent storage | Encrypted data at rest. Raft (integrated), PostgreSQL, file, or in-memory. |
| Seal/Unseal | Master key management | Shamir's secret sharing or auto-unseal via cloud KMS |
| Auth Methods | Identity verification | LDAP, OIDC, AppRole, Kubernetes, TLS certs, etc. |
| Secrets Engines | Secret generation/storage | KV, PKI, database, AWS, Azure, SSH, Transit, etc. |
| Audit Devices | Audit logging | File, syslog, socket. Every request/response logged. |
Seal / Unseal Model
This is the most important concept to explain to customers. It's what makes Bao/Vault unique compared to other secret stores.
How the seal works
- All data in the storage backend is encrypted with an encryption key
- The encryption key is itself encrypted by the master key
- The master key is split into key shares (Shamir's Secret Sharing) distributed to key holders
- On startup, Bao is sealed — it cannot read its own data
- Key holders provide their shares to unseal Bao (threshold, e.g., 3 of 5)
- Once unsealed, the master key is held in memory only — never written to disk
Shamir's Secret Sharing
Default initialization creates 5 key shares with a threshold of 3. This means:
- 5 different people each receive one key share
- Any 3 of the 5 must provide their shares to unseal
- No single person (or 2 people) can unseal alone
- You can lose up to 2 key shares and still unseal
# Initialize with custom key shares
bao operator init -key-shares=5 -key-threshold=3
# Unseal (run 3 times with different keys)
bao operator unseal # enter key share 1
bao operator unseal # enter key share 2
bao operator unseal # enter key share 3
# => Sealed: false
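The threshold behaviour described above can be illustrated with a toy version of Shamir's Secret Sharing. This is a minimal sketch for intuition only, not OpenBao's implementation: the prime field, the integer secret, and the function names `split_secret` and `recover_secret` are all assumptions made for the example.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime; field size chosen for this toy example


def split_secret(secret, shares=5, threshold=3):
    """Split an integer secret into (x, y) points on a random polynomial."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # Polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        result = 0
        for coeff in reversed(coeffs):  # Horner's rule
            result = (result * x + coeff) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


master_key = 123456789
key_shares = split_secret(master_key, shares=5, threshold=3)
recovered = recover_secret([key_shares[0], key_shares[2], key_shares[4]])
```

Any three of the five points reconstruct the secret, which is why two lost shares are survivable. Fewer than three points are consistent with every possible secret, which is why two key holders cannot unseal alone.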
Auto-unseal
Auto-unseal is almost always what you want in production. Manual unseal with Shamir keys means someone has to be available to unseal after every restart, crash, or upgrade. Configure auto-unseal via a cloud KMS.
Auto-unseal configuration in bao.hcl. Supported providers: AWS KMS, Azure Key Vault, GCP Cloud KMS, OCI KMS, AliCloud KMS, PKCS#11 HSM, Transit, and KMIP:
# AWS KMS auto-unseal
seal "awskms" {
region = "us-east-1"
kms_key_id = "alias/bao-unseal"
}
# Azure Key Vault auto-unseal
seal "azurekeyvault" {
tenant_id = "00000000-0000-0000-0000-000000000000"
vault_name = "bao-unseal-vault"
key_name = "bao-unseal-key"
}
# Transit auto-unseal (another Bao/Vault instance)
seal "transit" {
address = "https://other-bao.example.com:8200"
token = "hvs.XXXXXXXXX"
key_name = "autounseal"
mount_path = "transit/"
}
# PKCS#11 HSM auto-unseal (added in OpenBao 2.2.0)
seal "pkcs11" {
lib = "/usr/lib/softhsm/libsofthsm2.so"
slot = "0"
pin = "1234"
key_label = "bao-unseal-key"
mechanism = "0x1087" # CKM_AES_GCM
}
With auto-unseal, Shamir key shares become recovery keys instead. They're used for operations like generating a new root token, but not for unsealing. The KMS key handles unseal automatically on startup.
If you lose access to the KMS key (deleted, permissions revoked, account locked), Bao cannot unseal. The KMS key is as critical as the unseal keys. Ensure it has deletion protection enabled and that multiple people have access to the cloud account.
Deployment Models
Simple: Single Server
One Bao server with integrated Raft storage. Simple to deploy and operate.
- Pros: Minimal infrastructure, fast to deploy
- Cons: Single point of failure, no redundancy
- Best for: Dev, small teams, non-critical secrets
Recommended: Raft Cluster
3 or 5 Bao nodes using integrated Raft storage for consensus.
- Pros: HA, no external dependencies, built-in replication
- Cons: Needs low-latency network between nodes
- Best for: Most production deployments
External: PostgreSQL-backed
Bao servers using PostgreSQL as the storage backend with HA support.
- Pros: Leverages existing DB infrastructure, familiar ops tooling
- Cons: External dependency, requires PostgreSQL 9.5+
- Best for: Orgs with strong PostgreSQL expertise and existing clusters
Cloud-Native: Kubernetes
Deploy via Helm chart or operator on K8s. Bao runs as a StatefulSet.
- Pros: Fits K8s-native workflows, easy scaling
- Cons: PV management, K8s adds failure modes
- Best for: Teams with mature K8s platforms
Default to Raft for most deployments. It has been production-stable for years and eliminates external dependencies. Consider PostgreSQL if the customer has strong existing PostgreSQL infrastructure and wants to use familiar backup/monitoring tooling.
Minimal production config
# /etc/bao.d/bao.hcl
storage "raft" {
path = "/opt/bao/data"
node_id = "bao-1"
}
listener "tcp" {
address = "0.0.0.0:8200"
tls_cert_file = "/opt/bao/tls/cert.pem"
tls_key_file = "/opt/bao/tls/key.pem"
}
api_addr = "https://bao-1.example.com:8200"
cluster_addr = "https://bao-1.example.com:8201"
seal "awskms" {
region = "us-east-1"
kms_key_id = "alias/bao-unseal"
}
ui = true
# Note: disable_mlock was removed in OpenBao 2.0.
# Setting it to false will cause an error. Omit it entirely.
High Availability
How HA works
In an OpenBao cluster:
- One node is the active leader — handles all reads and writes
- Other nodes are standbys — they forward requests to the leader
- If the leader fails, Raft elects a new leader (typically within seconds)
- With standby read support (available since OpenBao 2.5.0), standby nodes can serve read requests locally without forwarding to the leader. Disable with disable_standby_reads=true if needed.
Cluster sizing
| Nodes | Fault Tolerance | Use Case |
|---|---|---|
| 3 | 1 node failure | Standard production |
| 5 | 2 node failures | High-criticality, multi-AZ |
Never run 2 or 4 nodes. Raft requires a majority quorum. With 2 nodes, losing 1 loses quorum. With 4 nodes, you still only tolerate 1 failure (same as 3), but pay for an extra node. Always 3 or 5.
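The sizing rule follows directly from majority arithmetic. A minimal sketch, with hypothetical helper names `quorum` and `fault_tolerance`:

```python
def quorum(nodes):
    """Smallest majority of a Raft cluster with `nodes` voters."""
    return nodes // 2 + 1


def fault_tolerance(nodes):
    """How many voters can fail while a majority remains."""
    return nodes - quorum(nodes)


for n in range(1, 6):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 without raising the number of failures tolerated, so the fourth node adds cost but no resilience.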
Network requirements
- Cluster nodes need low-latency connectivity (< 10ms RTT)
- Raft uses TCP port 8201 for cluster communication
- API listens on TCP port 8200
- Cross-region clusters are not recommended — Raft performance degrades with latency
- Multi-region designs normally use separate clusters linked by replication, which is a Vault Enterprise feature and not yet available in OpenBao
Joining a new node to the cluster
# On the new node, after starting bao:
bao operator raft join https://bao-1.example.com:8200
# Verify cluster membership
bao operator raft list-peers
# Expected output:
# Node Address State Voter
# ---- ------- ----- -----
# bao-1 bao-1.example.com:8201 leader true
# bao-2 bao-2.example.com:8201 follower true
# bao-3 bao-3.example.com:8201 follower true
Storage
Integrated Raft storage
- Data stored in /opt/bao/data (or wherever you configure it)
- All data is encrypted at rest: the raw Raft data is useless without the unseal keys
- Use SSDs. Raft is write-heavy (every operation is a replicated log entry)
- Typical storage: 1-10 GB for most deployments. PKI with millions of certs can grow larger.
- Raft snapshots happen automatically; you should also take explicit snapshots for backup
PostgreSQL storage
OpenBao also supports PostgreSQL as an external storage backend (the only supported external database). It provides HA support and is production-ready:
storage "postgresql" {
connection_url = "postgres://bao:password@pg.example.com:5432/bao?sslmode=verify-full"
ha_enabled = true
table = "bao_kv_store"
ha_table = "bao_ha_locks"
}
- Requires PostgreSQL 9.5+; SSL connection attempted by default
- Supports paginated lists and transactional storage
- Good option if the customer already has mature PostgreSQL operations (backup, monitoring, HA via Patroni/repmgr)
- Unlike Raft, data is not replicated by Bao — rely on PostgreSQL replication for redundancy
Other backends
- File — stores data on local filesystem. No HA support. Suitable for development/testing only.
- In-memory — all data lost on restart. Development and experimentation only.
Storage considerations
- With Raft, storage is replicated across all cluster nodes automatically
- Monitor disk I/O latency — slow disks cause Raft leader elections and instability
- Autopilot (built-in) handles dead server cleanup and stable server promotion
Unlike HashiCorp Vault, OpenBao does not support Consul as a storage backend. If migrating from a Consul-backed Vault deployment, plan to move to Raft or PostgreSQL storage.
Bao is lightweight. A production cluster node typically needs 2-4 vCPU, 4-8 GB RAM, 20-50 GB SSD. The main resource bottleneck is I/O latency, not capacity. Over-provisioning on fast storage is cheap insurance.
Secrets Engines
Secrets engines are the core of what Bao does. Each engine is mounted at a path and handles a specific type of secret.
Most Common: KV (Key-Value)
Static secret storage. V2 provides versioning, soft-delete, and metadata. The simplest engine and usually where customers start.
bao kv put secret/myapp \
db_password=hunter2 \
api_key=sk_live_xxx
Dynamic: Database
Generates short-lived database credentials on demand. Supports PostgreSQL, MySQL, MongoDB, MSSQL, Oracle. Credentials auto-expire — no more shared, long-lived DB passwords.
bao read database/creds/readonly
# => username: v-app-readonly-xxxx
# => password: A1B2C3D4-random
# => ttl: 1h
Encryption: Transit
Encryption as a service. Applications send plaintext, get ciphertext back. The encryption key never leaves Bao. Supports AES-GCM, ChaCha20-Poly1305, RSA, ECDSA, Ed25519, and HMAC. Key versioning and rotation built in.
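As an illustration of the "send plaintext, get ciphertext back" flow, the sketch below builds (but does not send) an encrypt request against the HTTP API. It assumes the engine is mounted at `transit/`; the key name `orders`, the token value, and the helper name `build_encrypt_request` are hypothetical. The API expects the plaintext to be base64-encoded.

```python
import base64
import json
import urllib.request


def build_encrypt_request(addr, token, key_name, plaintext, mount="transit"):
    """Build (but do not send) a Transit encrypt request."""
    body = {"plaintext": base64.b64encode(plaintext).decode("ascii")}
    return urllib.request.Request(
        url=f"{addr}/v1/{mount}/encrypt/{key_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


req = build_encrypt_request(
    "https://bao-1.example.com:8200", "hvs.EXAMPLE", "orders", b"4111-1111-1111-1111"
)
```

Passing `req` to `urllib.request.urlopen` would send it; the ciphertext is then expected under `data.ciphertext` in the JSON response. The application only ever handles plaintext and ciphertext, never the key.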
Infrastructure: PKI
Full certificate authority. Issues X.509 certs with configurable TTLs. Intermediate CA model recommended. Can replace expensive commercial CA for internal services.
Cloud: AWS / Azure / GCP
Generates dynamic cloud credentials (IAM users, STS tokens, service principals). Short-lived, automatically revoked. Eliminates static cloud keys in config files.
Access: SSH
Signed SSH certificates or one-time passwords (OTP). Eliminates authorized_keys management. Signed certs are the recommended approach: hosts trust the CA once, so no per-user server-side configuration is needed.
Start with KV to get secrets out of config files and environment variables. Then move to dynamic database credentials — this is where the real security value is. PKI and Transit come later when the team is comfortable with the workflow.
Auth Methods
Auth methods verify identity and map it to policies. Every request to Bao must be authenticated.
| Method | Use Case | Notes |
|---|---|---|
| AppRole | Machine-to-machine | Role ID + Secret ID. Most common for applications. Secret ID can be rotated. |
| Kubernetes | K8s workloads | Pod service account tokens. Seamless for K8s-native apps. Use with the Bao Agent sidecar or CSI provider. |
| LDAP | Human users via AD | Bind to existing directory. Map LDAP groups to Bao policies. |
| OIDC | Human users via SSO | Keycloak, Azure AD, Okta, etc. Browser-based redirect flow. |
| TLS Certificates | Mutual TLS auth | Client presents a certificate. Good for services with existing PKI. |
| Token | Direct token auth | Always enabled. Root token used for initial setup only — revoke after configuring other auth methods. |
| Userpass | Simple username/password | Dev/test only. Never use in production without MFA. |
The root token generated during initialization has unlimited privileges. Use it only for initial setup (enabling auth methods, configuring policies), then revoke it. Generate a new root token via bao operator generate-root only when needed for emergency operations.
Policy basics
Policies define what a token can access. They follow the principle of least privilege:
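# policy: app-readonly.hcl
# KV v2 serves secret values under data/ and key listings under metadata/
path "secret/data/myapp/*" {
capabilities = ["read"]
}
path "secret/metadata/myapp/*" {
capabilities = ["list"]
}
path "database/creds/myapp-readonly" {
capabilities = ["read"]
}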
# Apply policy
bao policy write app-readonly app-readonly.hcl
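To make the default-deny behaviour concrete, here is a simplified model of policy evaluation. It is a sketch under stated assumptions, not Bao's actual matcher: it handles only exact paths and a trailing `*`, and the function name `allowed` and the dictionary representation are invented for the example.

```python
def allowed(policy, path, capability):
    """Return True if `policy` grants `capability` on `path`.

    Simplified rules: an exact path wins; otherwise the longest pattern
    ending in '*' that prefixes the path wins; no match means deny.
    """
    if path in policy:
        return capability in policy[path]
    best = None
    for pattern in policy:
        if pattern.endswith("*") and path.startswith(pattern[:-1]):
            if best is None or len(pattern) > len(best):
                best = pattern
    return best is not None and capability in policy[best]


# A trimmed-down version of the app-readonly policy.
app_readonly = {
    "secret/data/myapp/*": {"read"},
    "database/creds/myapp-readonly": {"read"},
}

can_read = allowed(app_readonly, "secret/data/myapp/db", "read")
can_write = allowed(app_readonly, "secret/data/myapp/db", "update")
other_app = allowed(app_readonly, "secret/data/otherapp/db", "read")
```

Anything not explicitly granted is denied, so a token holding only this policy cannot write to its own path or read another application's secrets.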
Backups
Raft snapshots
The primary backup mechanism. A Raft snapshot captures the entire state of the cluster:
# Manual snapshot
bao operator raft snapshot save \
/backup/bao-$(date +%Y%m%d-%H%M).snap
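# Automated via cron (every 6 hours). A crontab entry must be a single line,
# and the cron environment must provide the server address and a token
# (for example via BAO_ADDR and BAO_TOKEN).
0 */6 * * * /usr/local/bin/bao operator raft snapshot save /backup/bao-$(date +\%Y\%m\%d-\%H\%M).snap 2>&1 | logger -t bao-backup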
What to back up
Primary: Raft Snapshots
Contains all secrets, policies, auth configs, mounted engines — the entire state. Encrypted with the master key, so useless without unseal keys.
Critical: Unseal / Recovery Keys
Without these, snapshots are useless. Store in a physically separate, secure location. Some customers use safe deposit boxes or hardware security modules.
Also Backup: Configuration
- Config file (/etc/bao.d/bao.hcl): not included in snapshots
- TLS certificates for API and cluster
- Auto-unseal KMS key access
- Systemd unit file customizations
Strategy: Retention
- Snapshots every 6 hours minimum
- Ship off-box (S3, separate server)
- Retain 7 days minimum
- Test restore quarterly
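The retention rule can be automated with a small pruning helper. The sketch below assumes snapshots follow the `bao-YYYYMMDD-HHMM.snap` naming used by the backup commands above; the function name `snapshots_to_delete` is hypothetical, and it only decides what is expired rather than deleting anything.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(filenames, now, keep_days=7):
    """Return snapshot files older than the retention window."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in filenames:
        # Expected form: bao-YYYYMMDD-HHMM.snap; other files are ignored.
        try:
            taken = datetime.strptime(name, "bao-%Y%m%d-%H%M.snap")
        except ValueError:
            continue
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)


files = [
    "bao-20250101-0000.snap",
    "bao-20250107-1200.snap",
    "bao-20250108-0600.snap",
    "notes.txt",
]
expired = snapshots_to_delete(files, now=datetime(2025, 1, 10, 0, 0))
```

Parsing the timestamp from the filename rather than trusting file modification times keeps the decision stable after snapshots are copied off-box.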
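The unseal keys (or, with auto-unseal, access to the KMS key) are the most critical thing to protect. With a Shamir seal, being left with fewer key shares than the threshold makes all data permanently unrecoverable. With auto-unseal, losing the KMS key has the same effect, because recovery keys cannot unseal Bao. There is no recovery path: the encryption is real.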
Restore procedure
# Restore replaces ALL data in the cluster
bao operator raft snapshot restore backup.snap
# For a fresh cluster restore:
# 1. Start a single Bao node
# 2. Initialize (or auto-unseal)
# 3. Restore snapshot
bao operator raft snapshot restore -force backup.snap
# 4. Join other nodes to the cluster
Snapshot restore replaces ALL data — any secrets written after the snapshot are lost. This is an all-or-nothing operation.
Upgrades & Rollbacks
OpenBao upgrades follow the same pattern as Vault. The process is straightforward but must be done carefully.
Upgrade process (Raft cluster)
- Read the changelog — check for breaking changes, deprecations, and required migration steps
- Take a Raft snapshot: bao operator raft snapshot save backup.snap
- Upgrade standby nodes first: one at a time, verifying that each rejoins the cluster
- Upgrade the leader last — this triggers a leader election
- Verify: check seal status, cluster members, run a read/write test
# Take snapshot before starting
bao operator raft snapshot save pre-upgrade-$(date +%Y%m%d).snap
# On each standby node (one at a time):
sudo systemctl stop bao
sudo dpkg -i bao_x.y.z_amd64.deb # or rpm, or replace binary
sudo systemctl start bao
bao status # verify unsealed and raft peer
# After all standbys are upgraded, step down the leader:
bao operator step-down
# The upgraded standbys will elect a new leader
# Upgrade the old leader (now a standby):
sudo systemctl stop bao
sudo dpkg -i bao_x.y.z_amd64.deb
sudo systemctl start bao
bao operator raft list-peers # verify all nodes healthy
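The "standbys first, leader last" ordering can be derived from the peer list rather than tracked by hand. This sketch assumes the four-column `list-peers` output format shown earlier in this document; the function name `upgrade_order` is hypothetical.

```python
def upgrade_order(list_peers_output):
    """Return node names with followers first and the leader last."""
    followers, leaders = [], []
    for line in list_peers_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the separator row.
        if len(fields) != 4 or fields[0] in ("Node", "----"):
            continue
        node, _address, state, _voter = fields
        (leaders if state == "leader" else followers).append(node)
    return followers + leaders


sample = """\
Node   Address                 State     Voter
----   -------                 -----     -----
bao-1  bao-1.example.com:8201  leader    true
bao-2  bao-2.example.com:8201  follower  true
bao-3  bao-3.example.com:8201  follower  true
"""
order = upgrade_order(sample)
```

Re-run the peer listing before each step rather than caching the result, since leadership can move during the upgrade.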
Large version jumps are supported (e.g., 2.0.0 → 2.5.0), but you must review the upgrade notes for all intervening versions. They may describe additional steps or configuration changes required before, during, or after the upgrade.
Rollbacks
Rollbacks are possible and better-supported than in GitLab or Keycloak:
Patch Versions: Binary Swap
If no data schema changes occurred: stop Bao, replace binary with old version, start. This usually works for patch versions. Fast and simple.
Major/Minor: Snapshot Restore
If schema changes occurred: restore from the Raft snapshot taken before the upgrade. Replaces ALL data — anything written after the snapshot is lost.
Unlike GitLab, Bao rollbacks via snapshot are well-supported and fast. But only if you actually took the snapshot. Make it the first step of every upgrade.
Monitoring
OpenBao exposes metrics via a Prometheus endpoint at /v1/sys/metrics?format=prometheus (requires a token with appropriate permissions). Note: metric names default to the vault.* prefix (inherited from the Vault codebase). This can be changed via the metrics_prefix setting in the telemetry stanza.
Key metrics & alerting
| Metric | Alert Threshold | Why |
|---|---|---|
| vault.core.unsealed | = 0 | Node is sealed; can't serve requests |
| vault.raft.leader.lastContact | > 500ms | Cluster communication issues; risk of leader election |
| vault.raft.commitTime | > 25ms (p99) | Storage is slow; operations will queue |
| vault.expire.num_leases | > 100,000 | Lease explosion; often a misconfigured app requesting new creds every request |
| vault.runtime.alloc_bytes | Trending up | Memory leak or excessive lease count |
| vault.audit.log_request_failure, vault.audit.log_response_failure | > 0 | Audit device failure; if every audit device fails, Bao stops serving all requests |
| vault.core.leadership_lost | Frequent occurrences | Leadership instability; investigate network or disk issues |
| vault.token.count | > 50,000 | Token sprawl; apps may not be revoking tokens properly |
If all audit devices fail, Bao will stop processing all requests. This is a security feature — it prevents unaudited access. Configure multiple audit devices for redundancy (e.g., file + syslog). Make sure audit log destinations are reliable.
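The numeric thresholds in the table translate directly into alert conditions. In practice these would live in Prometheus alerting rules; the sketch below only illustrates the logic, using a hypothetical `firing_alerts` helper and assuming the latency metrics are reported in milliseconds.

```python
THRESHOLDS = {
    "vault.core.unsealed": lambda v: v == 0,
    "vault.raft.leader.lastContact": lambda v: v > 500,  # milliseconds
    "vault.raft.commitTime": lambda v: v > 25,  # milliseconds, p99
    "vault.expire.num_leases": lambda v: v > 100_000,
    "vault.token.count": lambda v: v > 50_000,
}


def firing_alerts(sample):
    """Return the names of metrics in `sample` that breach their threshold."""
    return sorted(
        name
        for name, breached in THRESHOLDS.items()
        if name in sample and breached(sample[name])
    )


sample = {
    "vault.core.unsealed": 1,
    "vault.raft.leader.lastContact": 750,
    "vault.raft.commitTime": 12,
    "vault.expire.num_leases": 250_000,
}
alerts = firing_alerts(sample)
```

Trend-based rows such as memory growth are left out because they need a time series rather than a single sample.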
Enabling metrics
# In bao.hcl
telemetry {
prometheus_retention_time = "30s"
disable_hostname = true
}
# Scrape with Prometheus (scheme must be https because the listener uses TLS):
# - job_name: 'bao'
#   scheme: https
#   metrics_path: '/v1/sys/metrics'
#   params:
#     format: ['prometheus']
#   bearer_token: 'hvs.METRICS_TOKEN'
#   static_configs:
#     - targets: ['bao-1:8200', 'bao-2:8200', 'bao-3:8200']
Health checks
- /v1/sys/health: returns 200 if initialized, unsealed, and active. Returns 429 for standby, 501 for not initialized, 503 for sealed. Status codes are customizable via query parameters (standbycode, activecode, etc.).
- Use ?standbyok=true for load balancer health checks that should include standby nodes
- /v1/sys/seal-status: detailed seal status without authentication
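A load balancer or monitoring probe can interpret the default health status codes as follows. This is a minimal sketch of the client side; the names `describe_health` and `in_rotation` are hypothetical, and the codes are the defaults listed above (they change if the query parameters override them).

```python
HEALTH_STATES = {
    200: "active",
    429: "standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code):
    """Translate a /v1/sys/health status code into a node state."""
    return HEALTH_STATES.get(status_code, f"unexpected status {status_code}")


def in_rotation(status_code, include_standbys=False):
    """Decide whether a load balancer should send traffic to the node."""
    if status_code == 200:
        return True
    return include_standbys and status_code == 429


states = [describe_health(code) for code in (200, 429, 503)]
```

Routing only to the active node avoids a forwarding hop; including standbys makes sense when they are expected to serve or forward requests themselves.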
Security Hardening
Bao holds your most sensitive data. Default configuration is not production-ready.
TLS: Encrypt Everything
API listener: TLS required (never disable in production). Cluster traffic: encrypted by default with Raft. Client→Bao: TLS 1.2+ only. Use certificates from a trusted CA, not self-signed.
Audit: Enable Immediately
Enable audit devices before anything else. Every request and response is logged with HMAC'd sensitive values. Configure at least two audit backends for redundancy.
bao audit enable file \
file_path=/var/log/bao/audit.log
bao audit enable syslog
Root Token: Revoke After Setup
Use root token only for initial configuration. Then revoke it. Generate a new one via bao operator generate-root when needed for emergency operations. Never store the root token in a file.
Policies: Least Privilege
Default deny. Grant only the specific paths and capabilities needed. Use path templating ({{identity.entity.id}}) for per-entity access. Review policies quarterly.
Operational hardening
- mlock: Removed in OpenBao 2.0. The disable_mlock setting is no longer functional; setting it to false will cause a startup error. OpenBao relies on OS-level memory protections instead. Use encrypted swap or disable swap entirely on Bao nodes.
- Firewall: Only expose port 8200 to clients. Port 8201 (cluster) should only be accessible between Bao nodes.
- Lease TTLs: Set aggressive default and max TTLs. Short-lived credentials limit blast radius. Default TTL of 1h, max of 24h is a good starting point.
- Token TTLs: Same principle. Use periodic tokens for long-running services that can renew, not long-lived tokens.
- UI: Disable in production if not needed (ui = false). If enabled, restrict to internal networks via the load balancer.
- Response wrapping: Use -wrap-ttl for secret zero delivery. The wrapped token can only be unwrapped once.
Licensing & Support
| Aspect | OpenBao | HashiCorp Vault |
|---|---|---|
| License | MPL 2.0 (truly open source) | BSL 1.1 (source-available, restrictions) |
| Cost | Free | Free (Community) / $$$$ (Enterprise, per-cluster) |
| Vendor Support | Community only (GitHub, mailing lists) | Paid support from HashiCorp/IBM |
| Enterprise Features | Namespaces (GA since 2.3.1), standby reads (2.5.0), PKCS#11 HSM (2.2.0) | Namespaces, Sentinel, replication, HSM |
| Governance | Linux Foundation, community-driven | HashiCorp/IBM |
What OpenBao is missing (vs. Vault Enterprise)
Shipped: Recently Delivered
- Namespaces — multi-tenancy isolation (GA since v2.3.1, June 2025)
- Standby Read Support — standby nodes serve read requests locally (v2.5.0)
- PKCS#11 HSM Auto-unseal — hardware security module support via PKCS#11 (v2.2.0)
Not Yet: Still Missing
- Performance Replication — cross-region read replicas
- Disaster Recovery Replication — cross-region DR
- Sentinel Policies — fine-grained policy-as-code
- Control Groups — multi-party approval for secret access
For most customers, the remaining missing Enterprise features don't matter. KV secrets, PKI, database dynamic credentials, Kubernetes auth, Transit encryption, namespaces, and HSM unsealing — all work in OpenBao. The main gap is cross-region replication (performance and DR), which only matters for global deployments.
Migration from Vault
If the customer is migrating from HashiCorp Vault:
- API is compatible — clients using the Vault HTTP API work with OpenBao (change the address)
- CLI is bao instead of vault but accepts the same commands
- Configuration files use the same HCL syntax
- Data migration: take a Vault Raft snapshot, restore into OpenBao (version compatibility matters)
- Plugins/auth methods: most community plugins work. Enterprise-only features won't.
Consultant's Checklist
Before proposing an OpenBao deployment, get answers to these:
- What secrets need managing? — Static KV? Dynamic DB creds? PKI? Encryption?
- How many applications/services will authenticate? — Determines auth method strategy
- What's the deployment platform? — VMs, Kubernetes, cloud? Determines auth methods and deployment model
- RTO/RPO requirements? — Determines cluster size and backup frequency
- Multi-region requirements? — If yes, OpenBao may not be sufficient today. Consider Vault Enterprise or architect around the limitation.
- Compliance requirements? — Audit logging, HSM requirements, secret rotation policies
- Existing HashiCorp Vault deployment? — Migration path needed?
- Who operates it? — Determines automation and runbook depth
- Auto-unseal strategy? — Which cloud KMS, or transit unseal from another instance?
- Secret zero problem? — How do applications get their initial Bao credentials? (The hardest question in secrets management)