SUSE Observability

Topology-powered Kubernetes observability with time-travel debugging — formerly StackState

01

Overview & History

SUSE Observability is an enterprise observability platform for Kubernetes and cloud-native infrastructure. It unifies metrics, logs, traces, and topology into a single platform, built around a unique 4T data model (Topology, Telemetry, Traces, Time) that correlates all observability signals against a real-time dependency map of your infrastructure. The current version is v2.8.1 (released 17 March 2026).

The product was originally developed by StackState, a Dutch observability company founded in 2015 by Mark Bakker and Lodewijk Bogaards (with Remco Beckers joining as a third co-founder). StackState was built around the insight that traditional monitoring creates data silos, and that mapping IT topology — the relationships and dependencies between components — is essential for understanding complex distributed systems.

History StackState Origins (2014–2017)

Born from a consulting engagement at a major Dutch bank in 2014, where the founders discovered that performance issues persisted despite abundant monitoring data — the problem was not lack of data but lack of insight. They spent 3 years building a custom versioned graph database from scratch because no existing graph database supported time-travel capabilities.

History StackState Launch (2017–2024)

Launched in 2017 as the first observability platform with a time-traveling topology. Recognized by Gartner as a Cool Vendor in Performance Analysis (2019) and as a representative vendor in the Gartner Market Guide for AIOps Platforms (2021). Customers included KPN, Vodafone, and Accenture. The company grew to 50+ employees.

Acquisition SUSE Acquires StackState

June 18, 2024 — announced at SUSECON Berlin. SUSE acquired StackState to add full-stack observability to its Rancher ecosystem. Financial terms were not disclosed. SUSE announced its intention to open-source StackState in the future to foster broader adoption.

Rebranding SUSE Observability (2024–present)

Integrated into Rancher Prime 3.1 on September 5, 2024. Rebranded from StackState to "SUSE Observability." Version 2.0.0 (11 Sep 2024) was the first release under the SUSE brand. The documentation site moved from docs.stackstate.com to documentation.suse.com. The development organization on GitHub remains StackVista.

Deployment Models

  • SUSE Observability (Self-Hosted) — deployed on your own Kubernetes cluster via Helm. Included with SUSE Rancher Prime subscriptions. Full control over data and infrastructure.
  • SUSE Cloud Observability (SaaS) — fully managed SaaS platform launched November 2024. Available on AWS Marketplace. Setup in under 5 minutes. Supports EKS, on-premises, and Rancher-managed clusters. This was SUSE's first SaaS-based product.
02

The 4T Data Model

The core differentiator of SUSE Observability is the 4T data model, introduced in StackState v4.6. Traditional observability tools treat metrics, logs, and traces as separate concerns. SUSE Observability correlates Topology, Telemetry, and Traces at every moment in Time, providing a unified context for troubleshooting that no individual signal can offer alone.

T1 Topology

A real-time map of all infrastructure components and their dependencies (relationships). In Kubernetes, this includes clusters, nodes, namespaces, deployments, pods, services, persistent volumes, and their connections. Topology is auto-discovered from the Kubernetes API and enriched by eBPF-based network observation. Stored in a custom versioned graph database (StackGraph) that preserves every historical state.

T2 Telemetry

Metrics, events, and logs collected from observed infrastructure. Metrics are stored in VictoriaMetrics, logs in Elasticsearch. Telemetry is automatically bound to topology components, so you always see metrics in the context of what they belong to — not just as isolated time-series.

T3 Traces

Distributed traces that show how requests flow across services. Collected via OpenTelemetry or SUSE Observability's own eBPF-based request tracing. Traces are stored in ClickHouse and are correlated with topology to show request paths across the dependency map.

T4 Time

The temporal dimension that binds the other three. Every topology snapshot, metric, log entry, and trace span is precisely timestamped. This enables time-travel debugging — the ability to reconstruct the exact state of your infrastructure at any point in the past and see all associated observability data. This is the foundational innovation built on the versioned graph database.

Why Topology Changes Everything

In traditional monitoring (Prometheus + Grafana), you have metrics and dashboards but no automatic understanding of what depends on what. When a database goes down, you see the database alert, but you have to manually figure out which applications are affected. With the 4T model:

  • Context is automatic — every metric, log, and trace is tied to a component in the topology map
  • Impact analysis is built-in — if a component becomes unhealthy, you instantly see all dependent components that are affected via health propagation
  • Root cause analysis follows the graph — problems propagate through the dependency chain, and SUSE Observability identifies the unhealthy component at the bottom of the chain as the probable root cause
  • Ephemeral resources are preserved — even after a pod is deleted, you can travel back in time and see how it was connected, its logs, events, and related resources
03

Architecture

SUSE Observability consists of three primary architectural components: the Server (on-premises or SaaS), the Agent (deployed on observed clusters), and the optional Rancher Prime UI Extension.

[ Observed Cluster(s) ]
    Node Agent (eBPF)              -- DaemonSet, one per node
    Cluster Agent + Checks Agent
    kube-state-metrics
            |
            |  HTTPS (agent data)
            v
[ SUSE Observability Server Cluster ]
    Router (Envoy) --> Receiver (base, logs, process) | API Server | UI (React)
            |
            v
    Kafka (message bus)
            |
            v
    Processing services: Sync, Health-Sync, State, Checks, Correlate
            |
            v
    Data stores: StackGraph (HBase/HDFS), VictoriaMetrics, ClickHouse,
                 Elasticsearch, ZooKeeper

Server Components (Distributed Mode)

In HA production deployments, the server runs in distributed mode with separate pods for each function. In non-HA setups, all functions consolidate into a single suse-observability-server pod.

Ingestion Receivers

In HA mode, receivers are split into three types: base (agent telemetry), logs (log data), and process-agent (process-level data). An OpenTelemetry Collector (suse-observability-otel-collector-0) handles OTLP data from instrumented applications.

Processing Processing Services

Individual services handle specific functions: Sync (topology synchronization), Health-Sync (health state computation), State (state management), Checks (monitor evaluation), Correlate (event correlation and problem grouping), Notification (alert delivery), and Slicing (data partitioning).

Serving API & UI

The API server handles all PromQL and topology queries. The UI is a static React application. The Router is an Envoy-based proxy that routes requests to the appropriate backend service. Default port: 8080.

Optional Anomaly Detection

Spotlight-based anomaly detection is available but disabled by default. It uses machine learning to detect deviations from normal metric patterns. Requires a separate anomaly detection chart (v5.2.0-snapshot.179). An AI Assistant and MCP Server are also included for natural-language querying.

04

Backing Services & Data Stores

SUSE Observability runs on six major backing services, plus optional MinIO for backups, all deployed as part of the Helm chart. There is no external dependency on managed databases: everything runs inside the Kubernetes cluster.

Service | Purpose | Chart Version | Pod Pattern
StackGraph (HBase + HDFS) | Topology & configuration storage (versioned graph database) | v0.2.128 | *-hbase-stackgraph-0 (non-HA) or name-nodes, region servers, data-nodes, Tephra (HA)
VictoriaMetrics | Metrics storage & query | v0.8.53-stackstate.45 | *-victoria-metrics-0-0, *-vmagent-0
ClickHouse | Trace & OpenTelemetry data storage | v3.6.9-suse-observability.21 | *-clickhouse-shard0-N
Elasticsearch | Events & logs storage | v8.19.4-stackstate.18 | *-elasticsearch-master-N
Kafka | Message bus for in-transit topology & telemetry updates | v19.1.3-suse-observability.20 | *-kafka-N
ZooKeeper | Service discovery, orchestration & failover coordination | v8.1.2-suse-observability.18 | *-zookeeper-N
MinIO | S3-compatible object storage for backups | v8.0.10-stackstate.25 | Optional, for backup/restore

Backup Architecture

Backups are handled through a MinIO gateway that supports three storage backends: AWS S3, Azure Blob Storage, or Kubernetes PersistentVolumes.

Data Store | Backup Type | Default Schedule | Default Retention
StackGraph | Full (single .graph file) | Daily at 03:00 | 30 days
VictoriaMetrics | Incremental | Hourly (staggered 25/35 min past the hour) | ~14 days
Elasticsearch | Incremental snapshots | Daily at 03:00 | 30 days
ClickHouse | Full + incremental | Full daily at 00:45, incremental hourly | ~14 days

Not Backed Up

Kafka and ZooKeeper data are not backed up. Kafka holds only in-transit data that has temporary value. ZooKeeper holds master node negotiation state that is automatically recreated.

05

Agent Architecture

The SUSE Observability Agent is deployed on each observed cluster (not the server cluster) via Helm. It consists of four components that work together to collect topology, metrics, events, logs, traces, and network data.

DaemonSet Node Agent

Deployed as a DaemonSet on every node. Runs with hostNetwork: true to scrape metrics endpoints from all pods, and hostPID: true to map processes to containers via cgroups. Injects eBPF programs into network namespaces to monitor workload communication, tracking TCP connections and decoding L7 protocols (HTTP/1.0, HTTP/1.1, TLS, Redis). Reads conntrack tables across all network namespaces for connection tracking. Requires securityContext.privileged: true.

Deployment Cluster Agent

A single instance per cluster. Communicates with the Kubernetes API to discover topology: clusters, nodes, namespaces, deployments, statefulsets, daemonsets, pods, services, configmaps, persistent volumes, ingresses, and their relationships. Requires ClusterRole and ClusterRoleBinding for API access.

Deployment Checks Agent

Runs health and diagnostic checks against the cluster. Evaluates the health of Kubernetes resources and reports status back to the SUSE Observability server. Works in conjunction with the monitors configured on the server side.

Dependency kube-state-metrics

Deployed as part of the agent Helm chart. Exposes Kubernetes object state as Prometheus-format metrics (pod status, deployment replicas, resource requests/limits, etc.). The Node Agent scrapes these metrics and forwards them to SUSE Observability.
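
For a sense of what the Node Agent scrapes, here are two standard kube-state-metrics series in Prometheus exposition format (the namespace, pod, and deployment label values are hypothetical):

```
# HELP kube_pod_status_phase The pods current phase.
kube_pod_status_phase{namespace="default",pod="web-7f9c",phase="Running"} 1
# HELP kube_deployment_status_replicas_available The number of available replicas per deployment.
kube_deployment_status_replicas_available{namespace="default",deployment="web"} 3
```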

Request Tracing (Cross-Service)

For tracing requests across service boundaries, load balancers, and service meshes, SUSE Observability can inject a sidecar proxy via a mutating webhook. The sidecar injects an X-Request-ID header into all HTTP traffic. This header is observed at both client and server endpoints, allowing SUSE Observability to map service dependencies across cluster boundaries.

  • Supported protocols: HTTP/1.0, HTTP/1.1 with keepAlive, unencrypted traffic, OpenSSL-encrypted traffic
  • Supported integrations: LinkerD service mesh, Envoy proxy, Istio EnvoyFilters
  • Resource overhead: 25–40 MB memory per pod for the sidecar proxy, plus variable CPU based on request volume
  • Annotation: http-header-injector.stackstate.io/inject: enabled
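
Putting the annotation to use, a Deployment's pod template might look like the following sketch (the workload name and image are placeholders; only the annotation key comes from the documentation):

```yaml
# Hypothetical Deployment fragment: opting a workload into sidecar injection
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout                  # example workload name
spec:
  template:
    metadata:
      annotations:
        # The mutating webhook sees this annotation and injects the
        # X-Request-ID header-injector sidecar proxy into the pod
        http-header-injector.stackstate.io/inject: enabled
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:1.0   # placeholder image
```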

Supported Container Runtimes

  • containerd
  • CRI-O

06

Topology & Health Model

The topology-based health model is how SUSE Observability turns raw observability data into actionable insights. Every component in your infrastructure has a health state, and health propagates through the dependency graph to enable automatic root cause analysis.

Components & Relations

  • Component — any discrete element in your infrastructure: a pod, a node, a service, a deployment, a namespace, a PV, etc. Each has properties, telemetry bindings, and a health state.
  • Relation — a directed dependency between two components. The arrow indicates dependency direction: app → db means "app depends on db."

Health States

Each component has a computed health state based on monitors that evaluate metrics, topology, and metadata:

  • CLEAR (green) — component is healthy, all monitors pass
  • DEVIATING (orange) — component is deviating from expected behavior
  • CRITICAL (red) — component has a critical issue
  • UNKNOWN (gray) — no health data available

Health Propagation & Root Cause Analysis

Health propagates in the opposite direction to dependency arrows. If app → db and the database turns red, the app component's outer color turns red to indicate potential impact from a failing dependency. The inner color shows the component's own health; the outer color shows health propagated from the components it depends on.

How Root Cause Analysis Works

A problem groups related unhealthy components. The root cause is the unhealthy element at the bottom of the dependency chain. All other unhealthy elements that depend on the root cause are contributing causes. When health states change, root cause identification is automatically updated. A problem is considered resolved when all contributing and root cause elements return to CLEAR.

Out-of-the-Box Monitors

SUSE Observability ships with pre-configured monitors for common Kubernetes failure modes. Each monitor includes remediation guides that appear directly in the UI with step-by-step troubleshooting instructions. Monitors can be:

  • Metric-based — threshold and dynamic threshold monitors on metrics
  • Topology-based — validate topology structure and component properties (unique to SUSE Observability's 4T Monitors)
  • Derived state — monitors that derive health from related components
  • Custom — user-defined monitors via the UI or CLI, can target Prometheus metrics ingested via remote_write
07

Time-Travel Debugging

Time-travel is SUSE Observability's signature capability, built on the versioned graph database that preserves every topology state change. It operates on two independent time dimensions that can be controlled separately.

Dimension Topology Time

A specific moment in time for which you fetch a snapshot of your Kubernetes resources. When you select a topology time in the past, the interface reconstructs the exact infrastructure state at that moment — which pods existed, how they were connected, their configurations, and their health states. Even deleted pods are visible at their historical topology time.

Dimension Telemetry Interval

The time range for which you want to see telemetry data (metrics, events, logs, traces). This is independent of topology time. Maximum window is 6 months. Telemetry shown is filtered to only data related to components that existed at the selected topology time.

How It Works in Practice

  1. Incident occurs at 2:00 AM — you arrive at 9:00 AM to investigate
  2. Set topology time to 2:00 AM — the topology perspective reconstructs the exact state of your infrastructure at that time, including pods that may have been killed and restarted since then
  3. Set telemetry interval around 2:00 AM — see metrics, logs, events, and traces from that window
  4. Navigate the topology — follow the dependency graph from affected services to the root cause, seeing all associated telemetry for each component at that point in time
  5. Scrub through time — use the timeline at the bottom of the UI to move forward and backward, watching how the topology and health states changed
Key Insight

Traditional monitoring tools lose context when Kubernetes resources are ephemeral. A CrashLooping pod that was killed and replaced has its logs and metrics scattered or lost. SUSE Observability preserves the complete picture — the pod's topology position, its relationships, its logs, events, and metrics — accessible through time-travel up to the configured data retention period (default 30 days for production).

08

Installation & Deployment

SUSE Observability is deployed via Helm charts to a dedicated Kubernetes cluster (or namespace on an existing cluster). Installation takes approximately 30 minutes. Helm v3.13.1 or higher is required.

Step 1: Add Helm Repository

# Add the SUSE Observability Helm repo
helm repo add suse-observability \
  https://charts.rancher.com/server-charts/prime/suse-observability
helm repo update

Step 2: Create Namespace

kubectl create namespace suse-observability

Step 3: Create values.yaml

# values.yaml - Core configuration
global:
  suseObservability:
    license: "YOUR-LICENSE-KEY"          # From SUSE Customer Center
    baseUrl: "https://observability.example.com"  # External access URL
    adminPassword: "your-admin-password"  # Plain text or bcrypt hash
    sizing:
      profile: "150-ha"                  # See sizing profiles below
  # imageRegistry: "registry.example.com" # Optional: custom registry
  # storageClass: "gp3"                   # Optional: override default

Step 4: Deploy

# Install SUSE Observability
helm upgrade --install \
  --namespace suse-observability \
  --values values.yaml \
  suse-observability \
  suse-observability/suse-observability

# Verify installation
helm list --namespace suse-observability
kubectl get pods --namespace suse-observability

# Port-forward for local access
kubectl port-forward \
  service/suse-observability-suse-observability-router 8080:8080 \
  --namespace suse-observability

Step 5: Deploy Agent on Observed Clusters

After the server is running, navigate to StackPacks > Integrations > Kubernetes in the SUSE Observability UI. Create a new instance with a cluster identifier. The UI will generate a Helm command with pre-filled configuration:

# Generated by SUSE Observability UI (example)
helm upgrade --install \
  --namespace suse-observability \
  --create-namespace \
  --set-string 'stackstate.apiKey=YOUR-API-KEY' \
  --set-string 'stackstate.cluster.name=my-cluster' \
  --set-string 'stackstate.url=https://observability.example.com/receiver/stsAgent' \
  suse-observability-agent \
  suse-observability/suse-observability-agent

Sizing Profiles

Profile | Observed Nodes | HA | Use Case
trial | Up to 10 | No | Evaluation only
10-nonha | 10 | No | Small / testing
20-nonha | 20 | No | Small / testing
50-nonha | 50 | No | Small / testing
100-nonha | 100 | No | Small production
150-ha | 150 | Yes (3x replicas) | Production
250-ha | 250 | Yes | Production
500-ha | 500 | Yes | Large production
4000-ha | 4,000 | Yes | Enterprise

Node Counting

An "observed node" is defined as 4 vCPUs + 16 GB memory. If your actual nodes are larger, they count as multiples. For example, a node with 12 vCPU / 48 GB counts as 3 observed nodes.
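
The arithmetic above can be sketched in shell. Note that the ceiling-and-maximum rule below is an assumption; the documentation only defines the 4 vCPU / 16 GB unit and gives the one worked example:

```shell
# Sketch: how many "observed nodes" a physical node counts as,
# using the 4 vCPU / 16 GB unit. Rounding behavior is assumed.
observed_units() {
  local vcpu=$1 mem_gb=$2
  # Divide each dimension by the unit size, rounding up...
  local by_cpu=$(( (vcpu + 3) / 4 ))
  local by_mem=$(( (mem_gb + 15) / 16 ))
  # ...and let the larger dimension determine the count.
  if [ "$by_cpu" -gt "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}

observed_units 12 48   # 12 vCPU / 48 GB -> 3 observed nodes
observed_units 4 16    # baseline node   -> 1
```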

Air-Gapped Installation

For disconnected environments, pull all container images to a local registry and provide a local-docker-registry.yaml with global.imageRegistry set to your internal registry.
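
A minimal local-docker-registry.yaml might look like this (the registry hostname is a placeholder; global.imageRegistry is the key named above):

```yaml
# local-docker-registry.yaml - point all chart images at the internal registry
global:
  imageRegistry: "registry.internal.example.com"
```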

helm upgrade --install \
  --namespace suse-observability \
  --values local-docker-registry.yaml \
  --values values.yaml \
  suse-observability \
  suse-observability/suse-observability
09

Requirements & Sizing

Compute Requirements (Server Cluster)

Profile | CPU Requests | CPU Limits | Memory Requests | Memory Limits | Storage
trial | 7.0 cores | 15.1 cores | 22.7 Gi | 23.3 Gi | 163 GB
10-nonha | 7.0 cores | 15.1 cores | 22.7 Gi | 23.3 Gi | 358 GB
50-nonha | 14.0 cores | 28.8 cores | 30.9 Gi | 31.0 Gi | ~450 GB
100-nonha | 23.6 cores | 47.9 cores | 47.0 Gi | 47.2 Gi | 562 GB
150-ha | 49.6 cores | 105.2 cores | 127.0 Gi | 131.8 Gi | 2.8 TB
500-ha | 85.1 cores | 176.2 cores | 166.4 Gi | 171.2 Gi | 3.9 TB
4000-ha | 212.1 cores | 281.0 cores | 263.9 Gi | 321.7 Gi | 7.5 TB

Minimum Node Specifications

Deployment Type | Min vCPU/Node | Min Memory/Node
Non-HA (testing/small) | 4 vCPU | 8 GB
HA (up to 500 nodes) | 8 vCPU | 16 GB
HA (4000 nodes) | 16 vCPU | 32 GB

Kubernetes Compatibility

Platform | Supported Versions
Kubernetes | 1.25 through 1.33
OpenShift | 4.14 through 4.19
Rancher 2.11.x | RKE2 v1.30.11+rke2r1
Rancher 2.12.x | RKE2 v1.30.11+rke2r1
Rancher 2.13.x | RKE2 v1.30.11, v1.31.13, v1.32.10 (+rke2r1)

Supported Kubernetes Distributions

  • Cloud managed: Amazon EKS, Azure AKS, Google GKE, Alibaba Cloud ACK
  • On-premises: RKE2, K3s, vanilla Kubernetes
  • Enterprise: OpenShift (4.14–4.19)
Storage Warning

NFS is not supported for storage provisioning due to the risk of data corruption. Use SSD/flash-based storage for production deployments. The default storage class is used unless global.storageClass is specified in values.yaml. ResourceQuota is not recommended as it may interfere with resource allocation.

Data Retention Defaults

  • Trial: 3 days
  • Production profiles: 30 days
  • SaaS (Cloud Observability): ~1 day for events/logs/metrics, ~12 hours for traces (default tier)

Other Requirements

  • Helm: v3.13.1 or higher (Helm 4 supported as of v2.8.0)
  • Ingress: An ingress controller or load balancer for external HTTPS access
  • Browsers: Chrome and Firefox
  • Authentication: OIDC, KeyCloak, Microsoft Entra ID, LDAP, file-based, or single-password
10

Integrations & Data Sources

SUSE Observability extends its functionality through StackPacks — plugin packages that provide automated integration with external systems. StackPacks come in two types: Add-ons (extend platform capabilities) and Integrations (connect to external data sources).

OpenTelemetry (Native)

SUSE Observability is OpenTelemetry-native. It includes an OpenTelemetry Collector (v0.108.0-stackstate.21) as a built-in component and accepts OTLP data (traces, metrics, logs) at dedicated API endpoints. The recommended architecture:

  1. Instrument applications with OpenTelemetry SDKs
  2. Deploy the OpenTelemetry Collector near instrumented applications to preprocess data (enrich with K8s labels, implement sampling)
  3. Forward to SUSE Observability's OTLP endpoints
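
Steps 2 and 3 can be sketched as a collector configuration. The exporter endpoint and authorization header below are assumptions for illustration, not documented values; check the SUSE Observability OTLP documentation for the real ones:

```yaml
# Hypothetical OpenTelemetry Collector config: enrich with K8s metadata,
# batch, and forward OTLP data to SUSE Observability.
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  k8sattributes: {}   # enrich telemetry with Kubernetes labels
  batch: {}           # batch before export (sampling could be added here)
exporters:
  otlp:
    endpoint: observability.example.com:443        # assumed OTLP endpoint
    headers:
      Authorization: "SUSEObservability <API-KEY>" # assumed auth scheme
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [otlp]
```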

Out-of-the-box capabilities include monitors for span error rates and duration metrics, metric bindings for span metrics, .NET and JVM memory metrics, and service overview pages.

Prometheus Integration

SUSE Observability exposes a Prometheus remote_write endpoint to mirror metrics from existing Prometheus instances:

# Add to your Prometheus config
remote_write:
  - url: https://<base-url>/receiver/prometheus/api/v1/write
    headers:
      sts-api-key: "<API-KEY>"
    # Or use basic_auth:
    # basic_auth:
    #   username: apikey
    #   password: "<API-KEY>"

This enables using existing Prometheus metrics in SUSE Observability's monitors and topology context without replacing your existing Prometheus setup.

Kubernetes StackPack

The core integration. Provides auto-discovery of all Kubernetes topology (clusters, nodes, namespaces, workloads, pods, services, etc.), pre-built monitors for common Kubernetes issues, and the agent deployment configuration. Multi-instance support allows monitoring multiple clusters from a single SUSE Observability server.

Other Integrations

  • Cloud providers: AWS StackPack (supports multiple AWS accounts), Azure, GCP
  • Alerting: Slack, Jira, custom webhooks
  • CI/CD: Integration with CI/CD pipelines for deployment correlation
  • Custom: StackPacks can be extended or new ones created for custom data sources
  • Splunk: Integration for log forwarding (v2.8.0 added improvements)
  • 40+ prebuilt dashboards for common Kubernetes monitoring scenarios
11

Rancher Integration

SUSE Observability is tightly integrated with SUSE Rancher Prime through a UI extension and shared RBAC. The observability license is included with Rancher Prime subscriptions.

Rancher Prime UI Extension

A Rancher Manager extension that integrates SUSE Observability health signals directly into the Rancher UI. Installation:

  1. Enable UI extensions from the Rancher UI
  2. Navigate to Extensions > Available
  3. Install the Observability extension
  4. Navigate to SUSE Observability > Configurations in the left panel
  5. Add the SUSE Observability server URL and credentials

Once configured, Rancher displays health indicators on every resource (cluster, node, workload, pod). Clicking a health indicator provides a direct link to SUSE Observability's detailed investigation view for that resource.

RBAC Integration

SUSE Observability supports Rancher RBAC, allowing you to map Rancher roles and permissions to SUSE Observability access levels. This means Rancher users see only the clusters and resources they have permission to view.

Complementing Existing Prometheus + Grafana

SUSE Observability does not replace Rancher's built-in Prometheus + Grafana monitoring stack. Instead, it complements it:

  • Prometheus + Grafana (Rancher Monitoring) — provides detailed metrics dashboards, PromQL queries, and alerting rules for specific metrics
  • SUSE Observability — adds topology awareness, cross-cluster correlation, root cause analysis, time-travel debugging, and the 4T data model
  • Connect them via Prometheus remote_write to feed Prometheus metrics into SUSE Observability's topology-correlated view
How They Work Together

Think of Prometheus + Grafana as your microscope (deep metrics analysis) and SUSE Observability as your map (understanding what is connected to what, what broke, and why). Rancher is the control plane that ties them together with unified RBAC and a single management interface.

12

Comparison & Licensing

SUSE Observability vs. Alternatives

Capability | SUSE Observability | Datadog | Dynatrace | Prometheus + Grafana
Topology-based monitoring | Core differentiator: auto-discovered versioned topology graph | Service maps exist but not versioned/time-travel enabled | Smartscape topology, AI-driven | No built-in topology
Time-travel debugging | Full infrastructure state reconstruction at any past moment | Historical dashboards, no topology time-travel | Session replay for user sessions, not infra topology | Historical PromQL queries only
Root cause analysis | Automatic via dependency graph traversal | Watchdog AI-based correlation | Davis AI engine (patented) | Manual investigation
Deployment model | Self-hosted (K8s) or SaaS | SaaS only | SaaS or Managed (on-prem available) | Self-hosted
Open source | Planned (SUSE committed to open-sourcing) | No (agent is open-source) | No | Fully open-source (Apache 2.0)
Kubernetes-native | Primary focus; deep K8s topology | Strong K8s support, broader scope | Strong K8s support, broader scope | Excellent K8s integration
Pricing model | Included with Rancher Prime, or SaaS per-host | Per-host + per-feature add-ons | Host Units (tied to RAM), complex | Free (operational costs only)
OpenTelemetry | Native OTLP support + built-in collector | OTLP ingestion supported | OTLP ingestion supported | Via OTLP remote_write or Alloy
eBPF monitoring | Built-in for L7 protocol decoding & network topology | Yes (network monitoring) | OneAgent uses eBPF | Separate tools (Cilium, Pixie)

Unique Selling Points

  • Versioned topology — the only platform with a custom-built versioned graph database that stores every topology state change, enabling true time-travel debugging of infrastructure
  • 4T Monitors — monitors that can validate topology structure and properties, not just metric thresholds
  • Rancher-native — deep integration with the Rancher ecosystem, shared RBAC, included in Rancher Prime subscription
  • Self-hosted option — full on-premises deployment for organizations with data sovereignty requirements, unlike SaaS-only competitors
  • Open-source commitment — SUSE has committed to open-sourcing the platform

Licensing & Pricing

Included SUSE Rancher Prime

SUSE Observability is included with SUSE Rancher Prime subscriptions. The license key is available in the SUSE Customer Center under the Subscription tab, shown as "SUSE Observability" Registration Code. Valid for the duration of your Rancher Prime subscription.

SaaS SUSE Cloud Observability

Available on AWS Marketplace with pay-as-you-go pricing:

  • 10–100 hosts: $9.99/host/month (hourly billing, 10-host minimum = $99/mo base)
  • 100+ hosts: $8.99/host/month ($899/mo base)
  • Included: 5 GB logs + 5 GB metrics + 5 GB traces
  • Overage: $0.15/GB

Add-on Platform Optimization

"SUSE Platform Optimization" is a separate add-on that requires its own license. It provides cost optimization recommendations for Kubernetes workloads. Not included in the base Observability license.

Future Open Source

SUSE announced plans to open-source StackState/SUSE Observability. As of March 2026, this has not yet occurred, but SUSE has been contributing to CNCF observability projects (including a case study on Longhorn). No timeline for the open-source release has been published.


Version History

Version | Date | Notable Changes
v2.8.1 | 17 Mar 2026 | Latest release (patch)
v2.8.0 | 03 Mar 2026 | Helm 4 support, simplified installation, Traefik ingress docs
v2.7.0 | 14 Jan 2026 | Feature release
v2.6.0 | 29 Sep 2025 | HBase 2.6.3 upgrade, global commonLabels, editable service monitors. Breaking: ClickHouse/ZooKeeper StatefulSet labels immutable
v2.5.0 | 08 Sep 2025 | Feature release
v2.4.0 | 25 Aug 2025 | Feature release
v2.3.0 | 30 Jan 2025 | Feature release (7 patch releases through v2.3.7)
v2.2.0 | 09 Dec 2024 | Feature release
v2.1.0 | 29 Oct 2024 | Feature release
v2.0.0 | 11 Sep 2024 | First SUSE-branded release, integrated with Rancher Prime 3.1