GitLab Production Architecture

Customer deployment reference — architecture, HA, storage, upgrades, runners, licensing & operations

01

Overview

GitLab is a complete DevOps platform delivered as a single application. It covers source code management, CI/CD, container registry, package registry, issue tracking, wikis, security scanning, and more. For customers, the pitch is consolidation — replace 5-10 separate tools with one.

Two deployment models: GitLab.com (SaaS, managed by GitLab Inc.) or self-managed (you run it). As a consultant, you'll almost always be dealing with self-managed deployments for enterprise customers who need data sovereignty, compliance, or customization.

SaaS GitLab.com

Hosted by GitLab Inc. Zero infrastructure to manage. Limited customization. Data lives on GitLab's infrastructure (GCP). Good for small teams or orgs without compliance constraints.

Self-Managed On your infra

Full control over data, network, and configuration. Required for air-gapped, compliance-heavy, or highly customized environments. You own the operations burden.

Key Insight

GitLab looks like one product but operates like a microservices platform. Scoping a deployment without understanding the component map leads to underestimating infrastructure needs by 3-5x.

02

Architecture

GitLab is composed of many internal services. Understanding the components matters because each one has its own failure mode and scaling profile.

+-----------------------------------------------------+
|                    Load Balancer                    |
|                 (HTTPS termination)                 |
+----------+--------------+--------------+------------+
           |              |              |
     +-----v-----+  +-----v-----+  +-----v-----+
     |   NGINX   |  |   NGINX   |  |   NGINX   |
     | Workhorse |  | Workhorse |  | Workhorse |
     |   Puma    |  |   Puma    |  |   Puma    |
     |  Sidekiq  |  |  Sidekiq  |  |  Sidekiq  |
     +-----+-----+  +-----+-----+  +-----+-----+
           |              |              |
     +-----+--------------+--------------+-----+
     |                    |                    |
+----v----+   +-----------v-----------+ +------v------+
| Gitaly  |   | PostgreSQL (Patroni)  | |    Redis    |
| Cluster |   |      + PgBouncer      | | (Sentinel)  |
+---------+   +-----------+-----------+ +-------------+
                          |
                  +-------v--------+
                  | Object Storage |
                  |   (S3 / GCS)   |
                  +----------------+

Component | Role | Notes
Puma (Rails) | Web application server | Replaced Unicorn. Handles UI and API requests.
Sidekiq | Background job processing | Emails, repo cleanup, CI pipeline processing. The silent workhorse.
Gitaly | Git storage RPC service | All Git operations go through Gitaly. Usually the bottleneck.
PostgreSQL | Primary database | Stores everything except Git data and file uploads.
Redis | Caching + queues | Session data, Sidekiq queues, caching layer.
Object Storage | Artifacts, uploads, LFS, packages | S3-compatible. Critical for anything beyond small deploys.
NGINX | Reverse proxy | Bundled. Terminates TLS, routes to Puma/Workhorse.
GitLab Workhorse | Smart reverse proxy | Handles large file uploads, Git over HTTP. Offloads work from Puma.
Praefect | Gitaly cluster proxy | Required for Gitaly HA. Adds its own PostgreSQL database.
Consul | Service discovery | Used in HA setups for PostgreSQL failover coordination.
PgBouncer | Connection pooler | Required in HA to manage PostgreSQL connection limits.
03

Deployment Models

Recommended Omnibus (Linux Package)

The most common method for self-managed. A single .deb or .rpm that bundles everything — PostgreSQL, Redis, Gitaly, NGINX, all of it. Configure via /etc/gitlab/gitlab.rb, then run gitlab-ctl reconfigure.

  • Pros: Simple to get started, well-documented, GitLab Support's preferred model
  • Cons: All services on one box by default. Scaling means splitting services across nodes manually.
  • Best for: Teams under ~2,000 users, or as the starting point for larger deployments

Cloud-Native Helm Chart (Kubernetes)

GitLab's official Helm chart deploys each component as a separate pod/deployment. Looks attractive on paper but adds significant operational complexity.

  • Pros: Auto-scaling for Puma/Sidekiq, cloud-native, works well with mature K8s platforms
  • Cons: Gitaly on K8s is not recommended for production. You'll likely still need VMs for Gitaly and PostgreSQL.
  • Best for: Large organizations (5,000+ users) with a dedicated platform team
Consultant Reality Check

Many customers ask for "GitLab on Kubernetes" because it sounds modern. Push back unless they have a mature K8s platform with persistent volume support, monitoring, and a team that can debug pod scheduling issues at 2 AM. Omnibus on VMs is boring but works.

Docker (Compose)

Technically supported but not recommended for production. Fine for demos, dev instances, or air-gapped evaluation environments. The image is large (2GB+) and bundles the same Omnibus components inside a container.

04

Omnibus & gitlab.rb

GitLab Omnibus is the official all-in-one installation package. It bundles GitLab and all its dependencies (NGINX, PostgreSQL, Redis, Puma, Sidekiq, Gitaly, Prometheus, etc.) into a single .deb or .rpm package, managed by Chef under the hood.

Key commands

# Install GitLab EE (always install EE, even without a license)
sudo apt install gitlab-ee   # Debian/Ubuntu
sudo yum install gitlab-ee   # RHEL/CentOS

# Apply configuration changes
sudo gitlab-ctl reconfigure   # Runs Chef to converge config

# Service management
sudo gitlab-ctl status        # All service statuses
sudo gitlab-ctl restart       # Restart all services
sudo gitlab-ctl restart puma  # Restart specific service
sudo gitlab-ctl tail          # Tail all logs
sudo gitlab-ctl tail sidekiq  # Tail specific service logs

# Health checks
sudo gitlab-rake gitlab:check
sudo gitlab-rake gitlab:doctor:secrets

The gitlab.rb file

/etc/gitlab/gitlab.rb is the single source of truth for GitLab configuration. It's a Ruby file that defines every setting. After editing, run gitlab-ctl reconfigure to apply changes.

Essential Core Settings

# External URL (most important setting)
external_url 'https://gitlab.example.com'

# HTTPS with Let's Encrypt
letsencrypt['enable'] = true
letsencrypt['auto_renew'] = true

# Timezone
gitlab_rails['time_zone'] = 'America/Toronto'

Database PostgreSQL

# Use external PostgreSQL
postgresql['enable'] = false
gitlab_rails['db_host'] = 'pg.example.com'
gitlab_rails['db_port'] = 5432
gitlab_rails['db_database'] = 'gitlabhq_production'
gitlab_rails['db_username'] = 'gitlab'
gitlab_rails['db_password'] = 'secret'

Auth LDAP / SSO

# LDAP authentication
gitlab_rails['ldap_enabled'] = true
gitlab_rails['ldap_servers'] = {
  'main' => {
    'label' => 'LDAP',                 # name shown on the sign-in page
    'host' => 'ldap.example.com',
    'port' => 636,
    'uid' => 'sAMAccountName',         # login attribute; use 'uid' for OpenLDAP
    'encryption' => 'simple_tls',
    'bind_dn' => 'cn=gitlab,ou=apps,dc=example,dc=com',
    'password' => 'bind_password',
    'base' => 'ou=users,dc=example,dc=com'
  }
}

Storage Object Storage

# Consolidated object storage (S3/MinIO)
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'region' => 'us-east-1',
  'aws_access_key_id' => 'AKIA...',
  'aws_secret_access_key' => '...',
  'endpoint' => 'https://s3.example.com'
}
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'gl-artifacts'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'gl-lfs'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'gl-uploads'

Configuration management

Regardless of deployment model, treat gitlab.rb as infrastructure-as-code:

  • Store it in a Git repo (not the GitLab instance itself — chicken-and-egg problem)
  • Use Ansible, Puppet, or Chef to manage it across nodes
  • After any change: sudo gitlab-ctl reconfigure — this is idempotent and safe to re-run
  • Some changes require a restart: sudo gitlab-ctl restart — the reconfigure output will tell you
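
One way to make the infrastructure-as-code workflow above safer is to check that an edited gitlab.rb still parses before it reaches a node. The sketch below is a hypothetical pre-commit or CI helper (the method name valid_ruby_syntax? is an assumption, not part of GitLab). It relies on the compiler shipped with MRI (CRuby), only parses the text, and never evaluates settings, so it catches unbalanced braces and quotes but not misspelled keys.

```ruby
# Hypothetical helper: returns true when the given gitlab.rb text is valid Ruby.
# The source is compiled but never executed, so no setting takes effect.
def valid_ruby_syntax?(source)
  RubyVM::InstructionSequence.compile(source)
  true
rescue SyntaxError
  false
end

good = "external_url 'https://gitlab.example.com'\npuma['enable'] = true\n"
bad  = "gitlab_rails['ldap_servers'] = {\n  'main' => {\n" # braces never closed

puts valid_ruby_syntax?(good)
puts valid_ruby_syntax?(bad)
```

A check like this belongs in the repository that stores gitlab.rb; gitlab-ctl reconfigure remains the real validation of the settings themselves.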

Disabling unused bundled services

In multi-node setups, each node only runs specific services. Disable everything else:

# Example: Rails application node only
postgresql['enable'] = false
redis['enable'] = false
gitaly['enable'] = false
prometheus['enable'] = false
alertmanager['enable'] = false
grafana['enable'] = false   # GitLab < 16.3 only; bundled Grafana was removed in 16.3, so drop this line on newer versions
nginx['enable'] = true
puma['enable'] = true
sidekiq['enable'] = false

Common Pitfalls

Don't edit files in /var/opt/gitlab/ — they're generated by reconfigure and will be overwritten. Always edit /etc/gitlab/gitlab.rb. Back up gitlab.rb and gitlab-secrets.json — these are the two critical config files. Losing gitlab-secrets.json means losing access to encrypted database columns (CI variables, 2FA secrets, etc.).

05

GitLab Environment Toolkit (GET)

GET is GitLab's official Infrastructure as Code toolkit for deploying and managing GitLab reference architectures. It combines Terraform (for provisioning cloud infrastructure) and Ansible (for configuring GitLab components) to deploy fully operational 2K, 3K, 5K, 10K, 25K, and 50K reference architectures.

Terraform Infrastructure Provisioning

GET's Terraform modules provision VMs, load balancers, networking, object storage, and databases on AWS, GCP, and Azure. Each reference architecture has a pre-built variable file that maps directly to GitLab's published specs.

Ansible Configuration

After infrastructure is provisioned, GET's Ansible playbooks install and configure every GitLab component: Omnibus packages, Consul, Patroni, PgBouncer, Redis Sentinel, Praefect, Gitaly, Puma, Sidekiq, Prometheus, and Geo secondaries.

  • Supported architectures: 1K (single node), 2K, 3K, 5K, 10K, 25K, 50K
  • Cloud Native Hybrid: GET can deploy the hybrid model (Puma/Sidekiq in Kubernetes via Helm, stateful services on VMs)
  • Geo support: GET can provision and configure multi-site Geo deployments (primary + secondary sites)
  • Day 2 operations: Use GET Ansible playbooks for upgrades, scaling, and reconfiguration — not just initial deployment
  • Who uses it: GitLab Professional Services uses GET for customer deployments. It's also available to customers directly via the GitLab project.

Deploying with GET

A typical GET run has two phases: Terraform provisions the cloud infrastructure, then Ansible configures every GitLab component to the reference architecture specification. The commands below sketch a 3K deployment on AWS.

# Clone GET and configure for a 3K deployment on AWS
git clone https://gitlab.com/gitlab-org/gitlab-environment-toolkit.git
cd gitlab-environment-toolkit/terraform/environments

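# Copy and customize the 3k template
# (illustrative layout; the directory structure differs between GET releases, so follow the GET docs)
cp -r 3k my-deployment
cd my-deployment
# Edit variables.tf for your AWS account, VPC, domain, etc.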

# Provision infrastructure
terraform init && terraform apply

# Configure GitLab components via Ansible
cd ../../ansible
ansible-playbook -i environments/my-deployment/inventory playbooks/all.yml

Recommendation

For any deployment at 2K or above, use GET rather than manually provisioning infrastructure. It encodes GitLab's reference architecture best practices and eliminates configuration drift. Even for 1K deployments, GET's Ansible playbooks simplify initial setup and future upgrades.

06

High Availability & Reference Architectures

GitLab HA is not a checkbox — it's a significant architecture decision that roughly triples infrastructure cost and operational complexity.

Reference architecture tiers

GitLab publishes reference architectures sized by user count. These are the real-world minimum specs — don't go below them:

Users | Nodes | HA? | Approx. vCPUs
Up to 1,000 | 1 | No | 8
Up to 2,000 | 3 | No | 24
Up to 3,000 | ~7 | Yes | 48
Up to 5,000 | ~10 | Yes | 72
Up to 10,000 | ~13 | Yes | 128
Up to 50,000 | ~20+ | Yes | 384+

What HA actually requires

Database PostgreSQL HA

Patroni cluster (3 nodes minimum) with Consul for leader election and PgBouncer for connection pooling. This is the most complex piece to set up and the most critical to get right.

Cache Redis HA

Redis with Sentinel (3 nodes). Handles session data, Sidekiq queues, and caching. Redis Cluster mode is not supported for GitLab; use standalone Redis or Sentinel-based HA.

Storage Gitaly Cluster

Praefect cluster: 3 Gitaly nodes + 3 Praefect nodes + dedicated PostgreSQL for Praefect. Provides synchronous replication of Git repositories. Adds write latency.

Application Web & Workers

Multiple Puma and Sidekiq nodes behind a load balancer. Object storage externalized to S3/GCS/Azure Blob — mandatory for HA.
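
The three-node minimums for Consul and Sentinel above follow from majority voting: a cluster stays writable only while more than half of its members agree. The short sketch below shows the arithmetic; the method names are illustrative and not part of any GitLab tool.

```ruby
# Smallest number of members that forms a majority in an n-node cluster.
def quorum(nodes)
  nodes / 2 + 1
end

# How many nodes can fail while a majority is still reachable.
def tolerated_failures(nodes)
  nodes - quorum(nodes)
end

[1, 2, 3, 5].each do |n|
  puts "#{n} nodes: quorum #{quorum(n)}, survives #{tolerated_failures(n)} failure(s)"
end
```

Two nodes tolerate no failures at all, which is why three is the practical floor and why a fourth node adds cost without adding fault tolerance.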

Common Mistake

Customers say "we need HA" but actually need "we need backups and a 4-hour RTO." Full HA is expensive. A single Omnibus node with good backups and a tested restore procedure covers 80% of customers.

Reference Architectures

GitLab publishes tested reference architectures sized by user count. Each tier specifies exact node counts, CPU, and RAM per service.

Tier | Users | HA? | Nodes | Key Characteristics
1K | 1,000 | No | 1 | Single node, all-in-one. Dev/small teams.
2K | 2,000 | No | 8 | Separated services, no HA. Cloud Native Hybrid available.
3K | 3,000 | Yes | ~28 | Smallest HA architecture. Most common production deployment.
5K | 5,000 | Yes | ~28 | Same node count as 3K, larger specs per node.
10K | 10,000 | Yes | ~35 | Split Redis (Cache + Persistent). 4 Sidekiq nodes.
25K | 25,000 | Yes | ~42 | 5 Puma nodes. Massive Gitaly specs (32 vCPU, 120 GB).
50K | 50,000 | Yes | ~45 | 12 Puma nodes. Gitaly at 64 vCPU, 240 GB RAM per node.

3K Architecture (example)

The 3K is the most commonly deployed HA architecture and the smallest that provides full redundancy:

+---------------------------------------------------+
|            External Load Balancer (1)             |
+----+-------------------+-------------------+------+
     |                   |                   |
+----v----+       +------v------+     +------v------+
| Puma (3)|       | Sidekiq (2) |     | Praefect (3)|
| Rails   |       | Background  |     | Git routing |
| 8 vCPU  |       | 4 vCPU      |     | 2 vCPU      |
+----+----+       +-------------+     +------+------+
     |                                       |
+----v----+ +-----------+ +--------+  +------v------+
|PgBouncer| |Consul (3) | |Redis   |  | Gitaly (3)  |
| (3)     | |Service    | | (3)    |  | 4 vCPU      |
| 2 vCPU  | |Discovery  | |Sentinel|  | 15 GB RAM   |
+----+----+ +-----------+ +--------+  +-------------+
     |
+----v--------+  +-------------------+
|PostgreSQL(3)|  | Praefect PG (1+)  |
|Patroni HA   |  | Praefect metadata |
| 2 vCPU      |  +-------------------+
+-------------+

+ Internal Load Balancer (1)
+ Monitoring/Prometheus (1)
= ~28 nodes total

Why 3K is the HA threshold

The jump from 2K to 3K is the most significant architectural change in GitLab's reference architectures. At 2K, services are separated across nodes but each runs as a single instance — one PostgreSQL, one Redis, one Gitaly. A single failure takes down that component.

At 3K, every critical component is fully clustered:

  • PostgreSQL: 3-node Patroni cluster with automatic leader election via Consul
  • Redis: 3-node Redis with Sentinel for automatic failover
  • Gitaly: 3-node Praefect cluster with synchronous replication
  • PgBouncer: 3 instances for connection pooling redundancy
  • Consul: 3-node cluster for service discovery and leader election
  • Puma (Rails): 3 application nodes behind a load balancer

Scaling beyond 3K (to 5K, 10K, 25K, 50K) follows a predictable pattern: the architecture shape stays the same but resources grow. From 3K to 5K, it's purely vertical scaling (same ~28 nodes, bigger specs). From 5K to 10K, Redis splits into separate Cache and Persistent clusters and Sidekiq scales from 2 to 4 nodes. From 10K upward, Puma and Gitaly nodes get progressively larger. The core clustered design established at 3K doesn't change — you're just adding capacity to existing clusters.

Supported modifications (all HA tiers)

  • Cloud Native Hybrid — run Puma and Sidekiq in Kubernetes (Helm), keep stateful services (PostgreSQL, Redis, Gitaly) on VMs or PaaS
  • External PostgreSQL — replace with Cloud SQL, RDS, or Azure Database
  • External Redis — replace with ElastiCache, Memorystore
  • External object storage — S3, GCS, Azure Blob for artifacts, LFS, uploads
  • Sharded Gitaly — use Gitaly shards instead of Praefect cluster
  • Scaled-down HA — use 3K architecture with reduced specs for fewer users who still need HA
Unsupported

Do not run stateful services in Kubernetes (PostgreSQL, Redis, Gitaly). This is explicitly unsupported. Do not use Redis Cluster mode (only Standalone or Sentinel HA). Amazon Aurora has limited support — it works for basic workflows but is incompatible with Geo and database load balancing.

Recommendation

For most customers needing HA, start with the 3K architecture even if they have fewer than 3,000 users. The 3K is the smallest HA tier, and sizing down is a supported modification. Overshooting slightly on infrastructure is much cheaper than a redesign later. Use GET for the deployment — it eliminates manual configuration errors and provides a repeatable, auditable process.
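
The sizing rule in this recommendation can be written down as a small lookup. The sketch below copies the tier limits and HA flags from the reference architecture table in this document; the constant and method names are illustrative assumptions.

```ruby
# Tier limits and HA flags as listed in this document's reference table.
TIERS = [
  { name: '1K',  max_users: 1_000,  ha: false },
  { name: '2K',  max_users: 2_000,  ha: false },
  { name: '3K',  max_users: 3_000,  ha: true },
  { name: '5K',  max_users: 5_000,  ha: true },
  { name: '10K', max_users: 10_000, ha: true },
  { name: '25K', max_users: 25_000, ha: true },
  { name: '50K', max_users: 50_000, ha: true }
].freeze

# Returns the smallest tier that fits the user count. When HA is required the
# non-HA tiers are skipped, so small HA deployments land on 3K.
# Returns nil when the user count exceeds the largest tier.
def recommend_tier(users, needs_ha:)
  candidates = needs_ha ? TIERS.select { |tier| tier[:ha] } : TIERS
  match = candidates.find { |tier| users <= tier[:max_users] }
  match && match[:name]
end

puts recommend_tier(800, needs_ha: false)
puts recommend_tier(800, needs_ha: true)
puts recommend_tier(7_000, needs_ha: true)
```

A result of 3K for 800 users with HA corresponds to the "scaled-down HA" modification: the 3K shape with reduced specs.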

Geo (disaster recovery)

GitLab Geo provides read-only replicas in other regions. It's not multi-master — there is one primary and one or more secondaries.

Replication What replicates

  • Git repositories (via Gitaly/Praefect)
  • LFS objects, uploads, artifacts
  • Container registry images
  • Database (PostgreSQL streaming replication)
  • Design management files
  • Package registry (npm, Maven, etc.)

Gaps What doesn't replicate

  • CI/CD job logs (use object storage)
  • Some caches and session data
  • Terraform state files (Geo replication added in later 15.x/16.x releases; verify support for your version)
  • Pages deployments
  • External services (Elasticsearch, etc.)

Geo architecture

 Primary Site (Active)           Secondary Site (Read-only)
┌─────────────────────┐            ┌─────────────────────┐
│    Load Balancer    │            │    Load Balancer    │
│  ┌──────────────┐   │            │  ┌──────────────┐   │
│  │ Puma (Rails) │   │            │  │ Puma (Rails) │   │
│  │ Sidekiq      │   │            │  │ Sidekiq      │   │
│  └──────────────┘   │            │  └──────────────┘   │
│  ┌──────────────┐   │   stream   │  ┌──────────────┐   │
│  │ PostgreSQL   │───┼───────────►│  │ PostgreSQL   │   │
│  │ (primary)    │   │   repl.    │  │ (read-only)  │   │
│  └──────────────┘   │            │  └──────────────┘   │
│  ┌──────────────┐   │    sync    │  ┌──────────────┐   │
│  │ Gitaly       │───┼───────────►│  │ Gitaly       │   │
│  └──────────────┘   │            │  └──────────────┘   │
│  ┌──────────────┐   │    sync    │  ┌──────────────┐   │
│  │ Object Store │───┼───────────►│  │ Object Store │   │
│  └──────────────┘   │            │  └──────────────┘   │
└─────────────────────┘            └─────────────────────┘
           ▲ writes                           ▲ reads (git clone)
           │                                  │
       ────┴──────────────────────────────────┴────
                          Users

Failover

  • Promotion: Manual — run gitlab-ctl geo promote on the secondary. Not automatic.
  • DNS update required: After promotion, update DNS to point to the new primary. Plan for TTL propagation.
  • Data loss window: Depends on replication lag. Monitor the geo_db_replication_lag_seconds Prometheus metric; lag is typically seconds to minutes.
  • Planned failover: Pause writes on primary, wait for sync, promote secondary. Near-zero data loss.
  • Unplanned failover: Promote immediately. Accept potential data loss equal to replication lag.
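
The difference between the two failover modes can be expressed as a quick check against the customer's RPO. This is a simplified sketch: it treats a planned failover as zero loss (the text above says near-zero), and the method names and sample numbers are assumptions for illustration.

```ruby
# Expected data loss in seconds when promoting a Geo secondary.
# planned: writes were paused and the secondary caught up before promotion.
def expected_data_loss_seconds(replication_lag_seconds, planned:)
  planned ? 0 : replication_lag_seconds
end

# True when the expected loss fits inside the agreed recovery point objective.
def meets_rpo?(replication_lag_seconds, rpo_seconds, planned: false)
  expected_data_loss_seconds(replication_lag_seconds, planned: planned) <= rpo_seconds
end

lag = 90 # seconds of replication lag observed on the secondary
puts meets_rpo?(lag, 300)                # unplanned failover, 5-minute RPO
puts meets_rpo?(lag, 60)                 # unplanned failover, 1-minute RPO
puts meets_rpo?(lag, 60, planned: true)  # planned failover, 1-minute RPO
```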

Geo use cases

  • Disaster recovery: Geographic redundancy for business continuity
  • Distributed teams: Developers clone from the nearest Geo secondary — faster git operations across regions
  • Data residency: Keep a read-only copy of data in a specific jurisdiction for compliance
  • Read offloading: Route CI/CD runner git clones to the secondary to reduce primary load
Key distinction

Geo is a DR solution, not an HA solution. It doesn't eliminate the need for local HA within each site. A common architecture: 3K HA at the primary site + 3K HA at the secondary Geo site. GET can provision both sites including Geo configuration.

Geo requirements

  • License: Premium or Ultimate required
  • PostgreSQL: Streaming replication between sites (not logical replication)
  • Network: Sites need reliable connectivity. Geo tolerates intermittent outages but replication lag will grow.
  • Object storage: Recommended to use separate buckets per site with cross-region replication (S3 CRR, GCS dual-region)
  • Identical GitLab versions: Primary and secondary must run the same GitLab version
07

Storage & Database

Git repository storage (Gitaly)

Gitaly stores Git repositories on local disk. This is almost always the storage bottleneck:

  • Use local SSDs. NFS for Gitaly storage was deprecated in GitLab 14.0 and is no longer supported in current releases (support was phased out during the 15.x series), so do not design new deployments around it.
  • Monitor disk IOPS and latency — Gitaly performance degrades non-linearly as disks fill
  • Large monorepos (10GB+) will require tuning Gitaly timeouts and resource limits
  • Gitaly Cluster (Praefect) provides replication but adds latency on writes due to synchronous replication
Gitaly Sizing

Gitaly is CPU and IOPS intensive, not just storage. A common mistake is provisioning large, slow disks. What you need is fast disks with low latency. Plan for 2-4x the raw repository size to account for pack files, temporary objects, and housekeeping operations.
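
As a rough worked example of the 2-4x rule above, the following sketch turns a raw repository size into a disk range. The method name and the 250 GB input are illustrative assumptions, not GitLab tooling.

```ruby
# Returns [minimum, comfortable] Gitaly disk sizes in GB for a given amount of
# raw repository data, using the 2x and 4x multipliers from the text.
def gitaly_disk_range_gb(raw_repo_gb)
  [raw_repo_gb * 2, raw_repo_gb * 4]
end

low, high = gitaly_disk_range_gb(250)
puts "250 GB of repositories: provision #{low}-#{high} GB of fast disk"
```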

PostgreSQL

GitLab generates significant database load, especially from Sidekiq. Plan for:

  • Minimum 5-10 GB RAM dedicated to PostgreSQL for 1,000+ user instances
  • shared_buffers = 25% of RAM, effective_cache_size = 75% of RAM as starting points
  • Regular VACUUM and ANALYZE — GitLab's background migrations can bloat tables
  • Connection pooling via PgBouncer is recommended even for non-HA setups above 1,000 users
  • GitLab requires PostgreSQL 14+ (as of GitLab 17.0); GitLab 18.0 requires PostgreSQL 16+. Always check the version requirements for your target GitLab version.
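
The memory starting points in the list above reduce to simple arithmetic. The sketch below computes them for a dedicated database host; the method name is hypothetical, and the results are starting values to tune rather than recommendations from GitLab.

```ruby
# Starting values for a dedicated PostgreSQL host, following the 25% / 75%
# rule of thumb from the list above. Sizes are returned in MB.
def postgres_memory_settings(ram_gb)
  ram_mb = ram_gb * 1024
  {
    shared_buffers_mb: ram_mb / 4,
    effective_cache_size_mb: ram_mb * 3 / 4
  }
end

settings = postgres_memory_settings(16)
puts "shared_buffers = #{settings[:shared_buffers_mb]}MB"
puts "effective_cache_size = #{settings[:effective_cache_size_mb]}MB"
```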

Object storage

Move these to object storage early — don't let them accumulate on local disk:

High Growth CI/CD Artifacts

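These grow fast and are the #1 disk space consumer. Configure expiration policies aggressively: the instance-wide default expiration is 30 days, and artifacts from the most recent successful pipeline are kept indefinitely unless that setting is turned off. Most artifacts are only useful for days or weeks.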

Also Externalize Everything Else

  • LFS objects
  • Container registry layers
  • Package registry files
  • Uploads (attachments in issues/MRs)
  • Terraform state files
  • Dependency proxy cache

Configure the consolidated object storage setting in gitlab.rb — one S3 connection config for all object types rather than configuring each separately:

gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'region' => 'us-east-1',
  'aws_access_key_id' => 'AKIA...',
  'aws_secret_access_key' => '...'
}
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'gl-artifacts'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'gl-lfs'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'gl-uploads'
gitlab_rails['object_store']['objects']['packages']['bucket'] = 'gl-packages'
08

Backups

What's included

gitlab-backup create backs up:

  • Database (PostgreSQL dump)
  • Repositories (Git bundles)
  • Uploads, LFS, artifacts, packages, registry (if on local storage)
  • CI/CD secure files

What's NOT included

Critical gitlab-secrets.json

Encryption keys for CI/CD variables, runner tokens, 2FA secrets. Without this file, a restored instance cannot decrypt any encrypted data. Back it up separately and securely. Losing it means all encrypted data is irrecoverable.

Also Backup Configuration

  • /etc/gitlab/gitlab.rb — your configuration
  • TLS certificates
  • Object storage data (if using external S3) — needs separate backup
  • Custom Nginx configurations

Automated backup strategy

# Daily automated backup via cron
0 2 * * * /opt/gitlab/bin/gitlab-backup create CRON=1

# Backup gitlab.rb and secrets separately (a crontab entry must stay on one line)
0 2 * * * tar czf /var/opt/gitlab/backups/config_$(date +\%Y\%m\%d).tar.gz /etc/gitlab/gitlab.rb /etc/gitlab/gitlab-secrets.json

  • Ship backups off-box (S3, NFS mount, rsync to another server)
  • Retain 7 daily + 4 weekly minimum
  • Test restores quarterly — an untested backup is not a backup
  • For large instances (500GB+ repo data), consider incremental backup strategies or Gitaly snapshots
  • Configure backup_keep_time in gitlab.rb to auto-prune old backups
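
The "7 daily + 4 weekly" policy above needs off-box pruning logic, because backup_keep_time (a value in seconds, for example 604800 for seven days) only expresses a single age cutoff. The sketch below shows one possible selection rule; the method name and the choice of "newest backup per ISO week" are assumptions, not GitLab behavior.

```ruby
require 'date'

# Returns the backup dates to keep: the newest `daily` backups, plus the newest
# backup from each of the `weekly` most recent ISO weeks among the older ones.
def backups_to_keep(dates, daily: 7, weekly: 4)
  sorted = dates.uniq.sort.reverse          # newest first
  recent = sorted.first(daily)
  older = sorted.drop(daily)
  weekly_picks = older.group_by { |d| [d.cwyear, d.cweek] }
                      .values
                      .map(&:max)            # newest backup in each week
                      .sort
                      .reverse
                      .first(weekly)
  (recent + weekly_picks).sort
end

# 60 consecutive daily backups ending on 2024-03-10
all_backups = (0...60).map { |i| Date.new(2024, 3, 10) - i }
kept = backups_to_keep(all_backups)
puts "keeping #{kept.size} of #{all_backups.size} backups"
puts kept.map(&:to_s).join(', ')
```

Everything not returned by the function is a candidate for deletion on the off-box store.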

Restore procedure

# 1. Install the EXACT same GitLab version as the backup
sudo apt-get install gitlab-ee=16.8.1-ee.0

# 2. Restore config files FIRST
sudo cp gitlab.rb /etc/gitlab/gitlab.rb
sudo cp gitlab-secrets.json /etc/gitlab/gitlab-secrets.json
sudo gitlab-ctl reconfigure

# 3. Stop data-writing services
sudo gitlab-ctl stop puma
sudo gitlab-ctl stop sidekiq
sudo gitlab-ctl status    # verify they're down

# 4. Restore from backup
sudo gitlab-backup restore BACKUP=1710000000_2024_03_10_16.8.1-ee

# 5. Reconfigure and restart
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart

# 6. Verify
sudo gitlab-rake gitlab:check SANITIZE=true
sudo gitlab-rake gitlab:artifacts:check
sudo gitlab-rake gitlab:lfs:check

Critical

The backup and restore must be on the exact same GitLab version. You cannot restore a 16.5 backup to a 16.8 instance. Install the matching version first, restore, then upgrade.
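
Because the backup ID embeds the GitLab version, the version match can be checked before a restore is attempted. The sketch below parses IDs of the form used in the restore example above (epoch_YYYY_MM_DD_version); the method names are hypothetical, and the installed version string is assumed to use the same format, such as 16.8.1-ee.

```ruby
# Matches backup IDs such as "1710000000_2024_03_10_16.8.1-ee".
BACKUP_ID = /\A\d+_\d{4}_\d{2}_\d{2}_(?<version>\d+\.\d+\.\d+(?:-\w+)?)\z/

# Returns the version embedded in the backup ID, or nil if the ID is malformed.
def backup_version(backup_id)
  match = BACKUP_ID.match(backup_id)
  match && match[:version]
end

# True only when the backup was taken on exactly the installed version.
def restorable_on?(backup_id, installed_version)
  version = backup_version(backup_id)
  !version.nil? && version == installed_version
end

id = '1710000000_2024_03_10_16.8.1-ee'
puts backup_version(id)
puts restorable_on?(id, '16.8.1-ee')
puts restorable_on?(id, '16.5.0-ee')
```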

09

Upgrades & Rollbacks

GitLab releases monthly (X.Y) with patch releases in between. Upgrades are where most consultant engagements go wrong.

Upgrade rules

  • You cannot skip major versions. To go from 14.x to 16.x, you must pass through 15.x.
  • Required upgrade stops: Certain versions have mandatory database migrations. GitLab documents these as "required upgrade stops." You must land on these versions and let background migrations complete before continuing.
  • Check the upgrade path tool: GitLab provides an upgrade path calculator. Use it every time.
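  • Background migrations must finish before proceeding to the next version. Check the Admin UI (Monitoring > Background migrations) or use gitlab-rails runner; gitlab-rake db:migrate:status only reports regular schema migrations, not background ones.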

Upgrade process (Omnibus)

# 1. Backup (always)
sudo gitlab-backup create

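# 2. Check current background migrations (both commands should print 0)
sudo gitlab-rails runner -e production \
  'puts Gitlab::BackgroundMigration.remaining'
sudo gitlab-rails runner -e production \
  'puts Gitlab::Database::BackgroundMigration::BatchedMigration.queued.size'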

# 3. Update the package
sudo apt-get update
sudo apt-get install gitlab-ee=16.8.1-ee.0    # pin the version

# 4. Reconfigure triggers automatically, but verify
sudo gitlab-ctl reconfigure
sudo gitlab-ctl status

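# 5. Check background migrations again (wait until both print 0)
sudo gitlab-rails runner -e production \
  'puts Gitlab::BackgroundMigration.remaining'
sudo gitlab-rails runner -e production \
  'puts Gitlab::Database::BackgroundMigration::BatchedMigration.queued.size'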

# 6. Smoke test: login, push, CI pipeline, MR creation

Multi-hop upgrades

When a customer is many versions behind, the upgrade becomes a multi-day project:

  • Map out every required stop from current to target version
  • At each stop: upgrade, run reconfigure, wait for background migrations to complete (can take hours on large instances)
  • Monitor /admin/background_migrations in the UI or gitlab-rails runner
  • Do NOT proceed to the next version while batched background migrations are running
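
Mapping the stops, the first step in the list above, can be sketched as a small planning function. The list of required stops must come from GitLab's upgrade path tool; the versions used here are placeholders for illustration only, and the method names are hypothetical.

```ruby
require 'rubygems' # provides Gem::Version for version comparison

# Returns every hop from `current` to `target`, landing on each required stop
# that lies strictly between them, in ascending order.
def upgrade_path(current, target, required_stops)
  from = Gem::Version.new(current)
  to = Gem::Version.new(target)
  raise ArgumentError, 'target must be newer than current' unless to > from

  hops = required_stops.map { |v| Gem::Version.new(v) }
                       .select { |v| v > from && v < to }
                       .sort
  (hops + [to]).map(&:to_s)
end

# True when two consecutive versions are more than one major version apart.
def skips_major?(versions)
  majors = versions.map { |v| Gem::Version.new(v).segments.first }
  majors.each_cons(2).any? { |a, b| b - a > 1 }
end

# Placeholder stops, NOT an authoritative list: take real ones from the tool.
example_stops = %w[14.10.5 15.0.5 15.4.6 15.11.13 16.3.7 16.7.7]
path = upgrade_path('14.0.12', '16.8.1', example_stops)
puts path.join(' -> ')
puts skips_major?(['14.0.12'] + path)
```

Each element of the returned path is one maintenance step: upgrade, reconfigure, then wait for background migrations before moving to the next.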
Time Estimate

A single-version hop on a small instance takes 15-30 minutes. A multi-hop upgrade (e.g., 14.0 → 16.8) across required stops can take a full day including validation. Never schedule a multi-hop upgrade in a 2-hour maintenance window.

Upgrade on Kubernetes (Helm)

Helm upgrades follow a similar pattern but with extra considerations:

  • Pre-upgrade: run helm diff upgrade (provided by the helm-diff plugin) to preview changes
  • Database migrations run as a Kubernetes Job (the migrations pod)
  • Scale down webservice and sidekiq deployments before the migration job if you want zero risk of interference
  • After migration completes, the new pods roll out automatically
  • Rollback via helm rollback does not revert the database — same limitation as Omnibus

Rollbacks

GitLab's official stance: rollbacks are not supported once database migrations have run. This is important to communicate to customers clearly.

Best Option VM Snapshot

For VM-based deployments, take a full VM snapshot before upgrading. This is the fastest rollback path — revert the entire machine to pre-upgrade state in minutes.

Alternative Backup Restore

Restore from the backup taken before upgrading. Slower (can take hours for large instances) but works when VM snapshots aren't available.

  • Never attempt to downgrade the package version without restoring the database. The schema will be incompatible.
  • For Kubernetes: Helm rollback won't revert the database — you still need a database backup.
  • GitLab database migrations are forward-only. There is no gitlab-rake db:rollback that will safely undo a version upgrade.
Non-negotiable

Every upgrade engagement must have a documented rollback plan before starting. The customer must agree to the rollback method (snapshot vs. backup restore) and the acceptable data loss window (RPO).

10

Monitoring

GitLab ships with a bundled Prometheus stack (optional, enabled in gitlab.rb). Bundled Grafana was removed in GitLab 16.3, so dashboards need an external Grafana. If the customer already runs Prometheus/Grafana, export GitLab metrics to their stack rather than running a parallel monitoring system.

Key metrics & alerting

Metric | Alert Threshold | Why
Gitaly request duration (p95) | > 5s | Git operations are slow; users will notice immediately
Sidekiq queue depth | > 1,000 jobs | Background processing falling behind; CI pipelines will stall
Puma active workers | > 90% capacity | Web requests will queue; users see slow page loads
PostgreSQL connections | > 80% of max | Connection exhaustion = hard outage
Disk usage (Gitaly, PG) | > 80% | Full disk on Gitaly = repository corruption risk
Background migration count | > 0 for 24h | Stalled migration blocks upgrades
Workhorse queue time | > 30s | Requests are backing up before reaching Puma
Registry disk/storage | > 80% | Container registry full = broken CI/CD push steps
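
The thresholds in this table can be kept as data and evaluated against a metrics sample, which is handy for a quick health report during an engagement. This is a minimal sketch: the metric keys are made-up names rather than real Prometheus series, and utilization values are expressed as fractions.

```ruby
# Alert limits from the table above. Keys are hypothetical metric names.
THRESHOLDS = {
  gitaly_p95_seconds: 5,
  sidekiq_queue_depth: 1_000,
  puma_worker_utilization: 0.90,
  postgres_connection_utilization: 0.80,
  disk_utilization: 0.80,
  workhorse_queue_seconds: 30
}.freeze

# Returns the names of all metrics in the sample that exceed their limit.
# Metrics missing from the sample are ignored.
def breached_alerts(sample)
  THRESHOLDS.select { |metric, limit| sample.key?(metric) && sample[metric] > limit }.keys
end

sample = { gitaly_p95_seconds: 7.2, sidekiq_queue_depth: 120, disk_utilization: 0.85 }
puts breached_alerts(sample).inspect
```

In production the same limits belong in Prometheus alerting rules; the snippet only illustrates how the table translates into checks.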

Enabling monitoring

# In gitlab.rb
prometheus_monitoring['enable'] = true
# Note: Bundled Grafana was removed in GitLab 16.3.
# Use an external Grafana instance with GitLab's Prometheus as a datasource.

# Alternatively, expose metrics to an external Prometheus
gitlab_rails['monitoring_whitelist'] = ['10.0.0.0/8']
prometheus['listen_address'] = '0.0.0.0:9090'

Log analysis

GitLab writes structured JSON logs to /var/log/gitlab/. Key log files:

  • gitlab-rails/production_json.log — web requests with timing breakdowns
  • sidekiq/current — background job execution and failures
  • gitaly/current — Git RPC operations and durations
  • nginx/gitlab_access.log — raw HTTP access

Ship to ELK, Loki, or Datadog for centralized analysis. The JSON format makes parsing straightforward.
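
Before a log pipeline exists, the JSON format also allows quick ad hoc analysis. The sketch below pulls slow requests out of lines in the style of production_json.log. It assumes each line is a JSON object with duration_s, path, and status keys; the method name, the 1-second threshold, and the sample lines are illustrative.

```ruby
require 'json'

# Returns requests slower than threshold_s from an array of JSON log lines.
# Lines that are not valid JSON objects are skipped.
def slow_requests(lines, threshold_s: 1.0)
  lines.filter_map do |line|
    entry = JSON.parse(line)
    next unless entry.is_a?(Hash)

    duration = entry['duration_s']
    next unless duration.is_a?(Numeric) && duration > threshold_s

    { path: entry['path'], status: entry['status'], duration_s: duration }
  rescue JSON::ParserError
    nil
  end
end

sample = [
  '{"method":"GET","path":"/group/project","status":200,"duration_s":0.21}',
  '{"method":"GET","path":"/api/v4/projects","status":200,"duration_s":3.8}',
  'not json'
]

slow_requests(sample).each do |req|
  puts "#{req[:duration_s]}s #{req[:status]} #{req[:path]}"
end
```

For a real file, pass File.foreach(path) instead of the sample array so the log is streamed line by line.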

11

CI/CD Runners

Runners are the execution layer for CI/CD pipelines. They are separate infrastructure from GitLab itself and often account for more compute spend than the GitLab server.

Recommended Docker Executor

Each job runs in a fresh container. Clean environment, reproducible, easy to manage. The default choice for most deployments.

Auto-scaling Kubernetes Executor

Each job is a K8s pod. Auto-scales with cluster capacity. Best for large, variable workloads. Requires a K8s cluster dedicated to (or shared with) CI/CD.

Legacy Shell Executor

Jobs run directly on the runner host. No isolation. Only use for specific needs (e.g., hardware access, GPU, bare-metal builds). Security risk — one job can affect another.

Specialized Docker Machine (Auto-scale)

Spins up cloud VMs on demand for each job. Cost-effective for burst workloads. But Docker Machine is deprecated (GitLab 17.5, removal in 20.0) — migrate to the Docker Autoscaler executor (uses the fleeting library and taskscaler) or the Kubernetes executor instead.

Runner architecture decisions

  • Shared vs. group vs. project runners: Shared runners serve all projects (convenience), group/project runners are scoped (security, isolation). Use project runners for sensitive builds (signing, deploying to prod).
  • Tags: Use tags to route jobs to specific runners. E.g., docker, gpu, deploy-prod. Without tags, jobs go to any available shared runner.
  • Concurrency: Set concurrent in config.toml based on available CPU/memory. Overcommitting causes OOM kills and flaky pipelines.
  • Caching: Configure distributed cache (S3/GCS) rather than local filesystem cache. Local cache doesn't survive runner restarts and doesn't work with auto-scaling.

Runner registration (GitLab 16+)

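GitLab 16.0 introduced runner authentication tokens (glrt- prefix), replacing the old registration token model. Registration tokens are deprecated and the legacy workflow is disabled by default from 17.0. Full removal was originally planned for 18.0; confirm the current timeline in GitLab's deprecation notices for your target version: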

# New method (16+): create runner in UI/API, get auth token
gitlab-runner register \
  --url https://gitlab.example.com \
  --token glrt-XXXXXXXXXXXXXXXXXXXX \
  --executor docker \
  --docker-image alpine:latest

Sizing Rule of Thumb

Start with 2 vCPU and 4 GB RAM per concurrent job for general-purpose CI. Java/Docker-in-Docker builds need 4+ vCPU and 8+ GB. Monitor and adjust — runner sizing is always iterative.
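The rule of thumb above can be turned into a quick capacity estimate. The following is a minimal sketch, not an official sizing formula: the function names and the reserved headroom for the OS and runner process (1 vCPU, 2 GB) are assumptions made for illustration.

```python
import math


def max_concurrent_jobs(host_vcpu: int, host_ram_gb: int,
                        vcpu_per_job: int = 2, ram_gb_per_job: int = 4,
                        reserved_vcpu: int = 1, reserved_ram_gb: int = 2) -> int:
    """Estimate a safe `concurrent` value for one runner host.

    Defaults follow the 2 vCPU / 4 GB per job rule of thumb. The reserved
    headroom values are assumptions, not GitLab guidance.
    """
    usable_vcpu = max(host_vcpu - reserved_vcpu, 0)
    usable_ram = max(host_ram_gb - reserved_ram_gb, 0)
    # The scarcer resource (CPU or memory) sets the limit.
    return min(usable_vcpu // vcpu_per_job, usable_ram // ram_gb_per_job)


def hosts_needed(peak_jobs: int, jobs_per_host: int) -> int:
    """Number of runner hosts needed to cover a peak of concurrent jobs."""
    if jobs_per_host <= 0:
        raise ValueError("host is too small to run even one job")
    return math.ceil(peak_jobs / jobs_per_host)


if __name__ == "__main__":
    general = max_concurrent_jobs(16, 64)                                  # general-purpose CI
    heavy = max_concurrent_jobs(16, 64, vcpu_per_job=4, ram_gb_per_job=8)  # Java / DinD builds
    print(general, heavy, hosts_needed(40, general))
```

Treat the result as a starting value for concurrent in config.toml, then adjust it from observed CPU, memory, and queue-time metrics.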

12

Security Scanning

GitLab Ultimate includes a comprehensive DevSecOps security scanning suite integrated directly into the CI/CD pipeline. Scan results appear in merge requests and aggregate into a centralized Vulnerability Dashboard.

Code SAST

Static Application Security Testing. Analyzes source code for vulnerabilities without executing it. Supports 20+ languages via Semgrep-based rules. Advanced SAST (Ultimate, cross-function/cross-file taint analysis) provides higher-quality results with fewer false positives. Runs on every commit.

Runtime DAST

Dynamic Application Security Testing. Sends real HTTP requests to a running application to find runtime vulnerabilities like XSS, SQLi, and CSRF.

Dependencies Dependency Scanning

Checks project dependencies against known vulnerability databases (CVE, NVD). Finds vulnerable libraries before they reach production.

Images Container Scanning

Scans Docker images for OS-level and language-level package vulnerabilities. Integrates with the GitLab Container Registry.

Secrets Secret Detection

Scans commits for accidentally committed secrets: API keys, passwords, tokens, private keys. Can run at push time (secret push protection, which rejects the push before the server accepts it) and in the pipeline.

Compliance SBOM

Software Bill of Materials. Generates a machine-readable inventory of all components and dependencies. Required by many compliance frameworks (NIST, EO 14028).

Vulnerability Dashboard

All scanner results feed into the Vulnerability Report at the project, group, and instance level:

  • Merge request widget — new vulnerabilities introduced by the MR are surfaced inline, with severity and description. Reviewers can see security impact before approving.
  • Vulnerability list — filterable by scanner, severity (Critical/High/Medium/Low), status (Detected/Confirmed/Dismissed/Resolved), and project.
  • Security policies — enforce rules like "block merge if Critical SAST findings exist" or "require security approval for High+ vulnerabilities." Policies are defined as YAML in a separate security policy project.
  • Compliance dashboard — track which projects have scanning enabled, which have unresolved findings, and overall security posture across the organization.

Pipeline integration

# Include security scanning templates in .gitlab-ci.yml
include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/DAST.gitlab-ci.yml
  - template: Security/Dependency-Scanning.gitlab-ci.yml
  - template: Security/Container-Scanning.gitlab-ci.yml
  - template: Security/Secret-Detection.gitlab-ci.yml

# The DAST template runs in its own "dast" stage, so declare it explicitly
stages:
  - build
  - test
  - deploy
  - dast

# Most scanners run automatically in the "test" stage; DAST also needs a target URL set as a CI/CD variable
# Results are uploaded as CI artifacts and parsed by GitLab

Adoption Strategy

Don't enable all scanners at once — the initial vulnerability flood overwhelms teams. Start with Secret Detection (immediate, actionable wins) and Dependency Scanning (known CVEs, easy to prioritize). Add SAST once the team has a triage workflow. DAST comes last — it requires a running environment and produces noisier results.

Licensing

The full security feature set requires GitLab Ultimate. Basic pipeline Secret Detection scanning (JSON artifacts) is available in all tiers, but the merge request widget, vulnerability dashboard, security policies, and pre-receive (push) secret blocking all require Ultimate.

13

Migrations

Migrating to GitLab from other platforms is a common engagement. GitLab provides built-in importers for most sources, and GitLab Professional Services offers Congregate for large-scale migrations.

Built-in GitHub

Built-in importer. Migrates repos, issues, PRs, labels, milestones, releases. Supports GitHub.com and GitHub Enterprise.

Built-in Bitbucket

Built-in importer for Bitbucket Cloud and Server. Repos, PRs, issues. Server version requires API access.

Tool Jira

Use jira2gitlab (open source) or Jira2Lab (GitLab PS). Migrates issues, comments, attachments, labels, epics, worklogs. jira2gitlab supports Jira Server only, not Jira Cloud.

Tool Jenkins

Largely manual conversion of Jenkinsfiles to .gitlab-ci.yml. GitLab provides a migration guide, syntax comparison, and a JenkinsFile Wrapper to run Jenkins jobs inside GitLab CI during transition. Community CLI converters exist but handle only simple cases.

Tool SVN

No built-in importer. Use git svn to convert SVN history to Git, then push to GitLab. Preserves commit history, branches, tags. Budget extra conversion time for large repos.

Tool Mercurial

No built-in importer. Use hg-fast-export or hg-git to convert Mercurial repos to Git, then push to GitLab. Branch/bookmark mapping needs planning.

GitLab-to-GitLab migrations

Migrating between GitLab instances is increasingly common — consolidating instances, moving to SaaS, or upgrading editions. The method depends on the source and destination.

Simple CE to EE

A package swap, not a data migration. Install the gitlab-ee package at the same version as CE, run gitlab-ctl reconfigure, then add your license. All data, repos, and config persist. Takes minutes.

  • Always install EE from day one — EE without a license runs identically to CE
  • One-way: reverting EE back to CE risks database migration conflicts (EE adds tables and columns that CE doesn't expect). Once EE, stay EE.

Common Self-Managed to Self-Managed

Consolidating multiple GitLab instances or migrating to new infrastructure.

  • Direct Transfer — preferred for selective migration. HTTPS API-based, requires 16.8+ on both instances.
  • Backup & Restore — full instance copy. Best for 1:1 server replacement. gitlab-backup create → transfer → gitlab-backup restore.
  • Congregate — for large-scale migrations (100+ projects). Orchestrates in waves, handles objects Direct Transfer misses.

Complex Self-Managed to SaaS

Moving to GitLab.com. Direct Transfer works without admin access on the SaaS side — you only need Owner role on the destination group. Users must already exist on GitLab.com (mapped by email, never auto-created).

  • LDAP: not available on SaaS. SAML: available as Group SAML (Premium+) for the top-level group only. Use SCIM for automated user provisioning.
  • 10 GB repo limit on GitLab.com (upgradeable)
  • For large migrations, engage GitLab PS with Congregate

Complex SaaS to Self-Managed

Direct Transfer works bidirectionally — SaaS to self-managed is supported. Destination must be 16.8+ with Direct Transfer enabled in admin settings.

  • User contributions may not map correctly if public emails don't match between instances
  • File export/import is the fallback for selective project migration

Direct Transfer

Direct Transfer (formerly "Bulk Import") is GitLab's native migration mechanism. It transfers groups and projects between any two GitLab instances (self-managed or SaaS) via HTTPS API calls. Both instances should be on 16.8+, with the source no more than 2 minor versions behind the destination. Direct Transfer reached General Availability (GA) in Q2 2025.
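The version constraints above are easy to check before scheduling a migration. The sketch below is a hypothetical pre-flight helper, not part of GitLab or any GitLab tool. It encodes only the two rules stated in this section (16.8+ on both sides, source at most 2 minor versions behind the destination). It assumes both instances are on the same major version and flags a major-version difference for manual review.

```python
MIN_VERSION = (16, 8)   # minimum version stated above for both instances
MAX_MINOR_LAG = 2       # source may be at most 2 minor versions behind


def parse_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a string such as '17.3' or '17.3.1'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def direct_transfer_problems(source: str, destination: str) -> list[str]:
    """List the reasons a source/destination pair breaks the stated rules."""
    src, dst = parse_version(source), parse_version(destination)
    problems = []
    if src < MIN_VERSION:
        problems.append(f"source {source} is older than 16.8")
    if dst < MIN_VERSION:
        problems.append(f"destination {destination} is older than 16.8")
    if src[0] != dst[0]:
        # Assumption: minor-version lag is only compared within one major version.
        problems.append("major versions differ; verify compatibility manually")
    elif dst[1] - src[1] > MAX_MINOR_LAG:
        problems.append(f"source is more than {MAX_MINOR_LAG} minor versions behind")
    return problems


if __name__ == "__main__":
    for pair in [("17.3", "17.5"), ("17.1", "17.5"), ("16.7", "16.9")]:
        print(pair, direct_transfer_problems(*pair) or "ok")
```

An empty list means the pair passes these two checks only. It says nothing about network reachability, token scopes, or whether Direct Transfer is enabled in admin settings.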

Transfers: Repos, issues, MRs, labels, milestones, boards, epics, wikis, releases, snippets, CI pipeline history, comments, members, badges

Does NOT transfer: CI/CD variables, deploy tokens, webhooks, container registry images, runners, job artifacts, Pages domains, approval rules, feature flags

Congregate

Congregate is GitLab Professional Services' migration automation tool. It now uses Direct Transfer under the hood for core migration, then supplements it with API calls to handle objects that Direct Transfer doesn't cover (CI/CD variables, container registries, webhooks, deploy tokens). It orchestrates migrations in waves to manage API rate limits and allow staged cutover of large environments.

  • Supported paths: self→self, self→SaaS, SaaS→SaaS, plus GitHub, Bitbucket, Azure DevOps sources
  • Air-gapped: supports disconnected networks via two Congregate nodes with file-based transfer
  • Wave orchestration: breaks large migrations into manageable batches with progress tracking
  • Post-migration tasks: automatically handles objects excluded from Direct Transfer
  • Not self-service — requires engaging GitLab PS

Congregate vs Direct Transfer

For small migrations (< 50 projects), use Direct Transfer directly from the UI. For large migrations (100+ projects, multiple groups, complex user mappings), engage GitLab PS with Congregate. Congregate doesn't replace Direct Transfer — it wraps and extends it with orchestration, wave management, and coverage of objects Direct Transfer can't handle.
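To make the idea of wave-based migration concrete, here is a minimal sketch of splitting a project list into fixed-size waves. It only illustrates the batching concept and does not reflect how Congregate is implemented. The function name, wave size, and project paths are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def plan_waves(projects: Iterable[str], wave_size: int) -> Iterator[list[str]]:
    """Yield successive batches ("waves") of at most wave_size projects."""
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    iterator = iter(projects)
    while True:
        wave = list(islice(iterator, wave_size))
        if not wave:
            return
        yield wave


if __name__ == "__main__":
    # Hypothetical project paths, for illustration only.
    projects = [f"group/project-{i}" for i in range(1, 8)]
    for number, wave in enumerate(plan_waves(projects, wave_size=3), start=1):
        print(f"wave {number}: {wave}")
```

In practice waves are usually grouped by team or dependency rather than by count alone, so that each cutover affects one set of users at a time.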

jira2gitlab

Open-source tool (swingbit/jira2gitlab) for migrating Jira Server projects to GitLab. Converts issues, comments (Jira markup to Markdown), attachments, labels, components, fix versions (to milestones), worklogs (to /spend commands), sub-tasks, epics, and issue relationships. Supports incremental/resumable imports. Jira Server 8.5.1+ only (not Jira Cloud). GitLab PS also offers Jira2Lab for enterprise-scale migrations.

Migration planning

  • User mapping — create a spreadsheet mapping source usernames to GitLab usernames. Pre-create accounts or configure SSO before migration.
  • Group structure — design the GitLab group/subgroup hierarchy before importing. Reorganizing after migration is painful.
  • Large repos — repos over 5 GB need special handling. Consider git-filter-repo to trim history if appropriate.
  • CI/CD conversion — the most labor-intensive part. Jenkins/CircleCI/Travis configs must be manually rewritten as .gitlab-ci.yml.
  • Cutover window — plan for a freeze period. Run incremental syncs beforehand, then do a final sync and cutover.

Migration Approach

For small migrations (< 50 repos), use built-in importers directly. For large migrations (100+ repos, multiple source systems), engage GitLab Professional Services with Congregate. For Jira, evaluate jira2gitlab for Jira Server or consider maintaining Jira alongside GitLab if using Jira Cloud (GitLab has a native Jira integration).

14

Licensing & Support

Tier | Key Features | Support | Price Model
Free (CE) | Core Git, CI/CD, registry, issue tracking | Community only | Free
Premium (EE) | + Merge approvals, epics, roadmaps, push rules, Geo, LDAP group sync | 24/5 support | Per-user/year
Ultimate (EE) | + Security scanning (SAST, DAST, dependency, container), compliance, value streams | 24/7 support | Per-user/year

Licensing details

  • Licensing is per-user, per-year. All billable users count — generally anyone who can log in. Bot users and service accounts are typically not billable. Guest users consume a seat on Free and Premium tiers but are free on Ultimate.
  • Premium is the sweet spot for most enterprises — Ultimate is worth it only if the customer will actually use the security scanning features.
  • Self-managed EE with an expired license degrades to CE features but keeps running. Data is not lost.
  • Geo (disaster recovery / read replicas across regions) requires Premium or Ultimate.
  • True-up: GitLab audits user counts. If you exceed your license count, you'll owe the difference at renewal. Monitor this proactively.
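The seat rules in the list above can be sketched as a simple count. This is an illustration of those rules only, not GitLab's billing logic: the User fields and role names are assumptions, and a real true-up is based on usage over the subscription term rather than a single snapshot.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    role: str = "developer"   # hypothetical role label, e.g. "guest" or "developer"
    is_bot: bool = False      # bot users and service accounts


def billable_seats(users: list[User], tier: str) -> int:
    """Count seats under the rules described above."""
    tier = tier.lower()
    seats = 0
    for user in users:
        if user.is_bot:
            continue  # bots and service accounts are typically not billable
        if user.role == "guest" and tier == "ultimate":
            continue  # guests are free on Ultimate
        seats += 1
    return seats


def true_up_overage(users: list[User], tier: str, licensed_seats: int) -> int:
    """Seats above the licensed count, which would be owed at renewal."""
    return max(billable_seats(users, tier) - licensed_seats, 0)


if __name__ == "__main__":
    users = [User("alice"), User("bob", role="guest"), User("ci-bot", is_bot=True)]
    print(billable_seats(users, "premium"), billable_seats(users, "ultimate"))
    print(true_up_overage(users, "premium", licensed_seats=1))
```

Running a count like this on a schedule against an export of the instance's user list is one way to spot seat growth before renewal.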

CE vs. EE binary

A common source of confusion:

  • GitLab CE (Community Edition) and EE (Enterprise Edition) are different packages
  • EE without a license runs with CE features — it's functionally identical to CE
  • Always install EE, even without a license. Switching from CE to EE later is a package swap rather than a data migration, but it still needs a maintenance window and a matching package version. Installing EE from the start avoids this entirely. You can add a license later.

Recommendation

Always install the EE package from day one. It costs nothing without a license and avoids a CE→EE package swap later when the customer inevitably decides they want Premium features.

15

Consultant's Checklist

Before proposing a GitLab deployment, get answers to these:

  1. How many users? — Determines reference architecture tier
  2. What's the RTO/RPO? — Determines if you need HA or just good backups
  3. Is there existing Kubernetes infrastructure? — Determines deployment model
  4. What's the largest repository? — Determines Gitaly sizing and tuning needs
  5. Do they need CI/CD? How many concurrent jobs? — Determines runner infrastructure (often more expensive than GitLab itself)
  6. Air-gapped or internet-connected? — Affects updates, runner images, dependency scanning
  7. Compliance requirements? — Determines license tier and audit log retention
  8. Existing SCM/CI tools? — Migration scope (repos, CI configs, issues, users)
  9. Authentication method? — LDAP, SAML, OIDC (OAuth 2.0 / JWT) — affects initial setup complexity
  10. Who will operate it day-to-day? — Determines how much automation and documentation you deliver