GitLab Production Architecture Guide

Overview

GitLab is a complete DevOps platform delivered as a single application. It covers source code management, CI/CD, container registry, package registry, issue tracking, wikis, security scanning, and more. For customers, the pitch is consolidation — replace 5-10 separate tools with one.

There are two deployment models: GitLab.com (SaaS, managed by GitLab Inc.) and self-managed (you run it). As a consultant, you'll almost always be dealing with self-managed deployments for enterprise customers who need data sovereignty, compliance, or customization.

SaaS: GitLab.com

Hosted by GitLab Inc. Zero infrastructure to manage. Limited customization. Data lives on GitLab's infrastructure (GCP). Good for small teams or orgs without compliance constraints.

Self-Managed: On your own infrastructure

Full control over data, network, and configuration. Required for air-gapped, compliance-heavy, or highly customized environments. You own the operations burden.

Key Insight

GitLab looks like one product but operates like a microservices platform. Scoping a deployment without understanding the component map leads to underestimating infrastructure needs by 3-5x.

Architecture

GitLab is composed of many internal services. Understanding the components matters because each one has its own failure mode and scaling profile.

Component	Role	Notes
Puma (Rails)	Web application server	Replaced Unicorn. Handles UI and API requests.
Sidekiq	Background job processing	Emails, repo cleanup, CI pipeline processing. The silent workhorse.
Gitaly	Git storage RPC service	All Git operations go through Gitaly. Usually the bottleneck.
PostgreSQL	Primary database	Stores everything except Git data and file uploads.
Redis	Caching + queues	Session data, Sidekiq queues, caching layer.
Object Storage	Artifacts, uploads, LFS, packages	S3-compatible. Critical for anything beyond small deploys.
NGINX	Reverse proxy	Bundled. Terminates TLS, forwards requests to Workhorse, which proxies to Puma.
GitLab Workhorse	Smart reverse proxy	Handles large file uploads, Git over HTTP. Offloads work from Puma.
Praefect	Gitaly cluster proxy	Required for Gitaly HA. Adds its own PostgreSQL database.
Consul	Service discovery	Used in HA setups for PostgreSQL failover coordination.
PgBouncer	Connection pooler	Required in HA to manage PostgreSQL connection limits.

Deployment Models

Recommended: Omnibus (Linux Package)

The most common method for self-managed. A single .deb or .rpm that bundles everything — PostgreSQL, Redis, Gitaly, NGINX, all of it. Configure via /etc/gitlab/gitlab.rb, then run gitlab-ctl reconfigure.

Pros: Simple to get started, well-documented, GitLab Support's preferred model
Cons: All services on one box by default. Scaling means splitting services across nodes manually.
Best for: Single-node installs up to ~1,000 users, and as the building block for every larger VM-based deployment

Cloud-Native Helm Chart (Kubernetes)

GitLab's official Helm chart deploys each component as a separate pod/deployment. Looks attractive on paper but adds significant operational complexity.

Pros: Auto-scaling for Puma/Sidekiq, cloud-native, works well with mature K8s platforms
Cons: Gitaly on K8s is not recommended for production. You'll likely still need VMs for Gitaly and PostgreSQL.
Best for: Large organizations (5,000+ users) with a dedicated platform team

Consultant Reality Check

Many customers ask for "GitLab on Kubernetes" because it sounds modern. Push back unless they have a mature K8s platform with persistent volume support, monitoring, and a team that can debug pod scheduling issues at 2 AM. Omnibus on VMs is boring but works.

Docker (Compose)

Technically supported but not recommended for production. Fine for demos, dev instances, or air-gapped evaluation environments. The image is large (2GB+) and bundles the same Omnibus components inside a container.

Omnibus & gitlab.rb

GitLab Omnibus is the official all-in-one installation package. It bundles GitLab and all its dependencies (Nginx, PostgreSQL, Redis, Puma, Sidekiq, Gitaly, Prometheus, etc.) into a single deb or rpm package managed by Chef under the hood.

Key commands

# Install GitLab EE (always install EE, even without a license)
sudo apt install gitlab-ee   # Debian/Ubuntu
sudo yum install gitlab-ee   # RHEL/CentOS

# Apply configuration changes
sudo gitlab-ctl reconfigure   # Runs Chef to converge config

# Service management
sudo gitlab-ctl status        # All service statuses
sudo gitlab-ctl restart       # Restart all services
sudo gitlab-ctl restart puma  # Restart specific service
sudo gitlab-ctl tail          # Tail all logs
sudo gitlab-ctl tail sidekiq  # Tail specific service logs

# Health checks
sudo gitlab-rake gitlab:check
sudo gitlab-rake gitlab:doctor:secrets

The gitlab.rb file

/etc/gitlab/gitlab.rb is the single source of truth for GitLab configuration. It's a Ruby file that defines every setting. After editing, run gitlab-ctl reconfigure to apply changes.

Essential: Core Settings

# External URL (most important setting)
external_url 'https://gitlab.example.com'

# HTTPS with Let's Encrypt
letsencrypt['enable'] = true
letsencrypt['auto_renew'] = true

# Timezone
gitlab_rails['time_zone'] = 'America/Toronto'

Database: PostgreSQL

# Use external PostgreSQL
postgresql['enable'] = false
gitlab_rails['db_host'] = 'pg.example.com'
gitlab_rails['db_port'] = 5432
gitlab_rails['db_database'] = 'gitlabhq_production'
gitlab_rails['db_username'] = 'gitlab'
gitlab_rails['db_password'] = 'secret'

Auth: LDAP / SSO

# LDAP authentication
gitlab_rails['ldap_enabled'] = true
gitlab_rails['ldap_servers'] = {
  'main' => {
    'label' => 'LDAP',                  # name shown on the sign-in page
    'host' => 'ldap.example.com',
    'port' => 636,
    'uid' => 'sAMAccountName',          # Active Directory; use 'uid' for OpenLDAP
    'encryption' => 'simple_tls',       # LDAPS on port 636
    'bind_dn' => 'cn=gitlab,ou=apps,dc=example,dc=com',
    'password' => 'bind_password',
    'base' => 'ou=users,dc=example,dc=com'
  }
}

Storage: Object Storage

# Consolidated object storage (S3/MinIO)
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'region' => 'us-east-1',
  'aws_access_key_id' => 'AKIA...',
  'aws_secret_access_key' => '...',
  'endpoint' => 'https://s3.example.com',  # only for S3-compatible stores such as MinIO
  'path_style' => true                     # typically required by MinIO
}
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'gl-artifacts'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'gl-lfs'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'gl-uploads'

Configuration management

Regardless of deployment model, treat gitlab.rb as infrastructure-as-code:

Store it in a Git repo (not the GitLab instance itself — chicken-and-egg problem)
Use Ansible, Puppet, or Chef to manage it across nodes
After any change: sudo gitlab-ctl reconfigure — this is idempotent and safe to re-run
Some changes require a restart: sudo gitlab-ctl restart — the reconfigure output will tell you

Disabling unused bundled services

In multi-node setups, each node only runs specific services. Disable everything else:

# Example: Rails application node only
postgresql['enable'] = false
redis['enable'] = false
gitaly['enable'] = false
prometheus['enable'] = false
alertmanager['enable'] = false
# grafana['enable'] = false   # only relevant before GitLab 16.3, when Grafana was bundled
nginx['enable'] = true
puma['enable'] = true
sidekiq['enable'] = false
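When the same gitlab.rb template is pushed to many nodes with Ansible, Puppet, or Chef, it can help to derive these enable/disable lines from a role map instead of hand-editing each file. A minimal Ruby sketch: the ROLE_SERVICES map and its role names are hypothetical (separate from the `roles([...])` setting Omnibus itself supports), and the service keys mirror the example above minus Grafana, which newer versions no longer bundle.

```ruby
# Hypothetical role map: which bundled services each node type should run.
# Keys match top-level gitlab.rb settings; role names are illustrative only.
ALL_SERVICES = %w[postgresql redis gitaly prometheus alertmanager nginx puma sidekiq].freeze

ROLE_SERVICES = {
  'rails'   => %w[nginx puma],
  'sidekiq' => %w[sidekiq],
  'gitaly'  => %w[gitaly],
}.freeze

# Return gitlab.rb lines that enable this role's services and disable the rest.
def gitlab_rb_service_lines(role)
  enabled = ROLE_SERVICES.fetch(role) # raises KeyError for an unknown role
  ALL_SERVICES.map do |svc|
    "#{svc}['enable'] = #{enabled.include?(svc)}"
  end
end

puts gitlab_rb_service_lines('rails')
```

For the `rails` role this reproduces the Rails-node example: only `nginx` and `puma` come out as `true`.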

Common Pitfalls

Don't edit files in /var/opt/gitlab/ — they're generated by reconfigure and will be overwritten. Always edit /etc/gitlab/gitlab.rb. Back up gitlab.rb and gitlab-secrets.json — these are the two critical config files. Losing gitlab-secrets.json means losing access to encrypted database columns (CI variables, 2FA secrets, etc.).

GitLab Environment Toolkit (GET)

GET is GitLab's official Infrastructure as Code toolkit for deploying and managing GitLab reference architectures. It combines Terraform (for provisioning cloud infrastructure) and Ansible (for configuring GitLab components) to deploy fully operational 2K, 3K, 5K, 10K, 25K, and 50K reference architectures.

Terraform Infrastructure Provisioning

GET's Terraform modules provision VMs, load balancers, networking, object storage, and databases on AWS, GCP, and Azure. Each reference architecture has a pre-built variable file that maps directly to GitLab's published specs.

Ansible Configuration

After infrastructure is provisioned, GET's Ansible playbooks install and configure every GitLab component: Omnibus packages, Consul, Patroni, PgBouncer, Redis Sentinel, Praefect, Gitaly, Puma, Sidekiq, Prometheus, and Geo secondaries.

Supported architectures: 1K (single node), 2K, 3K, 5K, 10K, 25K, 50K
Cloud Native Hybrid: GET can deploy the hybrid model (Puma/Sidekiq in Kubernetes via Helm, stateful services on VMs)
Geo support: GET can provision and configure multi-site Geo deployments (primary + secondary sites)
Day 2 operations: Use GET Ansible playbooks for upgrades, scaling, and reconfiguration — not just initial deployment
Who uses it: GitLab Professional Services uses GET for customer deployments. It's publicly available, so customers can also run it themselves from the gitlab-org/gitlab-environment-toolkit project on GitLab.com.

Deploying with GET

A GET deployment runs in two phases: Terraform provisions the cloud infrastructure, then Ansible configures every GitLab component to match the reference architecture specifications.

# Clone GET and configure for a 3K deployment on AWS
git clone https://gitlab.com/gitlab-org/gitlab-environment-toolkit.git
cd gitlab-environment-toolkit/terraform/environments

# Copy and customize the 3k template (template directory names can differ between GET releases; check the repo)
cp -r 3k my-deployment
cd my-deployment
# Edit variables.tf for your AWS account, VPC, domain, etc.

# Provision infrastructure
terraform init && terraform apply

# Configure GitLab components via Ansible
cd ../../ansible
ansible-playbook -i environments/my-deployment/inventory playbooks/all.yml

Recommendation

For any deployment at 2K or above, use GET rather than manually provisioning infrastructure. It encodes GitLab's reference architecture best practices and keeps every node's configuration reproducible, which largely prevents configuration drift. Even for 1K deployments, GET's Ansible playbooks simplify initial setup and future upgrades.

High Availability & Reference Architectures

GitLab HA is not a checkbox — it's a significant architecture decision that roughly triples infrastructure cost and operational complexity.

Reference architecture tiers

GitLab publishes reference architectures sized by user count. These are the real-world minimum specs — don't go below them:

Users	Nodes	HA?	Approx. vCPUs
Up to 1,000	1	No	8
Up to 2,000	8	No	24
Up to 3,000	~28	Yes	48
Up to 5,000	~28	Yes	72
Up to 10,000	~35	Yes	128
Up to 50,000	~45	Yes	384+

What HA actually requires

Database: PostgreSQL HA

Patroni cluster (3 nodes minimum) with Consul for leader election and PgBouncer for connection pooling. This is the most complex piece to set up and the most critical to get right.

Cache: Redis HA

Redis Sentinel (3 nodes) with automatic failover. Handles session data, Sidekiq queues, and caching. Redis Cluster mode is not supported.

Storage: Gitaly Cluster

Praefect cluster: 3 Gitaly nodes + 3 Praefect nodes + dedicated PostgreSQL for Praefect. Provides synchronous replication of Git repositories. Adds write latency.

Application: Web & Workers

Multiple Puma and Sidekiq nodes behind a load balancer. Object storage externalized to S3/GCS/Azure Blob — mandatory for HA.

Common Mistake

Customers say "we need HA" but actually need "we need backups and a 4-hour RTO." Full HA is expensive. A single Omnibus node with good backups and a tested restore procedure covers most customers' real requirements.

Reference Architectures

GitLab publishes tested reference architectures sized by user count. Each tier specifies exact node counts, CPU, and RAM per service.

Tier	Users	HA?	Nodes	Key Characteristics
1K	1,000	No	1	Single node, all-in-one. Dev/small teams.
2K	2,000	No	8	Separated services, no HA. Cloud Native Hybrid available.
3K	3,000	Yes	~28	Smallest HA architecture. Most common production deployment.
5K	5,000	Yes	~28	Same node count as 3K, larger specs per node.
10K	10,000	Yes	~35	Split Redis (Cache + Persistent). 4 Sidekiq nodes.
25K	25,000	Yes	~42	5 Puma nodes. Massive Gitaly specs (32 vCPU, 120 GB).
50K	50,000	Yes	~45	12 Puma nodes. Gitaly at 64 vCPU, 240 GB RAM per node.

3K Architecture (example)

The 3K is the most commonly deployed HA architecture and the smallest that provides full redundancy.

Why 3K is the HA threshold

The jump from 2K to 3K is the most significant architectural change in GitLab's reference architectures. At 2K, services are separated across nodes but each runs as a single instance — one PostgreSQL, one Redis, one Gitaly. A single failure takes down that component.

At 3K, every critical component is fully clustered:

PostgreSQL: 3-node Patroni cluster with automatic leader election via Consul
Redis: 3-node Redis with Sentinel for automatic failover
Gitaly: 3-node Praefect cluster with synchronous replication
PgBouncer: 3 instances for connection pooling redundancy
Consul: 3-node cluster for service discovery and leader election
Puma (Rails): 3 application nodes behind a load balancer
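The recurring three-node count is quorum arithmetic. Consul (which Patroni uses to hold its leader lock in this setup) and Redis Sentinel's failover vote both require a strict majority, so an n-node cluster survives floor((n - 1) / 2) member failures. A small Ruby sketch of that arithmetic (general consensus math, not GitLab code):

```ruby
# Majority-quorum arithmetic for consensus-based clusters.
def quorum(nodes)
  nodes / 2 + 1 # strict majority (integer division)
end

def tolerated_failures(nodes)
  nodes - quorum(nodes)
end

[1, 2, 3, 4, 5].each do |n|
  puts "#{n} nodes: quorum #{quorum(n)}, survives #{tolerated_failures(n)} failure(s)"
end
# 2 nodes survive 0 failures, which is why 3 is the practical minimum;
# 4 nodes survive no more failures than 3, so odd sizes are preferred.
```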

Scaling beyond 3K (to 5K, 10K, 25K, 50K) follows a predictable pattern: the architecture shape stays the same but resources grow. From 3K to 5K, it's purely vertical scaling (same ~28 nodes, bigger specs). From 5K to 10K, Redis splits into separate Cache and Persistent clusters and Sidekiq scales from 2 to 4 nodes. From 10K upward, the number of Puma nodes grows and Gitaly nodes get progressively larger. The core clustered design established at 3K doesn't change; you're adding capacity to existing clusters.

Supported modifications (all HA tiers)

Cloud Native Hybrid — run Puma and Sidekiq in Kubernetes (Helm), keep stateful services (PostgreSQL, Redis, Gitaly) on VMs or PaaS
External PostgreSQL — replace with Cloud SQL, RDS, or Azure Database
External Redis — replace with ElastiCache, Memorystore
External object storage — S3, GCS, Azure Blob for artifacts, LFS, uploads
Sharded Gitaly — use Gitaly shards instead of Praefect cluster
Scaled-down HA — use 3K architecture with reduced specs for fewer users who still need HA

Unsupported

Do not run stateful services in Kubernetes (PostgreSQL, Redis, Gitaly). This is explicitly unsupported. Do not use Redis Cluster mode (only Standalone or Sentinel HA). Amazon Aurora has limited support — it works for basic workflows but is incompatible with Geo and database load balancing.

Recommendation

For most customers needing HA, start with the 3K architecture even if they have fewer than 3,000 users. The 3K is the smallest HA tier, and sizing down is a supported modification. Overshooting slightly on infrastructure is much cheaper than a redesign later. Use GET for the deployment — it eliminates manual configuration errors and provides a repeatable, auditable process.
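The tier choice, including the "start at 3K if you need HA" advice, reduces to a small lookup. A sketch that uses only the user-count ceilings from the Reference Architectures table; GitLab's published sizing also considers request load and workload shape (large monorepos, heavy CI), which this deliberately ignores.

```ruby
# User-count ceilings per tier, from the Reference Architectures table.
TIERS = [[1_000, '1K'], [2_000, '2K'], [3_000, '3K'], [5_000, '5K'],
         [10_000, '10K'], [25_000, '25K'], [50_000, '50K']].freeze
HA_TIERS = %w[3K 5K 10K 25K 50K].freeze

# Pick the smallest tier that fits the user count. If HA is required, never
# go below 3K (scaled down, which is a supported modification).
def reference_architecture(users, need_ha: false)
  tier = TIERS.find { |limit, _| users <= limit }&.last
  raise ArgumentError, 'beyond the published tiers' if tier.nil?
  return '3K (scaled down)' if need_ha && !HA_TIERS.include?(tier)
  tier
end

puts reference_architecture(800)                  # 1K
puts reference_architecture(800, need_ha: true)   # 3K (scaled down)
puts reference_architecture(4_200, need_ha: true) # 5K
```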

Geo (disaster recovery)

GitLab Geo provides read-only replicas in other regions. It's not multi-master — there is one primary and one or more secondaries.

Replication: What replicates

Git repositories (via Gitaly/Praefect)
LFS objects, uploads, artifacts
Container registry images
Database (PostgreSQL streaming replication)
Design management files
Package registry (npm, Maven, etc.)

Gaps: What doesn't replicate

CI/CD job logs (use object storage)
Some caches and session data
Terraform state files (Geo replication added in later 15.x/16.x releases; verify support for your version)
Pages deployments
External services (Elasticsearch, etc.)

Geo architecture

  Primary Site (Active)         Secondary Site (Read-only)
┌─────────────────────┐           ┌─────────────────────┐
│    Load Balancer    │           │    Load Balancer    │
│  ┌──────────────┐   │           │  ┌──────────────┐   │
│  │ Puma (Rails) │   │           │  │ Puma (Rails) │   │
│  │ Sidekiq      │   │           │  │ Sidekiq      │   │
│  └──────────────┘   │           │  └──────────────┘   │
│  ┌──────────────┐   │  stream   │  ┌──────────────┐   │
│  │ PostgreSQL   │───┼──────────►│  │ PostgreSQL   │   │
│  │ (primary)    │   │  repl.    │  │ (read-only)  │   │
│  └──────────────┘   │           │  └──────────────┘   │
│  ┌──────────────┐   │   sync    │  ┌──────────────┐   │
│  │ Gitaly       │───┼──────────►│  │ Gitaly       │   │
│  └──────────────┘   │           │  └──────────────┘   │
│  ┌──────────────┐   │   sync    │  ┌──────────────┐   │
│  │ Object Store │───┼──────────►│  │ Object Store │   │
│  └──────────────┘   │           │  └──────────────┘   │
└─────────────────────┘           └─────────────────────┘
           ▲ writes                          ▲ reads (git clone)
           │                                 │
───────────┴─────────────────────────────────┴───────────
                          Users

Failover

Promotion: Manual — run gitlab-ctl geo promote on the secondary. Not automatic.
DNS update required: After promotion, update DNS to point to the new primary. Plan for TTL propagation.
Data loss window: Depends on replication lag. Monitor the geo_db_replication_lag_seconds metric (database lag) alongside the sync status of repositories and files; lag is typically seconds to minutes.
Planned failover: Pause writes on primary, wait for sync, promote secondary. Near-zero data loss.
Unplanned failover: Promote immediately. Accept potential data loss equal to replication lag.

Geo use cases

Disaster recovery: Geographic redundancy for business continuity
Distributed teams: Developers clone from the nearest Geo secondary — faster git operations across regions
Data residency: Keep a read-only copy of data in a specific jurisdiction for compliance
Read offloading: Route CI/CD runner git clones to the secondary to reduce primary load

Key distinction

Geo is a DR solution, not an HA solution. It doesn't eliminate the need for local HA within each site. A common architecture: 3K HA at the primary site + 3K HA at the secondary Geo site. GET can provision both sites including Geo configuration.

Geo requirements

License: Premium or Ultimate required
PostgreSQL: Streaming replication between sites (not logical replication)
Network: Sites need reliable connectivity. Geo tolerates intermittent outages but replication lag will grow.
Object storage: Recommended to use separate buckets per site with cross-region replication (S3 CRR, GCS dual-region)
Identical GitLab versions: Primary and secondary must run the same GitLab version

Storage & Database

Git repository storage (Gitaly)

Gitaly stores Git repositories on local disk. This is almost always the storage bottleneck:

Use SSDs. NFS for Gitaly storage was deprecated in GitLab 14.0 and is no longer supported; use local disks or block storage instead.
Monitor disk IOPS and latency — Gitaly performance degrades non-linearly as disks fill
Large monorepos (10GB+) will require tuning Gitaly timeouts and resource limits
Gitaly Cluster (Praefect) provides replication but adds latency on writes due to synchronous replication

Gitaly Sizing

Gitaly is CPU and IOPS intensive, not just storage. A common mistake is provisioning large, slow disks. What you need is fast disks with low latency. Plan for 2-4x the raw repository size to account for pack files, temporary objects, and housekeeping operations.
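Combining the 2-4x rule with the 80% disk alert threshold from the monitoring table gives a rough provisioning number. A sketch; the default factor of 3 is an assumption picked from the middle of the range.

```ruby
# Rough Gitaly disk size: 2-4x raw repository size, plus headroom so that
# steady-state usage stays below the disk alert threshold.
def gitaly_disk_gb(raw_repo_gb, factor: 3, alert_pct: 80)
  raise ArgumentError, 'factor should be between 2 and 4' unless (2..4).cover?(factor)
  needed = raw_repo_gb * factor        # room for pack files, temp objects, housekeeping
  (needed * 100.0 / alert_pct).ceil    # size the disk so `needed` sits at the alert line
end

puts gitaly_disk_gb(500)            # 1875
puts gitaly_disk_gb(500, factor: 2) # 1250
```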

PostgreSQL

GitLab generates significant database load, especially from Sidekiq. Plan for:

Minimum 5-10 GB RAM dedicated to PostgreSQL for 1,000+ user instances
shared_buffers = 25% of RAM, effective_cache_size = 75% of RAM as starting points
Regular VACUUM and ANALYZE — GitLab's background migrations can bloat tables
Connection pooling via PgBouncer is recommended even for non-HA setups above 1,000 users
GitLab requires PostgreSQL 14+ (as of GitLab 17.0); GitLab 18.0 requires PostgreSQL 16+. Always check the version requirements for your target GitLab version.
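Applied to a dedicated database host, those starting points translate into gitlab.rb values along these lines. A sketch that assumes the whole host's RAM is available to PostgreSQL and uses the Omnibus `postgresql['shared_buffers']` and `postgresql['effective_cache_size']` keys; check what your Omnibus version already sets before overriding.

```ruby
# Turn the 25% / 75% starting points into gitlab.rb lines for a dedicated
# PostgreSQL host. Values are in MB; tune from here based on monitoring.
def postgres_memory_settings(ram_gb)
  ram_mb = ram_gb * 1024
  {
    'shared_buffers'       => "#{ram_mb / 4}MB",     # 25% of RAM
    'effective_cache_size' => "#{ram_mb * 3 / 4}MB", # 75% of RAM
  }
end

postgres_memory_settings(16).each do |key, value|
  puts "postgresql['#{key}'] = '#{value}'"
end
# postgresql['shared_buffers'] = '4096MB'
# postgresql['effective_cache_size'] = '12288MB'
```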

Object storage

Move these to object storage early — don't let them accumulate on local disk:

High Growth: CI/CD Artifacts

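These grow fast and are the #1 disk space consumer. Configure expiration policies aggressively: individual jobs can override the instance default with expire_in, artifacts from the latest successful pipeline on each ref are kept regardless of expiry, and job logs are never expired. Most artifacts are only useful for days or weeks.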
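The payoff from short retention is easy to estimate: with a roughly constant daily upload volume, artifact storage levels off at about daily volume x retention days. A back-of-the-envelope sketch in which the 40 GB/day figure is purely illustrative.

```ruby
# Steady-state artifact storage ~= daily volume x retention window.
# Ignores artifacts exempt from expiry, which add on top of this.
def steady_state_artifacts_gb(daily_gb, retention_days)
  daily_gb * retention_days
end

daily = 40 # GB of new artifacts per day (illustrative)
[7, 30, 365].each do |days|
  puts "#{days}-day retention: ~#{steady_state_artifacts_gb(daily, days)} GB"
end
# 7-day retention: ~280 GB
# 30-day retention: ~1200 GB
# 365-day retention: ~14600 GB
```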

Also Externalize: Everything Else

LFS objects
Container registry layers
Package registry files
Uploads (attachments in issues/MRs)
Terraform state files
Dependency proxy cache

Configure the consolidated object storage setting in gitlab.rb — one S3 connection config for all object types rather than configuring each separately:

gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'region' => 'us-east-1',
  'aws_access_key_id' => 'AKIA...',
  'aws_secret_access_key' => '...'
}
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'gl-artifacts'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'gl-lfs'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'gl-uploads'
gitlab_rails['object_store']['objects']['packages']['bucket'] = 'gl-packages'
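Each object type needs its own bucket assignment, and typing them by hand is error-prone. A sketch that generates one line per type from a naming prefix; the type list is assumed from GitLab's consolidated object storage documentation and should be checked against your version, and the `<prefix>-<type>` bucket naming is a hypothetical convention.

```ruby
# Generate consolidated object storage bucket lines for gitlab.rb.
# Object types assumed from GitLab's docs (verify for your version);
# the bucket naming scheme is hypothetical.
OBJECT_TYPES = %w[artifacts external_diffs lfs uploads packages
                  dependency_proxy terraform_state ci_secure_files pages].freeze

def bucket_lines(prefix)
  OBJECT_TYPES.map do |type|
    bucket = "#{prefix}-#{type.tr('_', '-')}" # S3 bucket names can't contain underscores
    "gitlab_rails['object_store']['objects']['#{type}']['bucket'] = '#{bucket}'"
  end
end

puts bucket_lines('gl')
```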

Backups

What's included

gitlab-backup create backs up:

Database (PostgreSQL dump)
Repositories (Git bundles)
Uploads, LFS, artifacts, packages, registry (if on local storage)
CI/CD secure files

What's NOT included

Critical: gitlab-secrets.json

Encryption keys for CI/CD variables, runner tokens, 2FA secrets. Without this file, a restored instance cannot decrypt any encrypted data. Back it up separately and securely. Losing it means all encrypted data is irrecoverable.

Also Back Up: Configuration

/etc/gitlab/gitlab.rb — your configuration
TLS certificates
Object storage data (if using external S3) — needs separate backup
Custom Nginx configurations

Automated backup strategy

# Daily automated backup via cron
0 2 * * * /opt/gitlab/bin/gitlab-backup create CRON=1

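# Back up gitlab.rb and secrets separately, to a location other than the data
# backups so the two never travel together (/secure/gitlab-config is a hypothetical path)
0 2 * * * tar czf /secure/gitlab-config/config_$(date +\%Y\%m\%d).tar.gz \
  /etc/gitlab/gitlab.rb /etc/gitlab/gitlab-secrets.json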

Ship backups off-box (S3, NFS mount, rsync to another server)
Retain 7 daily + 4 weekly minimum
Test restores quarterly — an untested backup is not a backup
For large instances (500GB+ repo data), consider incremental backup strategies or Gitaly snapshots
Configure gitlab_rails['backup_keep_time'] (in seconds) in gitlab.rb to auto-prune old local backups
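backup_keep_time is a single age cutoff, so a "7 daily + 4 weekly" scheme has to be enforced by whatever tooling manages the off-box copies. A minimal sketch of the selection logic, assuming one backup per day identified by its date and treating Sunday backups as the weekly copies (both assumptions).

```ruby
require 'date'

# Pick which daily backups to keep under a "7 daily + 4 weekly" policy.
# Assumes one backup per day, identified by its Date; Sunday backups count as weekly.
def backups_to_keep(dates, daily: 7, weekly: 4)
  newest_first = dates.sort.reverse
  keep  = newest_first.first(daily)                    # the most recent N days
  keep += newest_first.select(&:sunday?).first(weekly) # the most recent N Sundays
  keep.uniq.sort                                       # oldest to newest, no duplicates
end

today   = Date.new(2024, 3, 31)          # a Sunday
history = (0...60).map { |i| today - i } # 60 days of daily backups
kept    = backups_to_keep(history)
puts kept.map(&:to_s)
# Everything in `history` that is not in `kept` would be pruned off-box.
```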

Restore procedure

# 1. Install the EXACT same GitLab version as the backup
sudo apt-get install gitlab-ee=16.8.1-ee.0

# 2. Restore config files FIRST
sudo cp gitlab.rb /etc/gitlab/gitlab.rb
sudo cp gitlab-secrets.json /etc/gitlab/gitlab-secrets.json
sudo gitlab-ctl reconfigure

# 3. Put the backup archive in the backup directory, owned by git
sudo cp 1710000000_2024_03_10_16.8.1-ee_gitlab_backup.tar /var/opt/gitlab/backups/
sudo chown git:git /var/opt/gitlab/backups/1710000000_2024_03_10_16.8.1-ee_gitlab_backup.tar

# 4. Stop data-writing services
sudo gitlab-ctl stop puma
sudo gitlab-ctl stop sidekiq
sudo gitlab-ctl status    # verify they're down

# 5. Restore from backup (BACKUP is the archive name minus _gitlab_backup.tar)
sudo gitlab-backup restore BACKUP=1710000000_2024_03_10_16.8.1-ee

# 6. Reconfigure and restart
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart

# 7. Verify
sudo gitlab-rake gitlab:check SANITIZE=true
sudo gitlab-rake gitlab:doctor:secrets
sudo gitlab-rake gitlab:artifacts:check
sudo gitlab-rake gitlab:lfs:check

Critical

The backup and restore must be on the exact same GitLab version. You cannot restore a 16.5 backup to a 16.8 instance. Install the matching version first, restore, then upgrade.
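A pre-restore guard can catch a version mismatch before anything is touched, because the archive name records the version it was taken on (as in the BACKUP= ID in the restore procedure). A sketch assuming the `<timestamp>_<YYYY_MM_DD>_<version>_gitlab_backup.tar` naming shown there, with the installed version passed in as a string from however you read it.

```ruby
# Extract the GitLab version from a backup archive name such as
# 1710000000_2024_03_10_16.8.1-ee_gitlab_backup.tar and compare it with
# the installed version before restoring.
BACKUP_NAME = /\A\d+_\d{4}_\d{2}_\d{2}_(?<version>\d+\.\d+\.\d+(?:-ee)?)_gitlab_backup\.tar\z/

def backup_version(filename)
  match = BACKUP_NAME.match(File.basename(filename))
  raise ArgumentError, "unrecognized backup name: #{filename}" unless match
  match[:version]
end

def restore_allowed?(filename, installed_version)
  backup_version(filename) == installed_version
end

puts restore_allowed?('1710000000_2024_03_10_16.8.1-ee_gitlab_backup.tar', '16.8.1-ee') # true
puts restore_allowed?('1710000000_2024_03_10_16.5.0-ee_gitlab_backup.tar', '16.8.1-ee') # false
```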

Upgrades & Rollbacks

GitLab releases monthly (X.Y) with patch releases in between. Upgrades are where most consultant engagements go wrong.

Upgrade rules

You cannot skip major versions. To go from 14.x to 16.x, you must pass through 15.x.
Required upgrade stops: Certain versions have mandatory database migrations. GitLab documents these as "required upgrade stops." You must land on these versions and let background migrations complete before continuing.
Check the upgrade path tool: GitLab provides an upgrade path calculator. Use it every time.
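Background migrations must finish before proceeding to the next version. Check them in the Admin UI (Admin > Monitoring > Background migrations); gitlab-rake db:migrate:status only lists regular schema migrations, not background ones.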
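Mapping the required stops is a walk over an ordered version list. A sketch only: REQUIRED_STOPS below is example data for illustration, not an authoritative list, so take the real stops (including exact patch versions) from GitLab's upgrade path tool.

```ruby
# Compute upgrade hops from the current version to a target version,
# landing on every required stop in between. REQUIRED_STOPS is example
# data only; get the real list from GitLab's upgrade path tool.
REQUIRED_STOPS = %w[14.10 15.0 15.4 15.11 16.3 16.7].map { |v| Gem::Version.new(v) }.freeze

def upgrade_hops(current, target)
  from = Gem::Version.new(current)
  to   = Gem::Version.new(target)
  raise ArgumentError, 'downgrades are not supported' if to < from
  stops = REQUIRED_STOPS.select { |stop| stop > from && stop < to }
  (stops + [to]).map(&:to_s) # at each hop: upgrade, reconfigure, wait for migrations
end

p upgrade_hops('14.0', '16.8')
# => ["14.10", "15.0", "15.4", "15.11", "16.3", "16.7", "16.8"]
```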

Upgrade process (Omnibus)

# 1. Backup (always)
sudo gitlab-backup create

# 2. Check current background migrations
sudo gitlab-rails runner -e production \
  'puts Gitlab::BackgroundMigration.remaining'

# 3. Update the package
sudo apt-get update
sudo apt-get install gitlab-ee=16.8.1-ee.0    # pin the version

# 4. The package install runs reconfigure automatically; re-run it to be sure, then verify
sudo gitlab-ctl reconfigure
sudo gitlab-ctl status

# 5. Check background migrations again
sudo gitlab-rails runner -e production \
  'puts Gitlab::BackgroundMigration.remaining'

# 6. Smoke test: login, push, CI pipeline, MR creation

Multi-hop upgrades

When a customer is many versions behind, the upgrade becomes a multi-day project:

Map out every required stop from current to target version
At each stop: upgrade, run reconfigure, wait for background migrations to complete (can take hours on large instances)
Monitor /admin/background_migrations in the UI or gitlab-rails runner
Do NOT proceed to the next version while batched background migrations are running

Time Estimate

A single-version hop on a small instance takes 15-30 minutes. A multi-hop upgrade (e.g., 14.0 → 16.8) across required stops can take a full day including validation. Never schedule a multi-hop upgrade in a 2-hour maintenance window.

Upgrade on Kubernetes (Helm)

Helm upgrades follow a similar pattern but with extra considerations:

Pre-upgrade: helm diff upgrade to preview changes (requires the helm-diff plugin)
Database migrations run as a Kubernetes Job (the migrations pod)
Scale down webservice and sidekiq deployments before the migration job if you want zero risk of interference
After migration completes, the new pods roll out automatically
Rollback via helm rollback does not revert the database — same limitation as Omnibus

Rollbacks

GitLab's official stance: rollbacks are not supported once database migrations have run. This is important to communicate to customers clearly.

Best Option: VM Snapshot

For VM-based deployments, take a full VM snapshot before upgrading. This is the fastest rollback path — revert the entire machine to pre-upgrade state in minutes.

Alternative: Backup Restore

Restore from the backup taken before upgrading. Slower (can take hours for large instances) but works when VM snapshots aren't available.

Never attempt to downgrade the package version without restoring the database. The schema will be incompatible.
For Kubernetes: Helm rollback won't revert the database — you still need a database backup.
GitLab database migrations are forward-only. There is no gitlab-rake db:rollback that will safely undo a version upgrade.

Non-negotiable

Every upgrade engagement must have a documented rollback plan before starting. The customer must agree to the rollback method (snapshot vs. backup restore) and the acceptable data loss window (RPO).

Monitoring

GitLab ships with a bundled Prometheus (enabled by default, configured in gitlab.rb); Grafana was also bundled until GitLab 16.3. If the customer already runs Prometheus/Grafana, export GitLab metrics to their stack rather than running a parallel monitoring system.

Key metrics & alerting

Metric	Alert Threshold	Why
Gitaly request duration (p95)	> 5s	Git operations are slow; users will notice immediately
Sidekiq queue depth	> 1,000 jobs	Background processing falling behind; CI pipelines will stall
Puma active workers	> 90% capacity	Web requests will queue; users see slow page loads
PostgreSQL connections	> 80% of max	Connection exhaustion = hard outage
Disk usage (Gitaly, PG)	> 80%	Full disk on Gitaly = repository corruption risk
Background migration count	> 0 for 24h	Stalled migration blocks upgrades
Workhorse queue time	> 30s	Requests are backing up before reaching Puma
Registry disk/storage	> 80%	Container registry full = broken CI/CD push steps

Enabling monitoring

# In gitlab.rb
prometheus_monitoring['enable'] = true
# Note: Bundled Grafana was removed in GitLab 16.3.
# Use an external Grafana instance with GitLab's Prometheus as a datasource.

# Let an external Prometheus scrape GitLab's metrics endpoints
gitlab_rails['monitoring_allowlist'] = ['10.0.0.0/8']
prometheus['listen_address'] = '0.0.0.0:9090'

Log analysis

GitLab writes most of its logs as structured JSON under /var/log/gitlab/ (the NGINX access log is plain text). Key log files:

gitlab-rails/production_json.log — web requests with timing breakdowns
sidekiq/current — background job execution and failures
gitaly/current — Git RPC operations and durations
nginx/gitlab_access.log — raw HTTP access

Ship to ELK, Loki, or Datadog for centralized analysis. The JSON format makes parsing straightforward.
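Because each line is a JSON object, a quick triage needs no log pipeline at all. A sketch that lists the slowest requests; it assumes the `duration_s`, `method`, `path`, and `status` fields found in recent production_json.log files (field names have changed across versions, so check a real line first), and the sample lines are made up.

```ruby
require 'json'

# Parse one log line; return nil for anything that isn't a JSON object.
def parse_log_line(line)
  entry = JSON.parse(line)
  entry.is_a?(Hash) ? entry : nil
rescue JSON::ParserError
  nil
end

# Return the `top` slowest requests as printable strings, slowest first.
def slowest_requests(lines, top: 3)
  lines.map { |line| parse_log_line(line) }
       .compact
       .select { |entry| entry['duration_s'].is_a?(Numeric) }
       .max_by(top) { |entry| entry['duration_s'] }
       .map { |e| format('%.2fs %s %s %s', e['duration_s'], e['method'], e['path'], e['status']) }
end

sample = [
  '{"method":"GET","path":"/api/v4/projects","status":200,"duration_s":0.42}',
  '{"method":"POST","path":"/group/repo/-/merge_requests","status":302,"duration_s":3.1}',
  'not json',
  '{"method":"GET","path":"/group/repo/-/pipelines","status":200,"duration_s":1.75}',
]
puts slowest_requests(sample, top: 2)
# 3.10s POST /group/repo/-/merge_requests 302
# 1.75s GET /group/repo/-/pipelines 200
```

On a Rails node you could pass `File.foreach('/var/log/gitlab/gitlab-rails/production_json.log')` instead of the sample array; it streams the file line by line.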

CI/CD Runners

Runners are the execution layer for CI/CD pipelines. They are separate infrastructure from GitLab itself and often account for more compute spend than the GitLab server.

Recommended: Docker Executor

Each job runs in a fresh container. Clean environment, reproducible, easy to manage. The default choice for most deployments.

Auto-scaling: Kubernetes Executor

Each job is a K8s pod. Auto-scales with cluster capacity. Best for large, variable workloads. Requires a K8s cluster dedicated to (or shared with) CI/CD.

Legacy: Shell Executor

Jobs run directly on the runner host. No isolation. Only use for specific needs (e.g., hardware access, GPU, bare-metal builds). Security risk — one job can affect another.

Specialized: Docker Machine (Auto-scale)

Spins up cloud VMs on demand for each job. Cost-effective for burst workloads. But Docker Machine is deprecated (GitLab 17.5, removal in 20.0) — migrate to the Docker Autoscaler executor (uses the fleeting library and taskscaler) or the Kubernetes executor instead.

Runner architecture decisions

Shared vs. group vs. project runners: Shared runners serve all projects (convenience), group/project runners are scoped (security, isolation). Use project runners for sensitive builds (signing, deploying to prod).
Tags: Use tags to route jobs to specific runners, e.g. docker, gpu, deploy-prod. Untagged jobs only run on runners that are configured to run untagged jobs.
Concurrency: Set concurrent in config.toml based on available CPU/memory. Overcommitting causes OOM kills and flaky pipelines.
Caching: Configure distributed cache (S3/GCS) rather than local filesystem cache. Local cache doesn't survive runner restarts and doesn't work with auto-scaling.

Runner registration (GitLab 16+)

GitLab 16.0 introduced runner authentication tokens replacing the old registration token model (deprecated in 16.2, legacy workflow disabled by default in 17.0, removal planned for 18.0):

# New method (16+): create runner in UI/API, get auth token
gitlab-runner register \
  --url https://gitlab.example.com \
  --token glrt-XXXXXXXXXXXXXXXXXXXX \
  --executor docker \
  --docker-image alpine:latest

Sizing Rule of Thumb

Start with 2 vCPU and 4 GB RAM per concurrent job for general-purpose CI. Java/Docker-in-Docker builds need 4+ vCPU and 8+ GB. Monitor and adjust — runner sizing is always iterative.
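Turned into a starting `concurrent` value for a runner host, the rule of thumb is the smaller of what CPU and memory allow. A sketch; reserving 1 vCPU and 2 GB for the host OS and the runner process is an assumption.

```ruby
# Starting value for `concurrent` in config.toml from host size.
# Per-job defaults follow the 2 vCPU / 4 GB rule of thumb; the host reserve is assumed.
def suggested_concurrency(host_vcpu, host_ram_gb, job_vcpu: 2, job_ram_gb: 4,
                          reserve_vcpu: 1, reserve_ram_gb: 2)
  by_cpu = (host_vcpu - reserve_vcpu) / job_vcpu
  by_ram = (host_ram_gb - reserve_ram_gb) / job_ram_gb
  [[by_cpu, by_ram].min, 1].max # whichever resource runs out first, at least 1
end

puts suggested_concurrency(16, 64)                             # 7 (CPU-bound)
puts suggested_concurrency(16, 64, job_vcpu: 4, job_ram_gb: 8) # 3 (Java/DinD builds)
```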

Security Scanning

GitLab Ultimate includes a comprehensive DevSecOps security scanning suite integrated directly into the CI/CD pipeline. Scan results appear in merge requests and aggregate into a centralized Vulnerability Dashboard.

Code: SAST

Static Application Security Testing. Analyzes source code for vulnerabilities without executing it. Supports 20+ languages via Semgrep-based rules. Advanced SAST (Ultimate, cross-function/cross-file taint analysis) provides higher-quality results with fewer false positives. Runs on every commit.

Runtime: DAST

Dynamic Application Security Testing. Sends real HTTP requests to a running application to find runtime vulnerabilities like XSS, SQLi, and CSRF.

Dependencies: Dependency Scanning

Checks project dependencies against known vulnerability databases (CVE, NVD). Finds vulnerable libraries before they reach production.

Images: Container Scanning

Scans Docker images for OS-level and language-level package vulnerabilities. Integrates with the GitLab Container Registry.

Secrets: Secret Detection

Scans commits for accidentally committed secrets (API keys, passwords, tokens, private keys). Can block pushes that contain secrets (secret push protection) and also scan in the pipeline.

Compliance: SBOM

Software Bill of Materials. Generates a machine-readable inventory of all components and dependencies. Required by many compliance frameworks (NIST, EO 14028).

Vulnerability Dashboard

All scanner results feed into the Vulnerability Report at the project, group, and instance level:

Merge request widget — new vulnerabilities introduced by the MR are surfaced inline, with severity and description. Reviewers can see security impact before approving.
Vulnerability list — filterable by scanner, severity (Critical/High/Medium/Low), status (Detected/Confirmed/Dismissed/Resolved), and project.
Security policies — enforce rules like "block merge if Critical SAST findings exist" or "require security approval for High+ vulnerabilities." Policies are defined as YAML in a separate security policy project.
Compliance dashboard — track which projects have scanning enabled, which have unresolved findings, and overall security posture across the organization.

Pipeline integration

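# Include security scanning templates in .gitlab-ci.yml
stages:
  - build
  - test
  - deploy
  - dast   # the DAST template runs its job in a "dast" stage, which must be declared

include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/DAST.gitlab-ci.yml
  - template: Security/Dependency-Scanning.gitlab-ci.yml
  - template: Security/Container-Scanning.gitlab-ci.yml
  - template: Security/Secret-Detection.gitlab-ci.yml

variables:
  DAST_TARGET_URL: "https://staging.example.com"   # placeholder: a deployed environment DAST can reach

# SAST, Dependency Scanning, Container Scanning, and Secret Detection jobs run in the "test" stage
# Results are uploaded as CI report artifacts and parsed by GitLab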
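Each analyzer writes a JSON report (for example `gl-sast-report.json`) with a top-level `vulnerabilities` array, and each entry carries a `severity`. The sketch below summarizes such a report and returns a nonzero exit code when Critical findings are present. It approximates the "block merge if Critical findings exist" policy idea in a plain script. It assumes only the `vulnerabilities[].severity` fields. Where Ultimate is available, prefer GitLab security policies over ad hoc gating.

```python
import json
from collections import Counter

SEVERITIES = ["Critical", "High", "Medium", "Low", "Info", "Unknown"]


def summarize(report: dict) -> Counter:
    """Count findings by severity in a GitLab-style security report."""
    return Counter(v.get("severity", "Unknown") for v in report.get("vulnerabilities", []))


def gate(report: dict, blocking=("Critical",)) -> int:
    """Return a process exit code: 1 if any blocking severity is present, else 0."""
    counts = summarize(report)
    return 1 if any(counts[s] for s in blocking) else 0


def main(path: str) -> int:
    """Print a severity summary for one report file and return the gate's exit code."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    counts = summarize(data)
    for sev in SEVERITIES:
        print(f"{sev}: {counts[sev]}")
    return gate(data)
```

In a CI job this would be called as `sys.exit(main("gl-sast-report.json"))` after the scanner job has produced its artifact.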

Adoption Strategy

Don't enable all scanners at once — the initial vulnerability flood overwhelms teams. Start with Secret Detection (immediate, actionable wins) and Dependency Scanning (known CVEs, easy to prioritize). Add SAST once the team has a triage workflow. DAST comes last — it requires a running environment and produces noisier results.

Licensing

Most security scanning features require GitLab Ultimate. The basic SAST, Secret Detection, and Container Scanning pipeline jobs run in all tiers and produce JSON report artifacts. Dependency Scanning, DAST, the merge request security widget, the vulnerability dashboard, security policies, and secret push protection (pre-receive blocking) all require Ultimate.

Migrations

Migrating to GitLab from other platforms is a common engagement. GitLab provides built-in importers for most sources, and GitLab Professional Services offers Congregate for large-scale migrations.

GitHub (built-in importer)

Built-in importer. Migrates repos, issues, PRs, labels, milestones, releases. Supports GitHub.com and GitHub Enterprise.

Bitbucket (built-in importer)

Built-in importer for Bitbucket Cloud and Server. Repos, PRs, issues. Server version requires API access.

Jira (external tool)

Use jira2gitlab (open source) or Jira2Lab (GitLab PS). Migrates issues, comments, attachments, labels, epics, and worklogs. jira2gitlab supports Jira Server only, not Jira Cloud.

Jenkins (manual conversion)

Converting Jenkinsfiles to .gitlab-ci.yml is largely manual. GitLab provides a migration guide, a syntax comparison, and a Jenkinsfile Wrapper that runs Jenkins jobs inside GitLab CI during the transition. Community CLI converters exist but handle only simple cases.

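SVN (external tool)

Use git svn to convert SVN history to Git, then push to GitLab. This preserves commit history, branches, and tags. Converting large repositories with long histories can take a long time, so schedule it well ahead of cutover.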

Mercurial (external tool)

Use hg-fast-export or hg-git to convert Mercurial repos to Git. Push to GitLab. Branch/bookmark mapping needs planning.

GitLab-to-GitLab migrations

Migrating between GitLab instances is increasingly common — consolidating instances, moving to SaaS, or upgrading editions. The method depends on the source and destination.

CE to EE (simple)

A package swap, not a data migration. Install the gitlab-ee package at the same version as CE, run gitlab-ctl reconfigure, then add your license. All data, repos, and config persist. Takes minutes.

Always install EE from day one — EE without a license runs identically to CE
One-way: reverting EE back to CE risks database migration conflicts (EE adds tables and columns that CE doesn't expect). Once EE, stay EE.

Self-managed to self-managed (common)

Consolidating multiple GitLab instances or migrating to new infrastructure.

Direct Transfer — preferred for selective migration. HTTPS API-based, requires 16.8+ on both instances.
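Backup & Restore — full instance copy. Best for 1:1 server replacement. gitlab-backup create → transfer → gitlab-backup restore. Source and destination must run the same GitLab version. Copy /etc/gitlab/gitlab-secrets.json and gitlab.rb separately, because the backup archive does not include them.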
Congregate — for large-scale migrations (100+ projects). Orchestrates in waves, handles objects Direct Transfer misses.

Self-managed to SaaS (complex)

Moving to GitLab.com. Direct Transfer works without admin access on the SaaS side — you only need Owner role on the destination group. Users must already exist on GitLab.com (mapped by email, never auto-created).

LDAP: not available on SaaS. SAML: available as Group SAML (Premium+) for the top-level group only. Use SCIM for automated user provisioning.
10 GB repo limit on GitLab.com (upgradeable)
For large migrations, engage GitLab PS with Congregate

SaaS to self-managed (complex)

Direct Transfer works bidirectionally — SaaS to self-managed is supported. Destination must be 16.8+ with Direct Transfer enabled in admin settings.

User contributions may not map correctly if public emails don't match between instances
File export/import is the fallback for selective project migration

Direct Transfer

Direct Transfer (formerly "Bulk Import") is GitLab's native migration mechanism. It transfers groups and projects between any two GitLab instances (self-managed or SaaS) via HTTPS API calls. Both instances should be on 16.8+, with the source no more than 2 minor versions behind the destination. Direct Transfer reached General Availability (GA) in Q2 2025.

Transfers	Does NOT transfer
Repos, issues, MRs, labels, milestones, boards, epics, wikis, releases, snippets, CI pipeline history, comments, members, badges	CI/CD variables, deploy tokens, webhooks, container registry images, runners, job artifacts, Pages domains, approval rules, feature flags
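Direct Transfer can also be started through the REST API on the destination instance (`POST /api/v4/bulk_imports`). The sketch below builds the request body for migrating one group and sends it with the standard library. The hostnames, group paths, and tokens are placeholders. Check the field names against the direct transfer API documentation for your GitLab version, since older versions used `destination_name` instead of `destination_slug`.

```python
import json
import urllib.request


def bulk_import_payload(source_url, source_token, source_group, dest_namespace, dest_slug):
    """Build a JSON body for POST /api/v4/bulk_imports that migrates one group."""
    return {
        "configuration": {"url": source_url, "access_token": source_token},
        "entities": [{
            "source_type": "group_entity",
            "source_full_path": source_group,
            "destination_namespace": dest_namespace,
            "destination_slug": dest_slug,
        }],
    }


def start_import(dest_url, dest_token, payload):
    """POST the payload to the destination instance and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{dest_url.rstrip('/')}/api/v4/bulk_imports",
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": dest_token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The objects in the right-hand column above still need separate handling after the import finishes.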

Congregate

Congregate is GitLab Professional Services' migration automation tool. It now uses Direct Transfer under the hood for core migration, then supplements it with API calls to handle objects that Direct Transfer doesn't cover (CI/CD variables, container registries, webhooks, deploy tokens). It orchestrates migrations in waves to manage API rate limits and allow staged cutover of large environments.

Supported paths: self→self, self→SaaS, SaaS→SaaS, plus GitHub, Bitbucket, Azure DevOps sources
Air-gapped: supports disconnected networks via two Congregate nodes with file-based transfer
Wave orchestration: breaks large migrations into manageable batches with progress tracking
Post-migration tasks: automatically handles objects excluded from Direct Transfer
Not self-service — requires engaging GitLab PS

Congregate vs Direct Transfer

For small migrations (< 50 projects), use Direct Transfer directly from the UI. For large migrations (100+ projects, multiple groups, complex user mappings), engage GitLab PS with Congregate. Congregate doesn't replace Direct Transfer — it wraps and extends it with orchestration, wave management, and coverage of objects Direct Transfer can't handle.
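Wave planning itself is easy to reason about. The sketch below is not Congregate's implementation. It is a hypothetical helper that keeps each top-level group together and packs groups into waves capped at a chosen project count.

```python
from collections import defaultdict


def plan_waves(project_paths, max_per_wave=50):
    """Group project paths by top-level group, then pack groups into waves.

    A group larger than max_per_wave gets a wave of its own rather than being split.
    """
    by_group = defaultdict(list)
    for path in project_paths:
        by_group[path.split("/", 1)[0]].append(path)

    waves, current = [], []
    # Largest groups first (a simple next-fit-decreasing heuristic)
    for group in sorted(by_group, key=lambda g: -len(by_group[g])):
        projects = by_group[group]
        if current and len(current) + len(projects) > max_per_wave:
            waves.append(current)  # close the wave before it overflows
            current = []
        current.extend(projects)
        if len(current) >= max_per_wave:
            waves.append(current)
            current = []
    if current:
        waves.append(current)
    return waves
```

Keeping a group intact within one wave avoids half-migrated groups during staged cutover, at the cost of some waves being larger than the cap.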

jira2gitlab

Open-source tool (swingbit/jira2gitlab) for migrating Jira Server projects to GitLab. Converts issues, comments (Jira markup to Markdown), attachments, labels, components, fix versions (to milestones), worklogs (to /spend commands), sub-tasks, epics, and issue relationships. Supports incremental/resumable imports. Jira Server 8.5.1+ only (not Jira Cloud). GitLab PS also offers Jira2Lab for enterprise-scale migrations.

Migration planning

User mapping — create a spreadsheet mapping source usernames to GitLab usernames. Pre-create accounts or configure SSO before migration.
Group structure — design the GitLab group/subgroup hierarchy before importing. Reorganizing after migration is painful.
Large repos — repos over 5 GB need special handling. Consider git-filter-repo to trim history if appropriate.
CI/CD conversion — the most labor-intensive part. Jenkins/CircleCI/Travis configs must be manually rewritten as .gitlab-ci.yml.
Cutover window — plan for a freeze period. Run incremental syncs beforehand, then do a final sync and cutover.
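The user-mapping spreadsheet is worth validating before cutover. A minimal sketch, assuming a hypothetical two-column CSV (`source_username`, `gitlab_username`) exported from that spreadsheet. It reports source users with no mapping and GitLab usernames claimed by more than one source user.

```python
import csv
import io
from collections import Counter


def check_mapping(csv_text, source_users):
    """Return (unmapped source users, GitLab usernames mapped more than once)."""
    mapped = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        src = (row.get("source_username") or "").strip()
        dst = (row.get("gitlab_username") or "").strip()
        if src and dst:  # blank targets count as unmapped
            mapped[src] = dst
    unmapped = sorted(set(source_users) - set(mapped))
    counts = Counter(mapped.values())
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    return unmapped, duplicates
```

Duplicates usually point to copy-paste errors in the spreadsheet. Unmapped users either need accounts created or must be consciously left to fall back to the importing user.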

Migration Approach

For small migrations (< 50 repos), use built-in importers directly. For large migrations (100+ repos, multiple source systems), engage GitLab Professional Services with Congregate. For Jira, evaluate jira2gitlab for Jira Server or consider maintaining Jira alongside GitLab if using Jira Cloud (GitLab has a native Jira integration).

Licensing & Support

Tier	Key Features	Support	Price Model
Free (CE)	Core Git, CI/CD, registry, issue tracking	Community only	Free
Premium (EE)	+ Merge approvals, epics, roadmaps, push rules, Geo, LDAP group sync	24/5 support	Per-user/year
Ultimate (EE)	+ Security scanning (SAST, DAST, dependency, container), compliance, value streams	24/7 support	Per-user/year

Licensing details

Licensing is per-user, per-year. All billable users count — generally anyone who can log in. Bot users and service accounts are typically not billable. Guest users consume a seat on Free and Premium tiers but are free on Ultimate.
Premium is the sweet spot for most enterprises — Ultimate is worth it only if the customer will actually use the security scanning features.
Self-managed EE with an expired license degrades to CE features but keeps running. Data is not lost.
Geo (disaster recovery / read replicas across regions) requires Premium or Ultimate.
True-up: GitLab audits user counts. If you exceed your license count, you'll owe the difference at renewal. Monitor this proactively.
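A rough seat forecast follows from those rules. The sketch assumes a hypothetical user export with each user's highest role and a bot flag. It applies the simplified rules above (bots not billable, guests free on Ultimate only) and computes the overage against the purchased seat count. Actual billing follows your contract, so treat this as a planning estimate.

```python
from dataclasses import dataclass


@dataclass
class User:
    username: str
    role: str             # highest role held, e.g. "guest", "developer"
    is_bot: bool = False  # bot users / service accounts


def billable_users(users, tier):
    """Count users who consume a seat under the simplified rules above."""
    count = 0
    for u in users:
        if u.is_bot:
            continue  # bots and service accounts: typically not billable
        if u.role == "guest" and tier == "ultimate":
            continue  # guests are free on Ultimate
        count += 1
    return count


def true_up(users, tier, purchased_seats):
    """Seats above the purchased count (0 if within the license)."""
    return max(0, billable_users(users, tier) - purchased_seats)
```

Running this monthly against an export of active users is an easy way to catch overages before renewal.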

CE vs. EE binary

A common source of confusion:

GitLab CE (Community Edition) and EE (Enterprise Edition) are different packages
EE without a license runs with CE features — it's functionally identical to CE
Always install EE, even without a license. Switching from CE to EE later is a package swap that still needs a maintenance window. Installing EE from the start avoids even that, and you can add a license later.

Recommendation

Always install the EE package from day one. It costs nothing without a license, and it avoids a CE→EE package swap and maintenance window later, when the customer inevitably decides they want Premium features.

Consultant's Checklist

Before proposing a GitLab deployment, get answers to these:

How many users? — Determines reference architecture tier
What's the RTO/RPO? — Determines if you need HA or just good backups
Is there existing Kubernetes infrastructure? — Determines deployment model
What's the largest repository? — Determines Gitaly sizing and tuning needs
Do they need CI/CD? How many concurrent jobs? — Determines runner infrastructure (often more expensive than GitLab itself)
Air-gapped or internet-connected? — Affects updates, runner images, dependency scanning
Compliance requirements? — Determines license tier and audit log retention
Existing SCM/CI tools? — Migration scope (repos, CI configs, issues, users)
Authentication method? — LDAP, SAML, OIDC (OAuth 2.0 / JWT) — affects initial setup complexity
Who will operate it day-to-day? — Determines how much automation and documentation you deliver

Gitaly

Git storage RPC service — the I/O backbone of every GitLab instance

What is Gitaly?

Gitaly is a gRPC-based service that provides a high-level interface to Git repository storage. Every Git operation in GitLab — clones, pushes, merges, diffs, blame, file browsing — is dispatched as a gRPC call to Gitaly. No GitLab component touches the filesystem directly anymore; Gitaly is the sole gateway to on-disk Git data.

Gitaly was introduced to replace direct NFS access to Git repositories. The old model (multiple Rails/Sidekiq nodes mounting the same NFS share) suffered from locking issues, cache coherency problems, and terrible performance at scale. Gitaly moves all filesystem access to a dedicated service with proper concurrency control.

Architecture

In a simple deployment, Gitaly runs on the same node as the rest of GitLab (Omnibus bundles it). In production, Gitaly typically runs on dedicated nodes with fast local SSDs:

Puma / Workhorse / Sidekiq → gRPC calls → Gitaly → local disk Git repositories
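Communication is over gRPC with token-based authentication. The token is set on Gitaly nodes in gitaly['configuration'] = { auth: { token: '...' } } in gitlab.rb (older releases used gitaly['auth_token']) and must match gitlab_rails['gitaly_token'] on client nodes
Each Gitaly node can host multiple named storages (storage paths), allowing you to spread repositories across disks. "Virtual storages" are a Praefect concept that groups several Gitaly nodes behind one name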

Gitaly Cluster (Praefect)

For high availability of Git repository data, GitLab provides Gitaly Cluster, which uses Praefect as a transparent proxy/router in front of multiple Gitaly nodes:

Praefect is a gRPC proxy that sits between GitLab application nodes and Gitaly nodes. It routes reads and writes, tracks replication state, and coordinates failover.
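Replication model: With strong consistency, Praefect sends each write to all up-to-date replicas and commits it only after they vote to agree on the resulting reference updates. Replicas that miss a write are repaired by background replication jobs. With eventual consistency, the write goes to the repository's primary, and Praefect replicates it to secondaries asynchronously.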
Praefect PostgreSQL: Praefect maintains its own PostgreSQL database to track which repositories live on which nodes and their replication state. This is separate from GitLab's main PostgreSQL.
Minimum topology: 3 Praefect nodes (for Praefect HA) + 3 Gitaly nodes + 1 Praefect PostgreSQL (can be HA with Patroni).

GitLab App Nodes
      |
      v  (gRPC)
+------------------+
|    Praefect (3)   |  ← routes reads/writes, tracks replication
+--+------+------+--+
   |      |      |
   v      v      v   (gRPC)
Gitaly1  Gitaly2  Gitaly3   ← each stores full repo copies
  |        |        |
 SSD      SSD      SSD

Why Gitaly is usually the bottleneck

Disk I/O intensive: Git operations (pack-objects, diff, blame on large files) are fundamentally I/O-bound. Gitaly performance is directly tied to disk latency and throughput.
Large repositories: Monorepos or repos with extensive history generate massive pack operations. A single git clone of a 10GB repo can saturate disk I/O for minutes.
CPU for pack operations: git pack-objects (used during clone/fetch) is CPU-intensive. Multiple concurrent clones can overwhelm a Gitaly node.
Memory for pack windows: Git uses memory-mapped windows to compute deltas during packing. Large repos with many objects require significant RAM.

Sizing guidelines

SSDs are non-negotiable; use NVMe SSDs where possible. Do not use NFS for repository storage: it was deprecated for Git data in GitLab 14.0 and has been unsupported since 15.0.
Disk space: Plan for 2–4x the raw repository size (pack files, temporary objects, housekeeping)
CPU: 4–8 vCPUs per Gitaly node for up to 5,000 users; 16+ for larger deployments
RAM: 16–32 GB minimum. Git pack operations are memory-hungry.
Network: Gitaly should be on a low-latency link to app nodes. 1 Gbps minimum, 10 Gbps for large instances.
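The disk ratio above translates into a quick capacity estimate. A sketch, assuming you know the current raw repository footprint and can estimate an annual growth rate (both inputs are hypothetical). The 2x to 4x overhead multiplier is the range given above.

```python
def gitaly_disk_gb(raw_repo_gb, annual_growth=0.3, years=2, multiplier=3.0):
    """Estimate provisioned Gitaly disk (GB) for a planning horizon.

    raw_repo_gb    current size of all repositories on disk
    annual_growth  assumed fractional growth per year (0.3 = 30%)
    multiplier     overhead for packs, temporary objects, housekeeping (2 to 4)
    """
    if not 2.0 <= multiplier <= 4.0:
        raise ValueError("multiplier outside the 2x-4x planning range")
    projected = raw_repo_gb * (1 + annual_growth) ** years  # compound growth
    return round(projected * multiplier, 1)
```

For example, `gitaly_disk_gb(100, annual_growth=0.5, years=2, multiplier=4.0)` plans 900 GB for 100 GB of repositories today. Split the result across Gitaly nodes according to how repositories will be distributed.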

Common issues

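High disk latency: The #1 cause of slow Git operations. Monitor disk latency on Gitaly nodes, for example with node_exporter's node_disk_read_time_seconds_total and node_disk_reads_completed_total counters. An average latency above 10 ms will degrade user experience.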
Large push operations: Pushes with many refs or large packfiles can time out. Tune gitaly['configuration']['git']['config'] for pack.threads and transfer limits.
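Repository housekeeping: GitLab triggers repository optimization (repacking, pruning) through Sidekiq after pushes, and Gitaly also runs a daily maintenance task. Both are I/O-heavy and can slow foreground operations. Set Gitaly's daily maintenance window (daily_maintenance in the Gitaly configuration) to off-peak hours.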
Praefect replication lag: In eventual consistency mode, use the praefect dataloss subcommand to identify repositories that are under-replicated.

# Check Gitaly health
sudo gitlab-rake gitlab:gitaly:check

# Check Praefect replication status
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss

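# Monitor Gitaly disk and RPC performance (PromQL; label names depend on your scrape config)
# Average read latency per device:
#   rate(node_disk_read_time_seconds_total[5m]) / rate(node_disk_reads_completed_total[5m])
# p95 Gitaly RPC latency (histogram exported only when Gitaly's grpc_latency_buckets is configured):
#   histogram_quantile(0.95, sum by (le) (rate(grpc_server_handling_seconds_bucket{job="gitaly"}[5m])))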
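The 10 ms threshold applies to an average over an interval, which you derive from two samples of cumulative counters. A minimal sketch, assuming node_exporter-style counters for total read time (seconds) and completed reads on one device. The sample values in the usage line are made up.

```python
def avg_latency_ms(sample0, sample1):
    """Average per-operation latency in ms between two counter samples.

    Each sample is (cumulative_time_seconds, cumulative_operations) for one device.
    """
    d_time = sample1[0] - sample0[0]
    d_ops = sample1[1] - sample0[1]
    if d_ops <= 0:
        return 0.0  # no I/O in the interval (or a counter reset)
    return d_time * 1000.0 / d_ops


def degraded(latency_ms, threshold_ms=10.0):
    """True when average latency exceeds the threshold discussed above."""
    return latency_ms > threshold_ms
```

With samples (100.0 s, 50,000 reads) and (112.0 s, 51,000 reads), `avg_latency_ms` gives 12 ms, which `degraded` flags. This is the same ratio that the PromQL `rate(...) / rate(...)` expression computes.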

Consultant tip: When scoping a GitLab deployment, always ask about the largest repository. A single 20 GB monorepo changes the entire Gitaly sizing conversation. If monorepos are in play, consider placing them on a dedicated Gitaly storage. Give that storage a weight of 0 so new repositories are not placed there, move the monorepos onto it with the repository storage moves API, and run housekeeping on those repositories more frequently.

Sidekiq

Ruby background job processor — GitLab's asynchronous workhorse

What is Sidekiq?

Sidekiq is a Ruby background job processing framework that uses Redis as its job queue. In GitLab, it handles everything that shouldn't block a web request — which turns out to be a substantial portion of GitLab's functionality. If Puma is the front door, Sidekiq is the entire back office.

What Sidekiq does in GitLab

Nearly every asynchronous operation in GitLab flows through Sidekiq:

CI/CD pipeline processing: Creating pipelines, scheduling jobs, processing pipeline status updates, sending notifications
Email delivery: Notification emails, approval requests, pipeline results
Repository maintenance: git gc, git repack, housekeeping tasks, repository size calculations
Webhooks: Delivering webhook payloads to external services on push, merge, pipeline events
Import/Export: Project imports (GitHub, Bitbucket), project exports, group migrations
Cache invalidation: Clearing stale caches when projects, users, or configurations change
Merge request processing: Diff generation, merge checks, approval rule evaluation
Security scanning: Processing SAST/DAST/dependency scan results and creating vulnerabilities

Queue routing & urgency

GitLab categorizes Sidekiq jobs by urgency, which determines queue routing and latency expectations:

High urgency: Must start within 10 seconds. Examples: pipeline creation, merge status updates. These affect user-perceived responsiveness.
Low urgency: Can tolerate minutes of queue time. Examples: email delivery, repository cleanup, export jobs.
Throttled: Rate-limited to prevent overload. Examples: bulk imports, large project exports.

In production HA deployments, you can run dedicated Sidekiq processes for specific queue groups:

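# In gitlab.rb: route jobs to dedicated queues. Since GitLab 16.0, jobs go to the
# "default" queue unless routing rules say otherwise, so queue_groups alone is not enough.
# routing_rules must be identical on every node that runs Rails or Sidekiq.
sidekiq['routing_rules'] = [
  ['urgency=high', 'urgent'],   # "urgent" is an example queue name
  ['*', 'default']              # everything else
]
# One Sidekiq process per entry: process 1 handles only high-urgency jobs, process 2 the rest
sidekiq['queue_groups'] = ['urgent', 'default,mailers']

# Or run separate Sidekiq nodes entirely (same routing_rules on both):
# Node A (high-urgency):
sidekiq['queue_groups'] = ['urgent']
# Node B (general):
sidekiq['queue_groups'] = ['default,mailers']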

Scaling Sidekiq

Vertical: Increase sidekiq['max_concurrency'] (default 20). Each concurrent thread holds a database connection, so this is bounded by PostgreSQL connection limits and PgBouncer pool size.
Horizontal: Add more Sidekiq processes on the same node (sidekiq['queue_groups'] with multiple entries) or add entirely separate Sidekiq nodes.
On Kubernetes: Scale the Sidekiq Deployment replicas independently from webservice pods. Use HPA based on queue depth metrics.
Monitor queue depth: The key metric is sidekiq_queue_size per queue. If high-urgency queues back up, users see delayed pipeline starts and slow MR updates.

Common issues

Queue backlogs: Usually caused by insufficient Sidekiq workers or a slow dependency (PostgreSQL, Gitaly, external webhooks). Check which queues are growing and what jobs are consuming time.
Memory bloat (Ruby GC): Ruby's garbage collector can cause Sidekiq processes to grow to several GB over time. Lower sidekiq['max_concurrency'] to cap per-process memory, and use MALLOC_ARENA_MAX=2 to reduce memory fragmentation. GitLab also supports SIDEKIQ_MEMORY_KILLER_MAX_RSS to restart workers that exceed a memory threshold.
Stuck jobs: Jobs that hang (waiting on external services, deadlocked database queries) block their thread. Monitor sidekiq_running_jobs and set up alerts for jobs running longer than expected.
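Redis memory pressure: All Sidekiq job data lives in Redis, and a massive queue backlog can exhaust Redis memory. Monitor redis_memory_used_bytes and size maxmemory for the worst backlog you expect. On the Redis instance that holds Sidekiq queues, use maxmemory-policy noeviction, because any eviction policy silently drops jobs. LRU eviction belongs only on a cache-only Redis.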

Key metrics

sidekiq_queue_latency_seconds — time from enqueue to start of processing. High-urgency queues should be under 10s.
sidekiq_jobs_completion_seconds — how long jobs take to execute. Look for p95/p99 outliers.
sidekiq_jobs_failed_total — failed job rate. Transient failures retry automatically; persistent failures indicate bugs or infrastructure issues.
sidekiq_queue_size — number of jobs waiting per queue. The primary capacity signal.
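Queue latency only means something relative to the job's urgency. A minimal sketch that flags queues breaching their target, assuming you have already pulled per-queue latency (for example from sidekiq_queue_latency_seconds) and know each queue's urgency. The 10-second high-urgency target comes from the section above. The 60-second low-urgency target and the queue names are assumptions.

```python
# Maximum acceptable queue latency (seconds) per urgency class.
# "high" follows the 10 s target above; "low" is an assumed value; "throttled" has no target.
LATENCY_SLO = {"high": 10.0, "low": 60.0, "throttled": None}


def breaches(queue_latency, queue_urgency):
    """Return [(queue, latency, limit)] for queues over their limit, worst ratio first.

    Queues with no known urgency are treated as "low".
    """
    out = []
    for queue, latency in queue_latency.items():
        limit = LATENCY_SLO.get(queue_urgency.get(queue, "low"))
        if limit is not None and latency > limit:
            out.append((queue, latency, limit))
    return sorted(out, key=lambda item: item[1] / item[2], reverse=True)
```

Sorting by the latency-to-limit ratio puts a high-urgency queue at 25 s (2.5x over) ahead of a low-urgency queue at 120 s (2x over), which matches how users experience the delay.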

Consultant tip: When a customer complains that "GitLab is slow" but the web UI loads fine, the problem is almost always Sidekiq. Check queue depths first. A backed-up pipeline_processing queue means CI/CD feels broken even though the UI works. The fix is usually more Sidekiq workers or resolving whatever downstream dependency (PostgreSQL, Gitaly) is causing jobs to run slowly.

Patroni

PostgreSQL HA with automatic failover via distributed consensus

What is Patroni?

Patroni is an open-source PostgreSQL high-availability solution that uses a Distributed Configuration Store (DCS) — typically etcd, Consul, or ZooKeeper — for leader election and cluster state management. It automates the hardest parts of PostgreSQL HA: leader election, automatic failover, and managed streaming replication.

In GitLab's architecture, Patroni replaces manual PostgreSQL replication setups. GitLab's Omnibus package bundles Patroni and uses Consul as the DCS by default.

How it works

Patroni agent runs on each PostgreSQL node as a sidecar/supervisor process. It manages the local PostgreSQL instance (start, stop, promote, configure replication).
Leader lock: The DCS (Consul in GitLab's case) holds a leader key. The node that holds the lock is the primary. If the primary fails to renew the lock (health check failure, network partition), another node acquires it and promotes itself.
Streaming replication: Replicas stream WAL (Write-Ahead Log) from the primary in real-time. Patroni configures and manages replication slots automatically.
Cluster bootstrap: Patroni handles initial cluster setup, including pg_basebackup for new replicas joining the cluster.

Architecture in GitLab

                 +-------------------+
                 |   Consul Cluster   |
                 |  (3 nodes, DCS)    |
                 +--------+----------+
                          |
            +-------------+-------------+
            |             |             |
    +-------v---+  +------v----+  +----v-------+
    |  Patroni   |  |  Patroni   |  |  Patroni   |
    | PostgreSQL |  | PostgreSQL |  | PostgreSQL |
    |  (Leader)  |  | (Replica)  |  | (Replica)  |
    +------+-----+  +-----------+  +-----------+
           |
    +------v------+
    |  PgBouncer   |  ← routes connections to current leader
    +--------------+

Consul cluster (3 nodes): Provides the distributed consensus layer. Patroni uses Consul's key/value store for the leader lock and cluster metadata.
Patroni nodes (3 minimum): One leader + two replicas. Each runs PostgreSQL with Patroni managing the lifecycle.
PgBouncer: Connection pooler that routes application connections to the current leader. Consul DNS or service discovery directs PgBouncer to the active primary.

Switchover vs. failover

Switchover (planned): Graceful leadership transfer for maintenance. Patroni ensures the best replica is caught up, then promotes it. Zero or near-zero downtime. Triggered via patronictl switchover.
Failover (unplanned): Primary dies or becomes unreachable. Patroni detects the failure (missed DCS heartbeat), selects the most up-to-date replica, and promotes it. Typical detection + promotion time: 10–30 seconds.

# GitLab Omnibus Patroni commands
sudo gitlab-ctl patroni members               # Show cluster members and roles
sudo gitlab-ctl patroni switchover            # Initiate planned switchover
sudo gitlab-ctl patroni failover              # Force failover (use with caution)
sudo gitlab-ctl patroni reinitialize-replica  # Re-bootstrap a failed replica (run on the replica)
sudo gitlab-ctl patroni check-leader          # Exits 0 if this node is the current leader

Split-brain prevention

Split-brain (two nodes believing they are the primary) is the worst failure mode in any database HA system. Patroni prevents it through the DCS:

The leader must continuously renew its lock in the DCS (default TTL: 30 seconds). If it can't reach the DCS, it voluntarily demotes itself to read-only.
A new leader can only be elected if the DCS confirms the old leader's lock has expired.
watchdog (optional, Linux kernel feature): Patroni can use the kernel watchdog to forcibly reboot a node that loses DCS connectivity, preventing a zombie primary from accepting writes.
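The lock-and-TTL mechanism is the core of this protection. The sketch below models it with an in-memory store and an explicit clock so the sequence is easy to follow. It is a simplified illustration, not Patroni's code or Consul's API. A leader that fails to renew within the TTL loses the key, and only then can another node acquire it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaderKey:
    holder: Optional[str] = None
    expires_at: float = 0.0


class InMemoryDCS:
    """Toy stand-in for a DCS leader key with TTL semantics."""

    def __init__(self, ttl: float = 30.0):
        self.ttl = ttl
        self.key = LeaderKey()

    def try_acquire(self, node: str, now: float) -> bool:
        """Acquire or renew the key; succeeds only if it is free, expired, or already ours."""
        expired = now >= self.key.expires_at
        if self.key.holder in (None, node) or expired:
            self.key = LeaderKey(holder=node, expires_at=now + self.ttl)
            return True
        return False

    def leader(self, now: float) -> Optional[str]:
        """Current holder, or None once the TTL has lapsed without renewal."""
        return self.key.holder if now < self.key.expires_at else None
```

In this model, the moment an old leader's `try_acquire` returns False is where a real Patroni leader demotes itself. The watchdog exists for the case where that demotion cannot happen in time.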

GitLab-specific configuration

# In gitlab.rb on each PostgreSQL/Patroni node
patroni['enable'] = true
patroni['scope'] = 'gitlab-pg-cluster'
postgresql['listen_address'] = '0.0.0.0'
patroni['consul']['url'] = 'http://consul.service.consul:8500'
patroni['postgresql']['max_connections'] = 300
patroni['postgresql']['wal_level'] = 'replica'
patroni['replication']['password'] = 'repl_password_here'

Consultant tip: Always test failover before going live. Run gitlab-ctl patroni switchover during a maintenance window and verify that GitLab reconnects within 30 seconds. Also test an unplanned failure by stopping Patroni on the leader node (sudo gitlab-ctl stop patroni), which also stops the PostgreSQL instance it manages, and verify automatic promotion. Document the observed failover time; customers will ask for it.

PgBouncer

Lightweight PostgreSQL connection pooler for high-concurrency environments

What is PgBouncer?

PgBouncer is a lightweight connection pooler for PostgreSQL. It sits between application clients (Rails/Puma, Sidekiq) and PostgreSQL, multiplexing many client connections over a smaller number of actual database connections. This is critical because PostgreSQL creates a new OS process for every connection, which becomes expensive at scale — hundreds or thousands of connections consume significant memory and CPU just for connection management.

Why GitLab needs it

In a GitLab HA deployment, multiple Puma and Sidekiq processes across multiple nodes all need database connections. Without PgBouncer:

3 Puma nodes × 60 workers × 4 threads = 720 potential connections
3 Sidekiq nodes × 20 concurrency = 60 more connections
Plus Praefect, Geo, monitoring queries…
Total: easily 800+ connections, each consuming ~5–10 MB of PostgreSQL backend memory

PgBouncer reduces this to a manageable pool (e.g., 100–200 actual database connections) while accepting many more client connections.
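The arithmetic above generalizes to any topology. A sketch that reproduces it, assuming one Sidekiq process per node as in the example. The allowance for Praefect, Geo, and monitoring (50 here) is a placeholder. The per-backend memory uses the 5 to 10 MB range quoted above.

```python
def client_connections(puma_nodes, workers_per_node, threads_per_worker,
                       sidekiq_nodes, sidekiq_concurrency, other=50):
    """Potential client connections if every Puma and Sidekiq thread connects.

    other: hypothetical allowance for Praefect, Geo, and monitoring queries.
    """
    puma = puma_nodes * workers_per_node * threads_per_worker
    sidekiq = sidekiq_nodes * sidekiq_concurrency  # one Sidekiq process per node
    return puma + sidekiq + other


def backend_memory_mb(connections, mb_per_backend=(5, 10)):
    """(low, high) PostgreSQL backend memory in MB if each connection had its own backend."""
    low, high = mb_per_backend
    return connections * low, connections * high
```

For the example topology, `client_connections(3, 60, 4, 3, 20)` gives 830, which would mean roughly 4 to 8 GB of PostgreSQL memory spent on backends alone without a pooler.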

Pooling modes

Session pooling: A server connection is assigned to a client for the entire session (connect to disconnect). Safest mode — all PostgreSQL features work. But it doesn't save many connections since each active session ties up a backend.
Transaction pooling (recommended for GitLab): A server connection is assigned only for the duration of a transaction. Between transactions, the connection returns to the pool. This is the most effective mode for connection reduction. GitLab officially recommends transaction pooling.
Statement pooling: Connection returned after every statement. Very aggressive pooling but breaks multi-statement transactions. Not compatible with GitLab.

GitLab-specific setup

GitLab Omnibus bundles PgBouncer. In an HA setup, PgBouncer runs on dedicated nodes (or co-located with application nodes) and uses Consul to discover the current Patroni leader:

# In gitlab.rb on PgBouncer node(s)
pgbouncer['enable'] = true
pgbouncer['listen_address'] = '0.0.0.0'
pgbouncer['listen_port'] = 6432

# Pool sizing
pgbouncer['default_pool_size'] = 60
pgbouncer['min_pool_size'] = 10
pgbouncer['reserve_pool_size'] = 5
pgbouncer['max_client_conn'] = 2048

# Transaction pooling (recommended)
pgbouncer['pool_mode'] = 'transaction'

# Consul-based discovery of Patroni leader
pgbouncer['databases'] = {
  gitlabhq_production: {
    host: "master.patroni.service.consul",
    port: 5432,
    pool_size: 100
  }
}

Sizing: key parameters

max_client_conn: Maximum number of client connections PgBouncer will accept. Set this high enough for all Puma threads, Sidekiq workers, and headroom. The Omnibus default of 2048 is usually sufficient (upstream PgBouncer defaults to 100).
default_pool_size: Number of server connections per user/database pair. This is how many actual PostgreSQL connections PgBouncer maintains. Start with 60–100 and tune based on monitoring.
reserve_pool_size: Extra connections allowed when the pool is exhausted. Acts as a burst buffer.
pool_mode: transaction for GitLab. Session mode negates most of the pooling benefit.
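Rule of thumb: each PgBouncer node opens its own pools, so (pool size + reserve_pool_size) × number of databases × number of PgBouncer nodes should stay below PostgreSQL's max_connections. Leave headroom for replication and administrative connections.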
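Writing every multiplier out makes the rule of thumb easy to check. A sketch, assuming each PgBouncer node can open up to pool size + reserve_pool_size server connections per database. The headroom reserved for replication, superuser, and monitoring sessions is an assumed value.

```python
def server_connection_budget(pgbouncer_nodes, pool_size, reserve_pool_size,
                             databases=1, max_connections=300, headroom=20):
    """Return (worst-case server connections, whether they fit under max_connections).

    headroom: connections kept free on PostgreSQL for replication and admin (assumed).
    """
    worst_case = pgbouncer_nodes * databases * (pool_size + reserve_pool_size)
    return worst_case, worst_case <= max_connections - headroom
```

With three PgBouncer nodes, the per-database pool_size of 100 from the configuration above, and max_connections = 300, the worst case is 315 connections, which does not fit. A pool size of 60 (195 connections) does fit.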

Common issues

Prepared statements in transaction mode: PostgreSQL prepared statements are session-level objects. In transaction pooling mode, a client might prepare a statement on connection A, but the next transaction runs on connection B where the statement doesn't exist. GitLab handles this by using prepared_statements: false in its database configuration. If you see prepared statement "a1" does not exist errors, this setting is wrong.
Long transactions blocking the pool: A single long-running query or transaction holds a server connection for its duration, reducing the effective pool size. Monitor pgbouncer_pools_server_active and pgbouncer_pools_server_used. Look for long-running transactions in pg_stat_activity.
SET statements leak: In transaction mode, a session-level SET changes the server connection and persists after the transaction ends, so the next client using that connection inherits the modified settings. GitLab avoids this by not using session-level SET, but custom queries (from Grafana, ad-hoc scripts) can trigger it. Use SET LOCAL inside a transaction instead.
Connection storm after failover: When Patroni fails over, all PgBouncer connections to the old primary break. PgBouncer reconnects to the new primary, but the burst of new connections can overwhelm PostgreSQL. Tune server_login_retry and server_connect_timeout.

Monitoring PgBouncer

# Connect to the PgBouncer admin console (shell)
psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer

-- Key commands
SHOW POOLS;         -- active/waiting connections per pool
SHOW STATS;         -- request rate, query duration, bytes
SHOW DATABASES;     -- configured databases and pool sizes
SHOW CLIENTS;       -- connected clients and their state
SHOW SERVERS;       -- backend PostgreSQL connections

Consultant tip: In every GitLab HA engagement, add PgBouncer monitoring to the Grafana stack early. The key dashboard panels: pool utilization (active vs. idle server connections), client wait time (how long clients queue for a connection), and total client connections vs. max_client_conn. A pool utilization above 80% sustained means you need to increase default_pool_size or add PostgreSQL capacity.

SAST — Static Application Security Testing

Find vulnerabilities in source code before the application runs — shift security left into the development workflow

What is SAST?

Static Application Security Testing analyzes source code, bytecode, or binaries for security vulnerabilities without executing the application. It works by parsing the code into an abstract syntax tree (AST) and applying pattern-matching rules, data flow analysis, and taint tracking to find issues like SQL injection, XSS, insecure deserialization, hardcoded credentials, and buffer overflows.

SAST is "white-box" testing — it sees the source code. This means it can find vulnerabilities deep in code paths that might not be reachable via external testing, but it can also produce false positives because it doesn't know runtime context.

How GitLab SAST works

GitLab auto-detects the project language and selects the appropriate analyzer (container image)
The primary analyzer is Semgrep (multi-language, with GitLab-managed rules). Legacy standalone analyzers like Bandit, ESLint, and Gosec have been consolidated into Semgrep. SpotBugs (Java/Scala/Groovy/Kotlin) remains as a separate analyzer. Advanced SAST (Ultimate) adds cross-function and cross-file taint analysis for deeper detection.
The analyzer runs in a CI job, scans the source code, and outputs a standardized JSON report
GitLab parses the report and surfaces findings in the merge request widget and vulnerability dashboard

Supported languages

C/C++, C#, Go, Java, JavaScript/TypeScript, Kotlin, Python, Ruby, Scala, Swift, PHP, and more. Semgrep covers most languages with community and GitLab-maintained rule packs.

Configuration

# Basic — just include the template
include:
  - template: Security/SAST.gitlab-ci.yml

# Customize — override variables
variables:
  SAST_EXCLUDED_PATHS: "spec,test,tests,vendor"
  SAST_EXCLUDED_ANALYZERS: "spotbugs"   # skip the SpotBugs (JVM) analyzer
  SEARCH_MAX_DEPTH: 10                   # directory depth to search for source files (monorepos)
  # Custom rules: point to a project that contains .gitlab/sast-ruleset.toml
  # Format: [user[:token]@]project-path[@git-sha]  (placeholder value below)
  SAST_RULESET_GIT_REFERENCE: "gitlab.example.com/security/sast-rules"

Strengths and limitations

Strength: Catches issues early in development — before code is merged or deployed
Strength: No running application needed — works on every commit
Strength: Can find issues in code paths that are rarely exercised at runtime
Limitation: False positives — flags code patterns that look dangerous but aren't exploitable in context
Limitation: Can't find runtime issues (misconfigurations, auth bypass via deployment, SSRF to internal services)
Limitation: Struggles with dynamically generated code, heavy metaprogramming, or complex frameworks

Triage advice: The first SAST scan on a mature codebase will produce dozens or hundreds of findings. Don't try to fix them all. Focus on Critical and High severity in actively-changed code. Use "Dismissed" status liberally for false positives and document why. Over time, the noise ratio drops as the team learns the tool.
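The triage rule above can be sketched against the report the analyzer writes (by default `gl-sast-report.json`), assuming its top-level `vulnerabilities` array whose entries carry `severity` and `location.file`. The report excerpt is illustrative, and the set of "actively-changed" files is a hypothetical input; one way to derive it would be `git log --name-only` over the last few months.

```python
import json
from collections import Counter

# Minimal excerpt shaped like gl-sast-report.json (values are illustrative).
REPORT = json.loads("""
{
  "vulnerabilities": [
    {"name": "SQL injection", "severity": "Critical", "location": {"file": "app/db.py", "start_line": 12}},
    {"name": "Weak hash", "severity": "Medium", "location": {"file": "app/auth.py", "start_line": 40}},
    {"name": "XSS", "severity": "High", "location": {"file": "legacy/old_view.py", "start_line": 7}},
    {"name": "Hardcoded password", "severity": "High", "location": {"file": "app/auth.py", "start_line": 3}}
  ]
}
""")

def triage(report: dict, changed_files: set[str]) -> list[dict]:
    """Keep only Critical/High findings located in files the team is actively changing."""
    return [
        v for v in report["vulnerabilities"]
        if v["severity"] in {"Critical", "High"}
        and v["location"]["file"] in changed_files
    ]

counts = Counter(v["severity"] for v in REPORT["vulnerabilities"])
focus = triage(REPORT, changed_files={"app/db.py", "app/auth.py"})
print(counts)
print([(v["name"], v["location"]["file"]) for v in focus])
```

The XSS finding in `legacy/` drops out even though it is High: it is not in code anyone is touching, so it goes to the backlog rather than the current sprint.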

DAST — Dynamic Application Security Testing

Black-box security testing against a running application — find vulnerabilities that only appear at runtime

What is DAST?

Dynamic Application Security Testing tests a running application by sending crafted HTTP requests and analyzing responses for security vulnerabilities. It's "black-box" testing — the scanner doesn't see source code. It interacts with the application the same way an attacker would: through the web interface and API endpoints.

DAST finds issues that SAST cannot: server misconfigurations, authentication flaws, session management problems, CORS issues, and vulnerabilities that only manifest when multiple components interact at runtime.

How GitLab DAST works

GitLab DAST uses a browser-based crawler (based on Chromium) that navigates the application like a real user.
It discovers pages, forms, API endpoints, and JavaScript-rendered content
The scanner then fuzzes inputs — injecting payloads for XSS, SQL injection, CSRF, command injection, path traversal, etc.
Results are correlated and deduplicated, then surfaced in the merge request and vulnerability dashboard

DAST modes

Passive scan — crawls the application and analyzes responses without sending attack payloads. Fast, safe, low noise. Good for CI pipelines.
Active scan (full) — sends attack payloads to discover exploitable vulnerabilities. Slower, may modify data, should target a dedicated test environment.
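API scan — tests API endpoints directly using an OpenAPI/Swagger spec, HAR file, or Postman collection. No crawling needed. In GitLab this is a separate analyzer (API security testing) with its own CI template, not a setting of the DAST job.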
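As an illustration of what "passive" means, the check below only inspects a response the crawler already received; nothing is sent to produce it. The header list is an assumption drawn from common hardening guidance, and the function is hypothetical, not one of GitLab DAST's actual checks.

```python
# Hypothetical passive check: which security headers are missing from a response?
EXPECTED_HEADERS = {
    "content-security-policy": "restricts script sources, mitigating XSS",
    "strict-transport-security": "forces HTTPS on future visits",
    "x-content-type-options": "stops MIME-type sniffing (expect 'nosniff')",
}

def passive_header_findings(response_headers: dict[str, str]) -> list[str]:
    """Return one finding per expected header that is absent (names compared case-insensitively)."""
    present = {name.lower() for name in response_headers}
    return [
        f"Missing {name}: {why}"
        for name, why in EXPECTED_HEADERS.items()
        if name not in present
    ]

# Headers captured while crawling; no attack payloads were involved.
headers = {"Content-Type": "text/html", "X-Content-Type-Options": "nosniff"}
for finding in passive_header_findings(headers):
    print(finding)
```

An active check, by contrast, would mutate a form field or query parameter and look for the payload reflected back, which is why active scans need a disposable environment.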

Note: The legacy proxy-based DAST analyzer was removed in GitLab 17.0. All DAST scanning now uses the browser-based engine (DAST version 5+), which provides better crawl coverage but may require more resources.

Configuration

stages:
  - build
  - test
  - deploy
  - dast          # the DAST template expects a stage named "dast"

include:
  - template: Security/DAST.gitlab-ci.yml

variables:
  DAST_TARGET_URL: "https://staging.example.com"  # target URL (DAST 5 name for DAST_WEBSITE)

  # Authenticated scanning (critical for real coverage)
  DAST_AUTH_URL: "https://staging.example.com/login"
  DAST_AUTH_USERNAME: "dast-scanner@example.com"
  # DAST_AUTH_PASSWORD: set as a masked CI/CD variable in project settings, never in this file
  DAST_AUTH_USERNAME_FIELD: "css:#username"
  DAST_AUTH_PASSWORD_FIELD: "css:#password"
  DAST_AUTH_SUBMIT_FIELD: "css:button[type='submit']"

  # Passive checks only; set to "true" to add active (attack) checks
  DAST_FULL_SCAN: "false"

  # API scanning uses the separate API security testing template, not this job

dast:
  stage: dast
  # Run only on the default branch, not on every commit
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

Practical considerations

Environment: DAST needs a running target — typically a review environment or staging. Never point it at production.
Authentication: Unauthenticated DAST only sees the login page. Configure authenticated scanning to test the actual application.
Speed: Full active scans can take 30-90 minutes. Run passive scans in MR pipelines and full scans nightly or on main branch only.
Data pollution: Active scans create records, submit forms, and may trigger emails. Use a disposable test environment with seed data.

SAST vs. DAST: They complement each other. SAST finds code-level bugs early but has false positives and can't see runtime behavior. DAST finds real exploitable issues in a running app but has less code coverage and runs later in the pipeline. Use both.

Dependency Scanning

Detect known vulnerabilities in third-party libraries before they reach production

What is Dependency Scanning?

Dependency Scanning (also called Software Composition Analysis / SCA) analyzes your project's dependencies — the third-party libraries and packages declared in lock files — against databases of known vulnerabilities (CVEs). Most modern applications are 80-90% third-party code, making this one of the highest-value scanners.

How it works in GitLab

The scanner parses lock files and manifests: package-lock.json, yarn.lock, Gemfile.lock, go.sum, pom.xml, requirements.txt, Pipfile.lock, Cargo.lock, etc.
Each dependency (including transitive dependencies) is checked against the GitLab Advisory Database (aggregating NVD, GitHub Advisories, and other sources)
Findings include CVE ID, severity (CVSS score), affected versions, and the fixed version if available
Results appear in the MR widget with a clear "upgrade to version X.Y.Z to fix" recommendation
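The matching step amounts to checking each resolved version against an advisory's affected range and reporting the first fixed version. A toy sketch, assuming purely numeric dotted versions and a made-up advisory record; real advisory databases use ecosystem-specific version rules (npm semver, PEP 440, Maven) that this simple tuple comparison does not handle.

```python
from typing import NamedTuple, Optional

class Advisory(NamedTuple):
    package: str
    identifier: str
    introduced: str          # first affected version (inclusive)
    fixed: Optional[str]     # first fixed version (exclusive); None if no fix exists

def parse(version: str) -> tuple[int, ...]:
    # Toy parser: only handles numeric dotted versions such as "4.17.20".
    return tuple(int(part) for part in version.split("."))

def check(package: str, version: str, advisories: list[Advisory]) -> list[str]:
    results = []
    for adv in advisories:
        if adv.package != package:
            continue
        affected = parse(version) >= parse(adv.introduced) and (
            adv.fixed is None or parse(version) < parse(adv.fixed))
        if affected:
            fix = f"upgrade to {adv.fixed}" if adv.fixed else "no fix available"
            results.append(f"{package}@{version}: {adv.identifier}, {fix}")
    return results

# Hypothetical advisory for illustration (not a real identifier).
ADVISORIES = [Advisory("lodash", "EXAMPLE-2021-0001", "4.0.0", "4.17.21")]
print(check("lodash", "4.17.20", ADVISORIES))  # affected, upgrade to 4.17.21
print(check("lodash", "4.17.21", ADVISORIES))  # []
```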

Configuration

include:
  - template: Security/Dependency-Scanning.gitlab-ci.yml

# Typically no configuration needed — it auto-detects
# the package manager from lock files

# Optional customization:
variables:
  DS_EXCLUDED_PATHS: "vendor,node_modules"
  DS_MAX_DEPTH: 2  # directory levels searched for lock files (not transitive dependency depth)

Vulnerability lifecycle

Detected — scanner finds a CVE in a dependency
Confirmed — team verifies the vulnerability is relevant (not all CVEs are exploitable in your context)
Resolved — dependency is upgraded to a fixed version
Dismissed — false positive or accepted risk (document the reason)

Key practices

Keep lock files committed — scanners need deterministic dependency trees. Without lock files, scans are unreliable.
Automate updates — use Renovate (which runs well against GitLab) or a similar bot to open dependency update MRs automatically and stay current
Transitive dependencies matter — your app may not directly use a vulnerable library, but a dependency of a dependency might. The scanner catches these.
Don't ignore severity — a Critical CVE in a library that handles user input (e.g., a JSON parser, XML library, or image processor) is a real risk
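To act on a transitive finding, you need to know which direct dependency drags the vulnerable package in, because that is usually what you upgrade. A breadth-first search over the resolved dependency graph gives the shortest chain. The graph below is made up for illustration; a real one would be built from the lock file.

```python
from collections import deque
from typing import Optional

# Hypothetical resolved dependency graph: package -> its direct dependencies.
GRAPH = {
    "my-app": ["web-framework", "report-gen"],
    "web-framework": ["http-parser"],
    "report-gen": ["xml-lib"],
    "xml-lib": ["vulnerable-zip"],
    "http-parser": [],
    "vulnerable-zip": [],
}

def dependency_path(graph: dict[str, list[str]], root: str, target: str) -> Optional[list[str]]:
    """Shortest chain of dependencies from root to target, or None if target is not pulled in."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for dep in graph.get(path[-1], []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None

print(" -> ".join(dependency_path(GRAPH, "my-app", "vulnerable-zip")))
# my-app -> report-gen -> xml-lib -> vulnerable-zip
```

Here the app never imports `vulnerable-zip` directly; upgrading `report-gen` (or `xml-lib` via an override) is the fix.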

Quick win: Dependency scanning is the easiest scanner to adopt and produces the most actionable results. Most findings have a clear fix: upgrade to version X. Enable it first on all projects.

Container Scanning

Find vulnerabilities in Docker images — OS packages, language libraries, and base image issues

What is Container Scanning?

Container Scanning analyzes Docker container images for known vulnerabilities in OS-level packages (apt, yum, apk) and language-specific packages. It's the container equivalent of dependency scanning — but for the entire runtime environment, not just your application's declared dependencies.

How it works in GitLab

GitLab uses Trivy (by Aqua Security) as the default container scanner
The scanner pulls the Docker image built in your CI pipeline and analyzes every layer
It checks installed packages against vulnerability databases (NVD, Alpine SecDB, Debian Security Tracker, Red Hat CVE database, etc.)
Results include CVE ID, severity, affected package, installed version, and fixed version

Configuration

include:
  - template: Security/Container-Scanning.gitlab-ci.yml

variables:
  CS_IMAGE: "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"  # image to scan
  CS_SEVERITY_THRESHOLD: "HIGH"  # only report HIGH and CRITICAL

# The container_scanning job runs in the test stage by default, so build and
# push the image in an earlier stage (the scanner pulls it from the registry)

Common findings and remediation

Outdated base image — FROM ubuntu:20.04 with unpatched OpenSSL. Fix: move to a current, patched base image tag, pin it by digest, and let an update bot bump the digest when a patched image is published.
Unnecessary packages — base images often include tools not needed at runtime (curl, wget, gcc). Fix: use minimal base images (alpine, distroless, scratch) or multi-stage builds.
Language packages in image — pip/npm/gem packages installed during build may have CVEs. Fix: upgrade them in the application's lock file, where dependency scanning reports them earlier in the pipeline, then rebuild the image.

Best practices

Use minimal base images — fewer packages = fewer vulnerabilities. Alpine or distroless images have a dramatically smaller attack surface.
Rebuild regularly — even if your code hasn't changed, base images get security patches. Schedule weekly rebuilds.
Multi-stage builds — compile in one stage, copy only the binary to a minimal runtime image. Build tools and headers don't ship to production.
Pin image digests — FROM node:20@sha256:abc... ensures reproducible builds. Combine with automated digest updates.
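A small lint in the spirit of the digest-pinning practice: it reports `FROM` lines whose image reference has no `@sha256:` digest. This is a hypothetical helper, not part of GitLab Container Scanning. It skips `FROM scratch` and references to earlier stages of a multi-stage build, and the all-zero digest is a placeholder.

```python
import re

FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE)

def unpinned_base_images(dockerfile: str) -> list[str]:
    """Return image references on FROM lines that are not pinned by digest."""
    stages, unpinned = set(), []
    for line in dockerfile.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image, alias = match.group(1), match.group(2)
        # Ignore the empty base image and earlier build stages referenced by alias.
        if image != "scratch" and image.lower() not in stages and "@sha256:" not in image:
            unpinned.append(image)
        if alias:
            stages.add(alias.lower())
    return unpinned

DOCKERFILE = """\
FROM node:20 AS build
RUN npm ci && npm run build
FROM gcr.io/distroless/nodejs20-debian12@sha256:0000000000000000000000000000000000000000000000000000000000000000
COPY --from=build /app/dist /app
"""
print(unpinned_base_images(DOCKERFILE))  # ['node:20']
```

Run as a CI job, a check like this fails fast on an unpinned `FROM` before the image is even built.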

Integration with Harbor: If using Harbor as your container registry, you can run Trivy scanning in both GitLab CI and Harbor. Harbor scans on push; GitLab scans in the pipeline. Use Harbor's scan-on-push as a safety net and GitLab's scanner for MR-level visibility.

Secret Detection

Catch accidentally committed credentials before they become a breach — API keys, passwords, tokens, and private keys

What is Secret Detection?

Secret Detection scans git commits for patterns that match known secret formats — AWS access keys, private RSA/ECDSA keys, database connection strings, API tokens, passwords in config files, and more. It uses a combination of regex pattern matching and entropy analysis (high-entropy strings that look like random tokens).
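Both techniques fit in a few lines: a regex for a known format (AWS access key IDs are `AKIA` followed by 16 uppercase letters or digits) and Shannon entropy to flag random-looking strings. The 4.0 bits-per-character threshold and the 20-character minimum are illustrative assumptions, not the defaults of GitLab's Gitleaks-based analyzer.

```python
import math
import re
from collections import Counter

AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
TOKEN_CANDIDATE = re.compile(r"[A-Za-z0-9+/_=-]{20,}")

def shannon_entropy(s: str) -> float:
    """Bits per character of the string's own character distribution."""
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def scan_line(line: str, entropy_threshold: float = 4.0) -> list[str]:
    # 1. Known formats via regex.
    findings = [f"aws-access-key-id: {m}" for m in AWS_KEY_ID.findall(line)]
    # 2. Anything long and random-looking that a format rule did not already catch.
    for token in TOKEN_CANDIDATE.findall(line):
        if not AWS_KEY_ID.fullmatch(token) and shannon_entropy(token) >= entropy_threshold:
            findings.append(f"high-entropy-string: {token}")
    return findings

print(scan_line('aws_key = "AKIAIOSFODNN7EXAMPLE"'))          # regex hit (AWS's documented example ID)
print(scan_line('token = "9fQ2xL7pZk3RmW8vTb1NcY6hJd4"'))     # entropy hit
print(scan_line('greeting = "hello_hello_hello_hello"'))      # long but low entropy: no finding
```

The trade-off shows up in the threshold: lower it and you catch more real tokens along with more hashes and IDs that are not secrets.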

This is arguably the most immediately actionable scanner. A committed secret is a live vulnerability — it's in git history forever (even if deleted in a later commit) and can be exploited immediately.

Detection modes

Pipeline (post-commit) — runs in CI after the commit is pushed. Scans the diff of each commit. JSON report artifacts available in all tiers; MR widget, vulnerability dashboard, and security policies require Ultimate.
Secret push protection (pre-receive) — server-side check that blocks the push before the secret enters the repository. This is the most effective mode: the secret never makes it into git history. Requires GitLab Ultimate.
Client-side (pre-commit) — optional git hook that runs locally before the developer commits. Fastest feedback loop but requires developer setup.

What it detects

AWS access key IDs and secret keys (AKIA...)
GCP service account keys (JSON key files)
Azure storage keys, connection strings
GitHub, GitLab, Slack, Stripe, Twilio tokens
Private keys (RSA, ECDSA, Ed25519, PGP)
Database connection strings with embedded passwords
Generic high-entropy strings (configurable sensitivity)
Custom patterns via .gitlab/secret-detection-ruleset.toml

Configuration

include:
  - template: Security/Secret-Detection.gitlab-ci.yml

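variables:
  SECRET_DETECTION_HISTORIC_SCAN: "true"  # scan full git history; typically a one-off run, then remove
  SECRET_DETECTION_EXCLUDED_PATHS: "test/"

# Custom rules — add organization-specific patterns by extending the default
# Gitleaks rules through a passthrough in .gitlab/secret-detection-ruleset.toml
# [secrets]
#   [[secrets.passthrough]]
#     type   = "raw"
#     target = "gitleaks.toml"
#     value  = """
# [extend]
# path = "/gitleaks.toml"   # keep the analyzer's default rules
#
# [[rules]]
# id = "internal-api-key"
# description = "Internal API key"
# regex = '''MYCO-[A-Za-z0-9]{32}'''
# """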

Incident response for committed secrets

Rotate immediately — the secret is compromised the moment it's pushed, even to a private repo. Assume it's been read.
Revoke the old credential — disable the API key, rotate the password, regenerate the token
Don't just delete the file — the secret is in git history. Deleting it in a new commit doesn't remove it. Use git filter-repo to rewrite history if needed, but rotation is the priority.
Audit access logs — check if the secret was used by an unauthorized party during the exposure window

Prevention > detection: Secret Detection is a safety net, not a primary control. The real fix is to never put secrets in code: use environment variables, CI/CD variables (masked + protected), external secret managers (OpenBao/Vault), or Kubernetes secrets. Enable pre-receive push rules to block secrets from entering the repo in the first place.

SBOM — Software Bill of Materials

A machine-readable inventory of every component in your software — increasingly required for compliance and supply chain security

What is an SBOM?

A Software Bill of Materials is a structured, machine-readable list of all components, libraries, and dependencies that make up a piece of software. Think of it as a nutritional label for software: it tells you exactly what's inside. SBOMs have become a critical compliance requirement following high-profile supply chain incidents (the SolarWinds build compromise and the Log4Shell vulnerability in the widely embedded Log4j library) and government mandates (US Executive Order 14028, EU Cyber Resilience Act).

SBOM formats

CycloneDX — OWASP standard. JSON or XML format. GitLab generates CycloneDX by default. Designed for security use cases: vulnerability tracking, license compliance, and component lifecycle.
SPDX (Software Package Data Exchange) — Linux Foundation standard. ISO/IEC 5962:2021. Stronger focus on license compliance. Widely used in open-source projects and government procurement.

How GitLab generates SBOMs

GitLab's Dependency Scanning and Container Scanning automatically generate CycloneDX SBOM reports as part of their scan output
The SBOM is stored as a CI artifact and linked to the pipeline
The Dependency List page (project-level) shows a browsable view of all components with versions, licenses, and known vulnerabilities
SBOMs can be exported for sharing with customers, auditors, or regulatory bodies

What's in an SBOM

Component name and version — e.g., lodash@4.17.21, openssl 3.0.8-1ubuntu1
Package URL (purl) — standardized identifier: pkg:npm/lodash@4.17.21
License — MIT, Apache-2.0, GPL-3.0, etc.
Supplier — who produced the component
Dependency relationship — direct vs. transitive
Hashes — SHA-256 of the component for integrity verification

Why SBOMs matter

Vulnerability response: When the next Log4Shell drops, an SBOM tells you in seconds whether you're affected — across all projects, all environments.
License compliance: Automatically flag GPL dependencies in proprietary software, or track license obligations for open-source usage.
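Regulatory compliance: Following US Executive Order 14028, federal agencies can require SBOMs from their software vendors. The EU Cyber Resilience Act requires manufacturers of products with digital elements sold in the EU to maintain an SBOM as part of their technical documentation.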
Customer trust: Enterprise customers increasingly request SBOMs as part of procurement due diligence.

Practical tip: SBOMs are generated automatically if you have Dependency Scanning or Container Scanning enabled — there's no extra configuration. The challenge isn't generating them; it's having a process to act on them. Pair SBOMs with a vulnerability management workflow so that when a new CVE is published, you can immediately query which projects are affected.
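A sketch of that "which projects are affected?" query, assuming each project's SBOM is a CycloneDX JSON document with a `components` array whose entries carry `name`, `version`, and `purl`. The project names and the affected-version set are hypothetical; a real advisory would give a version range.

```python
import json

# Two hypothetical per-project CycloneDX SBOMs, trimmed to the fields used here.
SBOMS = {
    "payments-api": json.loads("""{
        "bomFormat": "CycloneDX", "specVersion": "1.4",
        "components": [
            {"name": "log4j-core", "version": "2.14.1",
             "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"},
            {"name": "jackson-databind", "version": "2.13.0",
             "purl": "pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.13.0"}
        ]}"""),
    "web-frontend": json.loads("""{
        "bomFormat": "CycloneDX", "specVersion": "1.4",
        "components": [
            {"name": "lodash", "version": "4.17.21", "purl": "pkg:npm/lodash@4.17.21"}
        ]}"""),
}

def affected_projects(sboms: dict, purl_prefix: str, bad_versions: set[str]) -> list[str]:
    """Projects whose SBOM lists a component matching the purl prefix at an affected version."""
    hits = []
    for project, bom in sboms.items():
        for comp in bom.get("components", []):
            if comp.get("purl", "").startswith(purl_prefix) and comp.get("version") in bad_versions:
                hits.append(f"{project}: {comp['purl']}")
    return hits

print(affected_projects(SBOMS, "pkg:maven/org.apache.logging.log4j/log4j-core@", {"2.14.1", "2.15.0"}))
```

Matching on the purl rather than the bare name avoids false hits between ecosystems that reuse package names.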

SAML 2.0

Security Assertion Markup Language — the XML-based SSO standard that dominates enterprise identity federation

What is SAML?

SAML 2.0 (Security Assertion Markup Language) is an XML-based open standard for exchanging authentication and authorization data between an Identity Provider (IdP) and a Service Provider (SP). Published in 2005 by OASIS, it remains the dominant SSO protocol in enterprise environments.

In a GitLab context, GitLab acts as the SAML SP. An external IdP (Keycloak, ADFS, Okta, Microsoft Entra ID) authenticates users and sends SAML assertions to GitLab. GitLab trusts the assertion and creates/maps the user session.

How the SAML flow works with GitLab

User clicks "Sign in with SSO" on GitLab
GitLab generates an AuthnRequest and redirects the user to the IdP
IdP authenticates the user (login form, Kerberos, MFA)
IdP generates a signed SAML Response containing an Assertion with user identity and attributes
IdP POSTs the response to GitLab's ACS URL (/users/auth/saml/callback)
GitLab validates the signature, maps attributes (email, name, groups), and creates the session
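Step 2 typically uses the SAML HTTP-Redirect binding: the AuthnRequest XML is raw-DEFLATE compressed, Base64-encoded, and URL-encoded into a `SAMLRequest` query parameter. The sketch below shows only that encoding (unsigned, no `RelayState`) using the example URLs from this section's GitLab configuration. It is not GitLab's implementation, which OmniAuth SAML handles.

```python
import base64
import uuid
import zlib
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlparse

def build_redirect_url(idp_sso_url: str, issuer: str, acs_url: str) -> str:
    """Encode a minimal, unsigned AuthnRequest for the HTTP-Redirect binding."""
    issue_instant = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    xml = (
        '<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
        'xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" '
        f'ID="_{uuid.uuid4().hex}" Version="2.0" IssueInstant="{issue_instant}" '
        f'Destination="{idp_sso_url}" AssertionConsumerServiceURL="{acs_url}">'
        f"<saml:Issuer>{issuer}</saml:Issuer>"
        "</samlp:AuthnRequest>"
    )
    # The binding uses raw DEFLATE (no zlib header or checksum), hence wbits=-15.
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    deflated = compressor.compress(xml.encode("utf-8")) + compressor.flush()
    query = urlencode({"SAMLRequest": base64.b64encode(deflated).decode("ascii")})
    return f"{idp_sso_url}?{query}"

def decode_saml_request(url: str) -> str:
    """What the IdP does on receipt: URL-decode, Base64-decode, then inflate."""
    encoded = parse_qs(urlparse(url).query)["SAMLRequest"][0]
    return zlib.decompress(base64.b64decode(encoded), -15).decode("utf-8")

url = build_redirect_url(
    "https://idp.example.com/sso/saml",
    issuer="https://gitlab.example.com",
    acs_url="https://gitlab.example.com/users/auth/saml/callback",
)
print(decode_saml_request(url))
```

The `Issuer` here is the SP entity ID; it has to match what the IdP has registered, which is the most common cause of "unknown SP" errors.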

GitLab SAML configuration

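# /etc/gitlab/gitlab.rb  (apply with: sudo gitlab-ctl reconfigure)
gitlab_rails['omniauth_enabled'] = true
gitlab_rails['omniauth_allow_single_sign_on'] = ['saml']   # create users on first SAML sign-in
gitlab_rails['omniauth_block_auto_created_users'] = false
gitlab_rails['omniauth_providers'] = [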
  {
    name: "saml",
    label: "Company SSO",
    args: {
      assertion_consumer_service_url: "https://gitlab.example.com/users/auth/saml/callback",
      idp_cert_fingerprint: "XX:XX:XX:...",
      idp_sso_target_url: "https://idp.example.com/sso/saml",
      issuer: "https://gitlab.example.com",
      name_identifier_format: "urn:oasis:names:tc:SAML:2.0:nameid-format:emailAddress",
      attribute_statements: {
        email: ["email"],
        first_name: ["firstName"],
        last_name: ["lastName"]
      }
    }
  }
]
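`idp_cert_fingerprint` expects the SHA-1 fingerprint of the IdP's signing certificate in the colon-separated form shown above. Most IdPs display it directly. If you only have the PEM file, this standard-library sketch computes it; the demo PEM wraps placeholder bytes rather than a real certificate.

```python
import hashlib
import ssl

def saml_fingerprint(pem_cert: str, algorithm: str = "sha1") -> str:
    """Uppercase, colon-separated hex digest of the certificate's DER bytes."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    digest = hashlib.new(algorithm, der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

# Placeholder PEM for the demo: wraps arbitrary bytes, not a real certificate.
pem = ssl.DER_cert_to_PEM_cert(b"not a real certificate")
print(saml_fingerprint(pem))  # 20 colon-separated byte pairs
```

On a real certificate, `openssl x509 -in idp.crt -noout -fingerprint -sha1` gives the same value. The fingerprint must be updated whenever the IdP rotates its signing certificate, or every login will fail signature validation.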

Key concepts

Assertion — signed XML payload containing who the user is and their attributes
Entity ID — unique identifier for each party. Must match exactly between IdP and SP configurations
ACS URL — the GitLab endpoint that receives the SAML response
Name ID — the user identifier in the assertion (typically email)
Metadata — XML document describing endpoints and certificates. Exchange between IdP and SP to automate setup
Group sync — map SAML group attributes to GitLab groups (requires Premium+)

Consultant guidance: For GitLab, prefer SAML over LDAP for SSO — SAML is stateless (no bind credentials stored on GitLab), supports MFA at the IdP level, and works with cloud IdPs. LDAP adds scheduled background sync of users and groups, including blocking users removed from the directory, whereas SAML group sync (Premium) only updates memberships when a user signs in. For most customers, SAML group sync is enough.

OAuth 2.0

The authorization framework that underpins modern API security and delegated access

What is OAuth 2.0?

OAuth 2.0 is an authorization framework (RFC 6749) that enables applications to obtain limited access to user resources without exposing credentials. A critical distinction: OAuth 2.0 is about authorization (what can you access?), not authentication (who are you?). OIDC was built on top of OAuth 2.0 to add authentication.

OAuth 2.0 in GitLab

GitLab uses OAuth 2.0 in two ways:

As an OAuth provider — GitLab can act as an OAuth 2.0 authorization server. Third-party applications can request access to GitLab APIs on behalf of a user. This is how integrations like IDE plugins, CLI tools, and custom dashboards authenticate.
As an OAuth consumer — GitLab can authenticate users via external OAuth/OIDC providers (Keycloak, Google, GitHub, Microsoft Entra ID) through OmniAuth.

Core roles

Resource Owner — the GitLab user who owns repositories, issues, pipelines
Client — the application requesting access (registered as an OAuth Application in GitLab)
Authorization Server — GitLab itself (issues access tokens after user consent)
Resource Server — GitLab's API (validates access tokens on each request)

Creating an OAuth application in GitLab

# Admin Area > Applications > New Application
Name:          My Integration
Redirect URI:  https://myapp.example.com/callback
Scopes:        api, read_user, read_repository
Confidential:  Yes (server-side apps) / No (SPAs, mobile)

# Result:
Application ID:  abc123...  (client_id)
Secret:          xyz789...  (client_secret)

Grant types supported by GitLab

Authorization Code — standard flow for web apps. User is redirected to GitLab, approves access, GitLab returns a code, app exchanges it for tokens.
Authorization Code + PKCE — for public clients (SPAs, CLIs). Adds a code verifier/challenge to prevent interception.
Resource Owner Password Credentials — direct username/password exchange. Omitted from the OAuth 2.1 draft and disallowed by the OAuth 2.0 Security Best Current Practice (RFC 9700), but GitLab still supports it for legacy integrations.
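The PKCE part of the Authorization Code flow is small enough to show in full. This minimal sketch follows RFC 7636 with the S256 method and builds the redirect to GitLab's `/oauth/authorize` endpoint. The client ID, redirect URI, and scope are placeholders taken from the application example above.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

def s256_challenge(verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) without padding, per RFC 7636."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def authorize_url(gitlab: str, client_id: str, redirect_uri: str, scope: str) -> tuple[str, str, str]:
    verifier = secrets.token_urlsafe(64)   # 86 URL-safe chars, within RFC 7636's 43-128
    state = secrets.token_urlsafe(16)      # CSRF protection: compare it on the callback
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "state": state,
        "scope": scope,
        "code_challenge": s256_challenge(verifier),
        "code_challenge_method": "S256",
    }
    return f"{gitlab}/oauth/authorize?{urlencode(params)}", verifier, state

url, verifier, state = authorize_url(
    "https://gitlab.example.com", "abc123", "https://myapp.example.com/callback", "read_user")
print(url)
```

The client keeps `verifier` and sends it along with the returned code to `/oauth/token`. An attacker who intercepts the code cannot redeem it without the verifier, which is why PKCE makes the flow safe for public clients that cannot hold a secret.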

GitLab API scopes

api — full read/write access to the API
read_user — read the authenticated user's profile
read_api — read-only API access
read_repository — read repository contents (clone/fetch)
write_repository — write to repositories (push)
read_registry — read container registry images
openid — OIDC authentication (ID token)

Security note: Prefer OAuth applications over Personal Access Tokens (PATs) for integrations. GitLab OAuth access tokens expire after two hours and are renewed with a refresh token, and each application's access can be revoked on its own. PATs are static credentials, typically valid for up to a year, and a frequent source of credential leaks. For CI/CD, use the built-in CI_JOB_TOKEN instead of PATs where possible.

JSON Web Token (JWT)

The compact, self-contained token format that carries identity and authorization claims

What is a JWT?

A JSON Web Token (RFC 7519) is a compact, URL-safe way to represent claims between two parties. It consists of three Base64URL-encoded parts separated by dots: header.payload.signature. In GitLab, JWTs appear most visibly as OIDC ID tokens, both those GitLab issues as an OpenID Connect provider and the id_tokens generated for CI/CD jobs.

JWT structure

# Header
{ "alg": "RS256", "typ": "JWT", "kid": "key-id" }

# Payload (claims)
{
  "iss": "https://gitlab.example.com",
  "sub": "project_path:my-group/my-project:ref_type:branch:ref:main",
  "aud": "my-app",
  "exp": 1711234567,
  "iat": 1711230967,
  "namespace_id": "42",
  "namespace_path": "my-group",
  "project_id": "99",
  "project_path": "my-group/my-project",
  "user_login": "jsmith",
  "user_email": "jsmith@example.com",
  "ref": "main",
  "pipeline_id": "7890"
}

# Signature
RSASHA256(base64url(header) + "." + base64url(payload), privateKey)
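Because the header and payload are only Base64URL-encoded, not encrypted, anyone holding a token can read them; only the signature proves who issued it. This sketch splits a token and decodes the first two parts without verifying anything. It uses a locally built demo token with a dummy signature, not a real GitLab token.

```python
import base64
import json

def b64url_decode(segment: str) -> bytes:
    # JWT segments drop '=' padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def peek(token: str) -> tuple[dict, dict]:
    """Decode header and payload. Does NOT verify the signature, so never trust the result alone."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))

# Demo token with a dummy signature segment (illustration only).
demo = ".".join([
    b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT", "kid": "key-id"}).encode()),
    b64url_encode(json.dumps({"iss": "https://gitlab.example.com",
                              "project_path": "my-group/my-project"}).encode()),
    b64url_encode(b"dummy-signature"),
])
header, payload = peek(demo)
print(header["alg"], payload["project_path"])  # RS256 my-group/my-project
```

Peeking is handy for debugging a rejected token (wrong `aud`, expired `exp`), but authorization decisions must only use claims from a token whose signature has been verified.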

JWTs in GitLab CI/CD

GitLab CI/CD can generate JWTs for pipelines, enabling keyless authentication to external services:

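# .gitlab-ci.yml — OIDC authentication to cloud providers
deploy:
  id_tokens:
    VAULT_ID_TOKEN:
      aud: https://vault.example.com   # must match bound_audiences on the Vault role
    AWS_ID_TOKEN:
      aud: sts.amazonaws.com           # must match the audience registered on the IAM OIDC provider
  script:
    # Exchange the JWT for a short-lived Vault token (no static secrets!); assumes VAULT_ADDR
    # is set. The ID token is not named VAULT_TOKEN, which the Vault CLI reads as its own token.
    - export VAULT_TOKEN="$(vault write -field=token auth/jwt/login role=gitlab-deploy jwt="$VAULT_ID_TOKEN")"
    # Use the JWT to assume an AWS IAM role via OIDC federation and export the temporary credentials
    - >
      export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s"
      $(aws sts assume-role-with-web-identity
      --role-arn arn:aws:iam::123456:role/deploy
      --role-session-name "gitlab-${CI_PROJECT_ID}-${CI_PIPELINE_ID}"
      --web-identity-token "$AWS_ID_TOKEN"
      --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
      --output text))
    - aws sts get-caller-identity   # confirm which role was assumed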

Key claims in GitLab JWTs

iss — the GitLab instance URL
sub — project_path:{group}/{project}:ref_type:{type}:ref:{ref} (e.g., project_path:my-group/my-project:ref_type:branch:ref:main)
namespace_path / project_path — allows external services to authorize based on which project/group triggered the pipeline
ref — the branch or tag that triggered the pipeline
pipeline_id / pipeline_source — audit trail for which pipeline generated the token

Token validation

Fetch JWKS — GitLab publishes public keys at https://gitlab.example.com/-/jwks
Verify signature — match the kid header to a JWKS key, verify with the public key
Check expiration — reject if exp is past
Check issuer — iss must match your GitLab instance
Check audience — aud must match what the external service expects
Check claims — verify project_path, ref, etc. match your authorization policy
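Everything after signature verification in that list is plain claim checking. This sketch assumes the signature has already been verified against the JWKS by a JOSE library (not shown). The expected audience, project, and ref are hypothetical policy values, and the 60-second clock-skew leeway is an assumption.

```python
import time
from typing import Optional

def check_claims(claims: dict, *, issuer: str, audience: str,
                 allowed_projects: set[str], allowed_refs: set[str],
                 now: Optional[float] = None, leeway: int = 60) -> list[str]:
    """Return policy violations for an already signature-verified GitLab ID token."""
    now = time.time() if now is None else now
    errors = []
    if claims.get("exp", 0) + leeway < now:
        errors.append("token expired")
    if claims.get("iss") != issuer:
        errors.append("unexpected issuer")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        errors.append("wrong audience")
    if claims.get("project_path") not in allowed_projects:
        errors.append("project not allowed")
    if claims.get("ref") not in allowed_refs:
        errors.append("ref not allowed")
    return errors

claims = {
    "iss": "https://gitlab.example.com", "aud": "my-app", "exp": 1711234567,
    "project_path": "my-group/my-project", "ref": "main",
    "sub": "project_path:my-group/my-project:ref_type:branch:ref:main",
}
policy = dict(issuer="https://gitlab.example.com", audience="my-app",
              allowed_projects={"my-group/my-project"}, allowed_refs={"main"})
print(check_claims(claims, now=1711230967, **policy))  # []
print(check_claims(claims, now=1711240000, **policy))  # ['token expired']
```

Checking `ref` (and, in practice, `ref_protected`) is what stops a feature branch in an allowed project from obtaining production credentials.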

Key insight: GitLab CI/CD JWTs eliminate the need for static secrets in pipelines. Instead of storing AWS keys or Vault tokens as CI variables, configure OIDC federation — the pipeline proves its identity with a JWT, and the external service grants short-lived credentials. This is the modern approach to CI/CD secret management and pairs perfectly with OpenBao/Vault JWT auth.

1K Reference Architecture

Single-node all-in-one — up to 1,000 users, 20 RPS

Overview

The simplest GitLab deployment: everything on one server. Puma, Sidekiq, Gitaly, PostgreSQL, Redis, Prometheus — all bundled via Omnibus on a single VM.

Node specifications

Service	Nodes	vCPU	RAM
All-in-one	1	8	16 GB

Total: 1 node

Supported modifications

External PostgreSQL (Cloud SQL, RDS)
External object storage (S3)
Elasticsearch for Advanced Search (Premium/Ultimate)

Unsupported modifications

HA configurations — not possible at this tier
Cloud Native Hybrid — requires minimum 2K

Best for: Development environments, small teams, proof-of-concept deployments, or organizations where downtime during maintenance is acceptable.

2K Reference Architecture

Separated services, no HA — up to 2,000 users, 40 RPS

Overview

Services are split across dedicated nodes but without redundancy. A single failure takes down the affected component.

Node specifications

Service	Nodes	vCPU	RAM
External Load Balancer	1	4	3.6 GB
PostgreSQL	1	2	7.5 GB
Redis	1	1	3.75 GB
Gitaly	1	4	15 GB
Sidekiq	1	4	15 GB
GitLab Rails (Puma)	2	8	7.2 GB
Monitoring (Prometheus)	1	2	1.8 GB

Total: 8 nodes

Supported modifications

External PaaS PostgreSQL (Cloud SQL, RDS)
External Redis (ElastiCache, Memorystore)
Cloud Native Hybrid variant available

Note: For HA needs at this user count, GitLab recommends using a scaled-down 3K architecture instead of trying to add redundancy to 2K.

3K Reference Architecture

Smallest HA architecture — up to 3,000 users, 60 RPS, ~28 nodes

Overview

The most commonly deployed production architecture. Every critical service has redundancy. This is the tier to recommend for any customer who needs high availability.

Node specifications

Service	Nodes	vCPU	RAM
External Load Balancer	1	4	3.6 GB
Internal Load Balancer	1	4	3.6 GB
Consul	3	2	1.8 GB
PostgreSQL (Patroni)	3	2	7.5 GB
PgBouncer	3	2	1.8 GB
Redis + Sentinel	3	2	7.5 GB
Gitaly	3	4	15 GB
Praefect	3	2	1.8 GB
Praefect PostgreSQL	1+	2	1.8 GB
Sidekiq	2	4	15 GB
GitLab Rails (Puma)	3	8	7.2 GB
Monitoring (Prometheus)	1	2	1.8 GB

Total: ~28 nodes

Supported modifications

Scaled-down HA — reduce specs while maintaining node redundancy for fewer users
Cloud Native Hybrid — Puma + Sidekiq in Kubernetes, stateful services on VMs
External PostgreSQL/Redis via PaaS
Sharded Gitaly instead of Praefect cluster

Consultant tip: The 3K is the default recommendation for any HA deployment. Even customers with 500 users who need HA should use a scaled-down 3K rather than trying to bolt HA onto a smaller architecture.
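The sizing logic in this guide can be captured as a lookup for early scoping. The user and RPS ceilings are those quoted in the reference architecture headings, and "HA means at least a (possibly scaled-down) 3K" follows the tip above. This is a hypothetical helper, not a substitute for GitLab's sizing guidance, which also weighs repository size, CI load, and monorepos.

```python
# (name, max_users, max_rps, has_ha) from the reference architecture headings in this guide.
TIERS = [
    ("1K", 1_000, 20, False),
    ("2K", 2_000, 40, False),
    ("3K", 3_000, 60, True),
    ("5K", 5_000, 100, True),
    ("10K", 10_000, 200, True),
    ("25K", 25_000, 500, True),
    ("50K", 50_000, 1_000, True),
]

def pick_architecture(users: int, peak_rps: float, needs_ha: bool) -> str:
    """Smallest tier covering both users and peak RPS, with HA if required."""
    for name, max_users, max_rps, has_ha in TIERS:
        if users <= max_users and peak_rps <= max_rps and (has_ha or not needs_ha):
            if needs_ha and name == "3K" and users < 3_000:
                return "3K (scaled down)"  # HA for smaller user counts
            return name
    return "beyond the published tiers"

print(pick_architecture(500, 10, needs_ha=True))      # 3K (scaled down)
print(pick_architecture(1_800, 35, needs_ha=False))   # 2K
print(pick_architecture(4_000, 150, needs_ha=True))   # 10K: peak RPS, not users, decides
```

The last case is the common scoping mistake: a user count that fits 5K can still need 10K when automation and CI drive request rates up.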

5K Reference Architecture

HA with larger specs — up to 5,000 users, 100 RPS, ~28 nodes

Overview

Same node count as 3K but with larger CPU and RAM per node. The architecture shape doesn't change — services just get more resources.

Node specifications

Service	Nodes	vCPU	RAM
External Load Balancer	1	4	3.6 GB
Internal Load Balancer	1	4	3.6 GB
Consul	3	2	1.8 GB
PostgreSQL (Patroni)	3	4	15 GB
PgBouncer	3	2	1.8 GB
Redis + Sentinel	3	2	7.5 GB
Gitaly	3	8	30 GB
Praefect	3	2	1.8 GB
Praefect PostgreSQL	1+	2	1.8 GB
Sidekiq	2	4	15 GB
GitLab Rails (Puma)	3	16	14.4 GB
Monitoring	1	2	1.8 GB

Total: ~28 nodes

Supported modifications

Split Redis (separate Cache + Persistent = 6 Redis nodes)
Sidekiq autoscaling via Auto Scaling Groups
TLS encryption between Praefect and Gitaly
Cloud Native Hybrid variant

Scaling note: The jump from 3K to 5K is entirely about vertical scaling (bigger nodes), not horizontal scaling (more nodes). The architecture diagram is identical.

10K Reference Architecture

Large-scale HA — up to 10,000 users, 200 RPS, ~32 nodes

Overview

At this tier, Redis is split into separate Cache and Persistent clusters (6 Redis nodes instead of 3), and Sidekiq scales to 4 nodes.

Node specifications

Service	Nodes	vCPU	RAM
External Load Balancer	1	4	3.6 GB
Internal Load Balancer	1	4	3.6 GB
Consul	3	2	1.8 GB
PostgreSQL (Patroni)	3	8	30 GB
PgBouncer	3	2	1.8 GB
Redis Cache + Sentinel	3	4	15 GB
Redis Persistent + Sentinel	3	4	15 GB
Gitaly	3	16	60 GB
Praefect	3	2	1.8 GB
Praefect PostgreSQL	1+	2	1.8 GB
Sidekiq	4	4	15 GB
GitLab Rails (Puma)	3	32	28.8 GB
Monitoring	1	4	3.6 GB

Total: 32 nodes with one Praefect PostgreSQL node (~240 vCPU, ~535 GB RAM)
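The totals follow from summing the 10K table rows, counting the minimum of one Praefect PostgreSQL node:

```python
# (service, nodes, vCPU per node, RAM GB per node) from the 10K table above.
ROWS = [
    ("External Load Balancer", 1, 4, 3.6),
    ("Internal Load Balancer", 1, 4, 3.6),
    ("Consul", 3, 2, 1.8),
    ("PostgreSQL (Patroni)", 3, 8, 30),
    ("PgBouncer", 3, 2, 1.8),
    ("Redis Cache + Sentinel", 3, 4, 15),
    ("Redis Persistent + Sentinel", 3, 4, 15),
    ("Gitaly", 3, 16, 60),
    ("Praefect", 3, 2, 1.8),
    ("Praefect PostgreSQL", 1, 2, 1.8),
    ("Sidekiq", 4, 4, 15),
    ("GitLab Rails (Puma)", 3, 32, 28.8),
    ("Monitoring", 1, 4, 3.6),
]

nodes = sum(n for _, n, _, _ in ROWS)
vcpu = sum(n * cpu for _, n, cpu, _ in ROWS)
ram = sum(n * gb for _, n, _, gb in ROWS)
print(f"{nodes} nodes, {vcpu} vCPU, {ram:.1f} GB RAM")  # 32 nodes, 240 vCPU, 535.2 GB RAM
```

The same table-driven sum is a quick sanity check when adapting any tier (for example, adding Praefect PostgreSQL replicas or Sidekiq nodes) before quoting infrastructure costs.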

Key change from 5K: Redis splits into two clusters (Cache for the Rails cache; Persistent for Sidekiq queues, sessions, and other shared state). This is the inflection point where Redis becomes a meaningful scaling concern.

25K Reference Architecture

Enterprise-scale HA — up to 25,000 users, 500 RPS, ~34 nodes

Node specifications

Service	Nodes	vCPU	RAM
External Load Balancer	1	8	7.2 GB
Internal Load Balancer	1	8	7.2 GB
Consul	3	2	1.8 GB
PostgreSQL (Patroni)	3	16	60 GB
PgBouncer	3	2	1.8 GB
Redis Cache	3	4	15 GB
Redis Persistent	3	4	15 GB
Gitaly	3	32	120 GB
Praefect	3	4	3.6 GB
Praefect PostgreSQL	1+	2	1.8 GB
Sidekiq	4	4	15 GB
GitLab Rails (Puma)	5	32	28.8 GB
Monitoring	1	4	3.6 GB

Total: 34 nodes with one Praefect PostgreSQL node

Key change: Puma scales to 5 nodes (up from 3). Gitaly nodes grow to 32 vCPU and 120 GB RAM each — git operations are the primary bottleneck at this scale.

50K Reference Architecture

Maximum scale — up to 50,000 users, 1,000 RPS, ~41 nodes

Node specifications

Service	Nodes	vCPU	RAM
External Load Balancer	1	16	14.4 GB
Internal Load Balancer	1	16	14.4 GB
Consul	3	2	1.8 GB
PostgreSQL (Patroni)	3	32	120 GB
PgBouncer	3	2	1.8 GB
Redis Cache	3	4	15 GB
Redis Persistent	3	4	15 GB
Gitaly	3	64	240 GB
Praefect	3	4	3.6 GB
Praefect PostgreSQL	1+	2	1.8 GB
Sidekiq	4	4	15 GB
GitLab Rails (Puma)	12	32	28.8 GB
Monitoring	1	4	3.6 GB

Total: 41 nodes with one Praefect PostgreSQL node

Key change: Puma scales to 12 nodes — the main horizontal scaling lever. Gitaly reaches 64 vCPU and 240 GB RAM per node. At this scale, Cloud Native Hybrid is strongly recommended (Puma and Sidekiq in Kubernetes for easier scaling).

GitLab Environment Toolkit (GET)

Official IaC toolkit for deploying GitLab reference architectures using Terraform and Ansible

What is GET?

The GitLab Environment Toolkit is GitLab's official Infrastructure as Code solution for provisioning and configuring GitLab reference architectures. It combines Terraform modules (infrastructure provisioning) with Ansible playbooks (GitLab configuration) to deploy production-ready environments that match GitLab's published reference architectures exactly.

Supported architectures

1K — single node (Ansible only, no Terraform needed)
2K — separated services, no HA
3K — smallest HA architecture (~28 nodes)
5K — HA with larger per-node specs
10K — split Redis, 4 Sidekiq nodes
25K / 50K — enterprise scale

Cloud providers

AWS — EC2, RDS (optional), ElastiCache (optional), S3, ELB/NLB
GCP — Compute Engine, Cloud SQL (optional), Memorystore (optional), GCS, Load Balancer
Azure — VMs, Azure Database for PostgreSQL (optional), Azure Cache for Redis (optional), Blob Storage, Load Balancer

Key capabilities

Cloud Native Hybrid — deploy Puma and Sidekiq in Kubernetes (via Helm) while keeping stateful services on VMs
Geo deployments — provision primary + secondary sites with full Geo replication configuration
Custom configs — inject custom gitlab.rb settings via Ansible variables
Day 2 operations — re-run Ansible playbooks for upgrades, scaling, and config changes
Air-gapped support — deploy in environments without internet access

Terraform workflow

# 1. Configure the environment's Terraform variables (run from the GET repository root)
mkdir -p terraform/environments/my-env
cp terraform/environments/3k/variables.tf.example terraform/environments/my-env/variables.tf
# Edit: AWS region, VPC, SSH key, domain, instance types...

# 2. Provision infrastructure
cd terraform/environments/my-env
terraform init
terraform plan    # review what will be created
terraform apply   # provision ~28 VMs, LBs, security groups

# 3. Configure GitLab via Ansible
cd ../../../ansible
ansible-playbook -i environments/my-env/inventory playbooks/all.yml

Who uses GET: GitLab Professional Services uses GET for its customer deployments, and it's also available to customers and partners. GET encodes years of deployment best practices — don't hand-roll what GET automates.

GitLab Geo

Cross-region replication for disaster recovery, distributed reads, and data residency compliance

What is Geo?

GitLab Geo creates read-only replicas of a GitLab instance in different geographic regions. It replicates Git repositories, the PostgreSQL database, LFS objects, uploads, container registry images, and other data types to one or more secondary sites.

How replication works

Database: PostgreSQL streaming replication (physical, not logical). The secondary has a read-only replica of the entire database.
Git repositories: Geo-specific replication via Gitaly. Repositories are synced asynchronously after push events on the primary.
Files & artifacts: Synced via internal HTTP API calls from secondary to primary. Object storage can use native cloud replication (S3 CRR) as an alternative.
Container registry: Registry images are replicated if using filesystem storage. With object storage, use cloud-native replication.

Failover process

Geo failover is manual, not automatic. The process:

Planned: Enable maintenance mode on primary → wait for replication to complete → run gitlab-ctl geo promote on secondary → update DNS
Unplanned: Run gitlab-ctl geo promote on secondary immediately → update DNS → accept potential data loss (equal to replication lag)

Monitoring Geo health

# Check Geo status from the secondary
gitlab-rake geo:status

# Key metrics to monitor:
# - Repositories synced vs. failed
# - Database replication lag (seconds)
# - Last event ID gap between primary and secondary
# - Verification status (checksums match)
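The metrics listed above are easiest to act on when each one has an explicit threshold. The following is a minimal sketch of such a check, assuming you have already read the values from `gitlab-rake geo:status` or your monitoring system. The `GeoSnapshot` fields and the default thresholds are hypothetical choices for illustration, not GitLab names or recommendations.

```python
from dataclasses import dataclass


@dataclass
class GeoSnapshot:
    """One reading of secondary-site metrics (field names are illustrative)."""
    db_lag_seconds: float            # database replication lag
    primary_last_event_id: int       # last Geo event ID written on the primary
    secondary_cursor_event_id: int   # last event ID processed by the secondary
    repos_total: int
    repos_failed: int


def assess_geo_health(
    s: GeoSnapshot,
    max_lag_seconds: float = 300,    # hypothetical threshold
    max_event_gap: int = 1000,       # hypothetical threshold
    max_failed_ratio: float = 0.01,  # hypothetical threshold
) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if s.db_lag_seconds > max_lag_seconds:
        # For an unplanned failover, this lag approximates the data-loss window.
        problems.append(f"database lag {s.db_lag_seconds:.0f}s exceeds {max_lag_seconds:.0f}s")
    gap = s.primary_last_event_id - s.secondary_cursor_event_id
    if gap > max_event_gap:
        problems.append(f"event gap {gap} exceeds {max_event_gap}")
    if s.repos_total and s.repos_failed / s.repos_total > max_failed_ratio:
        problems.append(f"{s.repos_failed}/{s.repos_total} repositories failed to sync")
    return problems
```

Verification status is left out to keep the sketch short; it would be one more ratio check in the same style.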

Common gotchas

Primary and secondary must run the same GitLab version — upgrade primary first, then secondary
Geo does not replicate external services (Elasticsearch, external Redis, external PostgreSQL beyond streaming replication)
The secondary site needs its own separate object storage buckets — don't share buckets between sites
After failover, the old primary cannot simply be "demoted" — it must be fully reconfigured as a new secondary

License: Geo requires Premium or Ultimate, which makes it a common reason customers upgrade from Free/CE. GET supports automated provisioning of multi-site Geo deployments.

Direct Transfer

GitLab's native group and project migration mechanism — API-based transfer between any two GitLab instances

What is Direct Transfer?

Direct Transfer (formerly "Bulk Import" or "GitLab Migration") is GitLab's built-in mechanism for migrating groups and projects between instances. It works via HTTPS API calls — the destination instance pulls data directly from the source. No intermediate files or manual export/import steps.

Requirements

Version: Both instances should be 16.8+ for best results (Direct Transfer reached GA in Q2 2025). Source can be at most 2 minor versions behind destination.
Network: HTTPS connectivity between instances (destination must reach source API).
Source token: Personal access token with api scope on the source instance.
Destination access: Owner role on the destination top-level group. Admin token is not required for standard migrations.
Instance setting: "Allow migrating GitLab groups and projects by direct transfer" must be enabled in Admin → Settings → General → Import and export settings on the destination.
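The same migration can be started without the UI through the REST API's bulk imports endpoint on the destination. The sketch below only builds the request so it can be inspected; the payload shape (`configuration`, `entities`, `source_type`, `destination_slug`) reflects the bulk imports API as we understand it, so verify field names against the API docs for your GitLab version. All hostnames, tokens, and group paths are placeholders.

```python
import json
import urllib.request


def build_bulk_import_request(dest_url, dest_token, source_url, source_token,
                              source_group, dest_namespace, dest_slug):
    """Build (but do not send) a Direct Transfer request for one group."""
    payload = {
        # The destination uses these credentials to pull data from the source.
        "configuration": {"url": source_url, "access_token": source_token},
        "entities": [{
            "source_type": "group_entity",        # "project_entity" for a single project
            "source_full_path": source_group,
            "destination_namespace": dest_namespace,
            "destination_slug": dest_slug,
        }],
    }
    return urllib.request.Request(
        url=f"{dest_url.rstrip('/')}/api/v4/bulk_imports",
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": dest_token, "Content-Type": "application/json"},
        method="POST",
    )


req = build_bulk_import_request(
    "https://gitlab-new.example.com", "dest-token",
    "https://gitlab-old.example.com", "source-token",
    "platform", "migrated", "platform",
)
# Sending it with urllib.request.urlopen(req) should return a bulk import object
# whose ID can be polled for status.
```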

What it transfers

Repositories (including LFS), wikis
Issues, merge requests, comments, resource events
Labels, milestones, boards, badges
Epics, iterations (Premium+)
Members and role mappings (users must already exist on destination)
Releases, snippets, CI pipeline history
Group/subgroup structure

What it does NOT transfer

CI/CD variables — contains secrets, must be re-created manually or via Congregate
Container registry images — must be pushed separately (Congregate handles this)
Deploy tokens and webhooks — security-sensitive, excluded by design
Runners — must be re-registered on the destination
Job artifacts — ephemeral, not migrated
Approval rules, feature flags — not yet supported
Pages domains and remote mirrors

User mapping

Direct Transfer maps users by public email address. Users are never created on the destination — they must already exist (via SCIM, SAML, or manual creation). If a user's email doesn't match, their contributions are reassigned to the user performing the import. Pre-migration user mapping verification is critical.
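A pre-migration check can be as simple as comparing source public emails with the emails of accounts that already exist on the destination. This sketch assumes you have exported both lists (for example, from each instance's users API) into plain Python data; the function name is hypothetical, and case-insensitive matching is an assumption.

```python
def unmatched_users(source_users, destination_emails):
    """Return usernames whose source public email has no match on the destination.

    source_users: dicts with "username" and "public_email" keys.
    destination_emails: emails of accounts already present on the destination.
    """
    known = {e.strip().lower() for e in destination_emails if e and e.strip()}
    missing = []
    for user in source_users:
        email = (user.get("public_email") or "").strip().lower()
        if email not in known:  # a blank public email can never match
            missing.append(user["username"])
    return sorted(missing)
```

Every name this returns would have its contributions attributed to the importing user, so the list is a concrete to-do for identity provisioning before cutover.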

Performance: Direct Transfer processes up to 5 entities (groups/projects) concurrently per import. For large migrations (100+ projects), Congregate adds wave orchestration and parallelization to manage throughput and API rate limits.

Congregate

GitLab Professional Services' migration automation tool for large-scale platform migrations

What is Congregate?

Congregate is GitLab Professional Services' migration automation tool. It now uses Direct Transfer under the hood for core group/project migration, then supplements it with post-migration API calls to handle objects that Direct Transfer doesn't cover.

How it works

Direct Transfer phase: Congregate triggers the Direct Transfer API to migrate groups, projects, repos, issues, MRs, and other core objects
Post-migration phase: API calls to migrate CI/CD variables, container registry images, webhooks, deploy tokens, and other excluded objects
Verification: Validates migration completeness and generates reports
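To make the post-migration phase concrete, here is a sketch of one such step: re-creating project CI/CD variables on the destination from the list returned by the source's project variables API. This is not Congregate's code. It builds one POST per variable without sending it, and it assumes the source token could read variable values. The helper name and `COPY_FIELDS` selection are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Attributes carried over from each source variable (GET /projects/:id/variables).
COPY_FIELDS = ("key", "value", "variable_type", "protected", "masked", "environment_scope")


def build_variable_copy_requests(dest_url, dest_token, dest_project_path, source_variables):
    """Build one POST /projects/:id/variables request per source variable (not sent)."""
    project_id = urllib.parse.quote(dest_project_path, safe="")  # "group/app" -> "group%2Fapp"
    endpoint = f"{dest_url.rstrip('/')}/api/v4/projects/{project_id}/variables"
    reqs = []
    for var in source_variables:
        body = {field: var[field] for field in COPY_FIELDS if field in var}
        reqs.append(urllib.request.Request(
            endpoint,
            data=json.dumps(body).encode("utf-8"),
            headers={"PRIVATE-TOKEN": dest_token, "Content-Type": "application/json"},
            method="POST",
        ))
    return reqs
```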

Supported sources

GitLab — self→self, self→SaaS, SaaS→SaaS
GitHub — GitHub.com and GitHub Enterprise
Bitbucket — Server and Cloud
Azure DevOps — SaaS and Server
SVN — with documented conversion guidance

Wave orchestration

Congregate breaks large migrations into waves — batches of groups/projects that migrate together. This manages API rate limits, allows staged cutover (e.g., migrate team-by-team over weeks), and enables progress tracking across hundreds of projects.
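The idea of a wave can be illustrated with a simple packing rule: keep each top-level group together and fill waves up to a size cap. This is a hypothetical planning heuristic, not Congregate's actual algorithm; real wave plans also weigh repository size, team schedules, and rate limits.

```python
from collections import defaultdict


def plan_waves(project_paths, max_per_wave):
    """Pack projects into waves without splitting a top-level group across waves."""
    by_group = defaultdict(list)
    for path in project_paths:
        by_group[path.split("/", 1)[0]].append(path)

    waves, current = [], []
    for group in sorted(by_group):
        projects = sorted(by_group[group])
        # Start a new wave if adding this whole group would exceed the cap.
        if current and len(current) + len(projects) > max_per_wave:
            waves.append(current)
            current = []
        current.extend(projects)  # a group larger than the cap becomes its own wave
    if current:
        waves.append(current)
    return waves
```

Keeping groups intact trades perfectly even wave sizes for a cleaner cutover story: each team moves in exactly one wave.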

Technical details

Python-based with Celery task scheduling, MongoDB for state tracking, and a web UI for monitoring progress. Supports incremental syncs (run multiple times, final cutover with minimal delta) and air-gapped migrations via two Congregate nodes with file-based data transfer.

Key limitation: Congregate is not self-service. Customers must engage GitLab Professional Services. For self-service migrations, use Direct Transfer from the GitLab UI. Congregate is worth it when you have 100+ projects, complex user mappings, or need the post-migration tasks for CI/CD variables and container registries.

Migrating from GitHub

Built-in importer with comprehensive project migration support

What gets migrated

Repositories (including wiki repos)
Issues and issue comments
Pull requests and PR review comments
Labels, milestones, releases
Collaborators (mapped to GitLab users by email)

How to import

GitLab provides a built-in GitHub importer: New Project → Import Project → GitHub. Authenticate via personal access token (needs repo scope). Supports GitHub.com and GitHub Enterprise Server.
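For scripted imports, GitLab also exposes the GitHub importer through its REST API (`POST /import/github`). The sketch below builds that request without sending it. The parameter names (`personal_access_token`, `repo_id`, `target_namespace`, `new_name`, `github_hostname`) reflect our understanding of that endpoint and should be checked against your version's docs; tokens, IDs, and paths are placeholders.

```python
import json
import urllib.request


def build_github_import_request(gitlab_url, gitlab_token, github_token, repo_id,
                                target_namespace, new_name=None, github_hostname=None):
    """Build (but do not send) a GitHub import request for one repository."""
    payload = {
        "personal_access_token": github_token,  # GitHub PAT with repo scope
        "repo_id": repo_id,                      # numeric GitHub repository ID
        "target_namespace": target_namespace,    # GitLab group path to import into
    }
    if new_name:
        payload["new_name"] = new_name
    if github_hostname:                          # only for GitHub Enterprise Server
        payload["github_hostname"] = github_hostname
    return urllib.request.Request(
        url=f"{gitlab_url.rstrip('/')}/api/v4/import/github",
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": gitlab_token, "Content-Type": "application/json"},
        method="POST",
    )
```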

What doesn't migrate

GitHub Actions workflows — must be manually converted to .gitlab-ci.yml
GitHub Packages — re-publish to GitLab Package/Container Registry
Branch protection rules — recreate in GitLab
GitHub Apps and webhook configurations

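CI/CD conversion: GitHub Actions and GitLab CI share many concepts (YAML definitions, jobs, job dependencies, artifacts). The main structural difference is ordering: Actions orders jobs with needs, while GitLab CI groups jobs into stages (and also supports needs). GitLab provides a syntax comparison guide. The migration is manual but straightforward for most workflows.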
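Much of the mechanical work is swapping GitHub context expressions in `run:` scripts for GitLab predefined variables. The sketch below handles a small mapping table and reports anything it does not recognize (such as `secrets.*`, which becomes a CI/CD variable). The table is a partial, approximate mapping chosen for illustration; semantics differ in edge cases, so each converted job still needs review.

```python
import re

# Approximate equivalents; not an exhaustive or exact mapping.
GITHUB_TO_GITLAB = {
    "github.sha": "$CI_COMMIT_SHA",
    "github.ref_name": "$CI_COMMIT_REF_NAME",
    "github.repository": "$CI_PROJECT_PATH",
    "github.workspace": "$CI_PROJECT_DIR",
    "github.run_id": "$CI_PIPELINE_ID",
    "github.actor": "$GITLAB_USER_LOGIN",
}

_EXPRESSION = re.compile(r"\$\{\{\s*([\w.]+)\s*\}\}")


def translate_expressions(script):
    """Replace known ${{ github.* }} expressions; return (new_script, unknown_keys)."""
    unknown = []

    def replace(match):
        key = match.group(1)
        if key in GITHUB_TO_GITLAB:
            return GITHUB_TO_GITLAB[key]
        unknown.append(key)
        return match.group(0)  # leave unrecognized expressions untouched

    return _EXPRESSION.sub(replace, script), unknown
```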

Migrating from Bitbucket

Built-in importer for Bitbucket Cloud and Server

Bitbucket Cloud

Built-in importer: New Project → Import → Bitbucket Cloud
Authenticate via OAuth or App Password
Migrates: repositories, pull requests, issues (if enabled), wiki
Does not migrate: Bitbucket Pipelines (convert to .gitlab-ci.yml), deployment environments, project/workspace settings

Bitbucket Server / Data Center

Built-in importer available for repositories and pull requests
Requires API access to the Bitbucket Server instance
For large-scale migrations, use Congregate via GitLab PS

Planning considerations

Bitbucket projects map to GitLab groups/subgroups — plan the hierarchy before importing
Bitbucket branch permissions → GitLab protected branches (manual recreation)
Bitbucket Pipelines → .gitlab-ci.yml (manual conversion)

Tip: For Bitbucket Server with many repos, export a list via the REST API first (/rest/api/1.0/projects/{key}/repos) to plan the migration scope and group structure.
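That endpoint is paginated, so a scoping script has to follow the paging fields until the last page. The sketch below separates the network call from the paging logic so the logic can be tested with canned data. It assumes the `values` / `isLastPage` / `nextPageStart` response fields and Bearer-token authentication; the `<root>/<project key>/<repo slug>` target layout is a hypothetical convention, not a GitLab requirement.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator


def make_fetcher(base_url: str, project_key: str, token: str) -> Callable[[int], dict]:
    """Return a function that fetches one page of a Bitbucket project's repos (network call)."""
    def fetch(start: int) -> dict:
        url = f"{base_url.rstrip('/')}/rest/api/1.0/projects/{project_key}/repos?start={start}&limit=100"
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def iter_repos(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every repo, following isLastPage / nextPageStart."""
    start = 0
    while True:
        page = fetch_page(start)
        yield from page["values"]
        if page.get("isLastPage", True):
            return
        start = page["nextPageStart"]


def plan_paths(project_key: str, repos, root_group: str) -> Dict[str, str]:
    """Map each repo slug to a proposed GitLab path."""
    return {r["slug"]: f"{root_group}/{project_key.lower()}/{r['slug']}" for r in repos}
```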

Migrating from Jira

Issue migration using jira2gitlab (open source) or Jira2Lab (GitLab PS)

jira2gitlab (open source)

Community tool at github.com/swingbit/jira2gitlab (MIT license). Works with Jira Server 8.5.1+ only (not Jira Cloud).

What it migrates

Issues with titles, descriptions (Jira markup → Markdown conversion)
Comments (including tables, formatting)
Attachments (optional)
Labels, components, priority, status, resolution → GitLab labels
Fix versions → GitLab milestones
Worklogs → comments with /spend quick actions
Sub-tasks → issues with blocking relationships
Epics → issues with label-based coupling
"Relates to", "Blocks", "Duplicates" relationships
Custom fields (as comment tables)
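To show what "Jira markup → Markdown conversion" involves, here is a sketch covering four common constructs: headings, monospace, links, and bold. It is an illustration of the technique, not jira2gitlab's implementation, and it ignores tables, code blocks, and nested formatting that a real converter must handle.

```python
import re


def jira_to_markdown(text: str) -> str:
    """Convert a small subset of Jira wiki markup to GitLab Markdown."""
    # Headings: "h1. Title" ... "h6. Title" at line start -> "# Title" ... "###### Title"
    text = re.sub(r"^h([1-6])\.\s+", lambda m: "#" * int(m.group(1)) + " ", text, flags=re.MULTILINE)
    # Monospace: {{code}} -> `code`
    text = re.sub(r"\{\{(.+?)\}\}", r"`\1`", text)
    # Links: [label|https://url] -> [label](https://url)
    text = re.sub(r"\[([^\[\]|]+)\|([^\[\]]+)\]", r"[\1](\2)", text)
    # Bold: *word* -> **word**; "* item" bullets are skipped because a space follows the asterisk
    text = re.sub(r"(?<![\w*])\*(?=\S)([^*\n]+?)(?<=\S)\*(?![\w*])", r"**\1**", text)
    return text
```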

Key features

Incremental/resumable — can run multiple times, updates changed issues
User mapping — explicit mapping file, unmapped users attributed to admin
Requires: Jira admin credentials, GitLab admin token, pre-created GitLab groups

Jira2Lab (GitLab PS)

GitLab Professional Services' enterprise fork for large-scale Jira migrations. Handles thousands of issues with better performance and reporting.

Alternative: Keep Jira

If using Jira Cloud (jira2gitlab doesn't support it), consider keeping Jira and using GitLab's native Jira integration instead: link commits/MRs to Jira issues, view GitLab pipelines in Jira, create GitLab branches from Jira issues.

Decision guide: Jira Server → use jira2gitlab or Jira2Lab to fully migrate. Jira Cloud → evaluate whether the native GitLab-Jira integration meets needs before attempting migration.

Migrating from Jenkins

Manual pipeline conversion — Jenkinsfile to .gitlab-ci.yml

Largely manual migration

There is no fully automated tool to convert Jenkins pipelines to GitLab CI. Community CLI converters exist for simple cases, and GitLab provides a JenkinsFile Wrapper (unsupported) that runs Jenkins jobs inside GitLab CI as a transitional bridge. However, complex pipelines must be rewritten as .gitlab-ci.yml. The concepts map closely:

Jenkins	GitLab CI
Pipeline	Pipeline
Stage	Stage
Step	Job / script command
Agent / Node	Runner / tags
Jenkinsfile	.gitlab-ci.yml
Shared Library	CI/CD Components / includes
Credentials	CI/CD Variables (masked, protected)
Multibranch Pipeline	Default (every branch runs CI)
Blue Ocean	Pipeline editor / visualization

Migration strategy

Inventory pipelines — list all Jenkins jobs, categorize by type (build, test, deploy)
Prioritize — start with the most-used pipelines, convert less-used ones later
Convert incrementally — run Jenkins and GitLab CI in parallel during transition
Decommission Jenkins — once all pipelines are verified in GitLab CI
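The first two steps can be scripted once you have a job list (Jenkins' JSON API, for example `/api/json?tree=jobs[name]`, can provide names) plus a usage figure per job. In this sketch, `builds_90d` is a hypothetical field you would compute from build history, and the keyword-based categories are a crude heuristic to seed a human review, not a classification rule.

```python
def prioritize_jobs(jobs):
    """Rank Jenkins jobs by recent usage and tag each with a rough category.

    jobs: dicts with "name" and "builds_90d" (builds in the last 90 days).
    Returns (name, category, builds_90d) tuples, most-used first.
    """
    def category(name):
        lowered = name.lower()
        if "deploy" in lowered or "release" in lowered:
            return "deploy"
        if "test" in lowered or "qa" in lowered:
            return "test"
        return "build"

    ranked = sorted(jobs, key=lambda j: (-j["builds_90d"], j["name"]))
    return [(j["name"], category(j["name"]), j["builds_90d"]) for j in ranked]
```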

Common patterns

# Jenkins: stage('Build') { steps { sh 'make build' } }
# GitLab CI equivalent:
build:
  stage: build
  script:
    - make build
  artifacts:
    paths:
      - build/

# Jenkins: when { branch 'main' }
# GitLab CI equivalent:
deploy:
  stage: deploy
  script:
    - make deploy   # placeholder; every GitLab CI job needs a script
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

Consultant tip: Jenkins plugin dependencies are the hardest part. Many Jenkins pipelines rely on plugins (SonarQube, Nexus, Artifactory) that need equivalent GitLab integrations or CI/CD template includes. Audit plugin usage before estimating migration effort.

Migrating from SVN

Convert Subversion repositories to Git while preserving history

Migration process

Create author mapping — map SVN usernames to Git author format
Clone with git svn — converts SVN history to Git commits
Clean up — convert SVN branches/tags to proper Git refs
Push to GitLab

# Create author mapping file (authors.txt)
# Each line must read: svnuser = Git Name <email@example.com>
svn log --quiet svn://svn.example.com/repo | \
  awk '/^r/ {print $3}' | sort -u | \
  awk '{print $1 " = " $1 " <" $1 "@example.com>"}' > authors.txt
# Edit each line to: username = Full Name <email@example.com>

# Clone SVN repo with full history
git svn clone svn://svn.example.com/repo \
  --stdlayout \
  --authors-file=authors.txt \
  --no-metadata \
  my-repo

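# Convert SVN tags (imported as remote refs) into real Git tags.
# Assumes git svn's default ref prefix "origin/" (Git 2.0+).
cd my-repo
for ref in $(git for-each-ref --format='%(refname:short)' refs/remotes/origin/tags); do
  git tag "${ref#origin/tags/}" "$ref"
  git branch -r -d "$ref"
done

# Turn remaining SVN branches into local branches (trunk is normally already the local default branch)
for ref in $(git for-each-ref --format='%(refname:short)' refs/remotes/origin | grep -v -e '^origin/tags/' -e '^origin/trunk$' -e '@'); do
  git branch "${ref#origin/}" "$ref"
done

# Push to GitLab
git remote add origin https://gitlab.example.com/group/repo.git
git push origin --all
git push origin --tags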

Considerations

Large repos — SVN repos with 100K+ commits take hours to convert. Run on a fast machine with SSD.
svn:externals — no direct Git equivalent. Convert to Git submodules or subtrees, or restructure.
Partial checkout — SVN makes checking out a single subdirectory routine. Git offers git sparse-checkout, but the repository remains the unit of cloning and permissions, so a monorepo vs. multi-repo decision is needed.
Binary files — move large binaries to Git LFS during conversion.

Tip: Use --stdlayout if the SVN repo follows the standard trunk/branches/tags layout. For non-standard layouts, use --trunk, --branches, --tags flags explicitly.
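git svn aborts when it meets a committer that is missing from the file passed to --authors-file, which is painful hours into a large conversion. A quick pre-check compares the usernames from svn log with the entries in the authors file. This sketch uses a line pattern modeled on the `name = Full Name <email>` format described above; the function name is hypothetical.

```python
import re

# "svnuser = Full Name <email>" (comment and blank lines are skipped)
AUTHOR_LINE = re.compile(r"^(.+?)\s*=\s*(.+?)\s*<(.*)>\s*$")


def missing_authors(svn_users, authors_txt):
    """Return SVN usernames that have no valid entry in the authors file text."""
    mapped = set()
    for line in authors_txt.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = AUTHOR_LINE.match(stripped)
        if match and match.group(3):  # require a non-empty email
            mapped.add(match.group(1))
    return sorted(set(svn_users) - mapped)
```

Feed it the sorted usernames from the svn log pipeline and the contents of authors.txt; an empty result means the clone should not stop on an unknown author.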

Migrating from Mercurial

Convert Mercurial repositories to Git for import into GitLab

Migration tools

hg-fast-export (recommended) — fast, handles most repositories well
hg-git — Mercurial extension that enables pushing to Git remotes
git-cinnabar — Mozilla's tool for bidirectional Hg/Git

Using hg-fast-export

# Install fast-export
git clone https://github.com/frej/fast-export.git

# Create a new Git repo
git init my-repo-git
cd my-repo-git

# Run the conversion
../fast-export/hg-fast-export.sh \
  -r /path/to/mercurial-repo \
  --force

# Clean up and push to GitLab
git checkout HEAD
git remote add origin https://gitlab.example.com/group/repo.git
git push origin --all
git push origin --tags

Considerations

Named branches — Mercurial named branches map to Git branches, but semantics differ (Hg branches are permanent, Git branches are movable pointers)
Bookmarks — Mercurial bookmarks are closer to Git branches. Plan the mapping.
Anonymous heads — Mercurial allows multiple heads on one branch. These need manual resolution.
Subrepos — convert to Git submodules or merge into a monorepo.
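Planning the branch and bookmark mapping also means checking that each Mercurial name is a legal Git ref, since Hg allows names (spaces, "..", trailing ".lock") that Git rejects. hg-fast-export applies its own name sanitizing and, per its README, can take a branch-mapping file. The sketch below is a hypothetical helper for producing that mapping in advance, covering a subset of the git check-ref-format rules.

```python
import re

# Control characters, space, and ~ ^ : ? * [ \ are not allowed in Git ref names.
_FORBIDDEN = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def to_git_branch_name(hg_name: str) -> str:
    """Turn a Mercurial branch/bookmark name into a valid Git branch name."""
    name = _FORBIDDEN.sub("-", hg_name)
    name = name.replace("@{", "-")        # reserved for reflog syntax
    name = re.sub(r"\.\.+", ".", name)    # ".." is not allowed anywhere
    parts = []
    for part in name.split("/"):          # also drops empty components from "//"
        part = part.strip(".")            # no leading "." and no trailing "." per component
        if part.endswith(".lock"):
            part = part[: -len(".lock")] + "-lock"
        if part:
            parts.append(part)
    name = "/".join(parts).lstrip("-")    # git branch rejects a leading "-"
    return name if name and name != "@" else "hg-branch"
```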

Note: Mercurial is increasingly rare. If the customer has only a few repos, manual conversion is straightforward. For large Hg installations, engage GitLab PS to script the conversion at scale.