Disaster Recovery (Geo)
DETAILS: Tier: Premium, Ultimate. Offering: Self-managed.
Geo replicates your database, your Git repositories, and a few other assets, but there are some limitations.
WARNING: Multi-secondary configurations require the complete re-synchronization and re-configuration of all non-promoted secondaries, which causes downtime.
Promoting a secondary Geo site in single-secondary configurations
We don't currently provide an automated way to promote a Geo replica and do a
failover, but you can do it manually if you have root access to the machine.
This process promotes a secondary Geo site to a primary site. To regain geographic redundancy as quickly as possible, you should add a new secondary site immediately after following these instructions.
Step 1. Allow replication to finish if possible
If the secondary site is still replicating data from the primary site, follow the planned failover docs as closely as possible to avoid unnecessary data loss.
Step 2. Permanently disable the primary site
WARNING: If the primary site goes offline, there may be data saved on the primary site that has not been replicated to the secondary site. Treat this data as lost if you proceed.
If an outage on the primary site happens, you should do everything possible to avoid a split-brain situation where writes can occur in two different GitLab instances, complicating recovery efforts. So to prepare for the failover, we must disable the primary site.
- If you have SSH access:
  - SSH into the primary site to stop and disable GitLab:
      sudo gitlab-ctl stop
  - Prevent GitLab from starting up again if the server unexpectedly reboots:
      sudo systemctl disable gitlab-runsvdir
- If you do not have SSH access to the primary site, take the machine offline and prevent it from rebooting by any means at your disposal. You might need to:
  - Reconfigure the load balancers.
  - Change DNS records (for example, point the primary DNS record to the secondary site to stop usage of the primary site).
  - Stop the virtual servers.
  - Block traffic through a firewall.
  - Revoke object storage permissions from the primary site.
  - Physically disconnect a machine.
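The two commands for the SSH case can be wrapped in a small script so that they can be reviewed before they run. This is a minimal sketch, not an official tool: the `run` and `disable_primary` function names and the `DRY_RUN` variable are assumptions made for illustration. Only the two commands themselves come from the steps above.

```shell
#!/bin/sh
# run: execute the given command, or only print it when DRY_RUN=1.
# (Hypothetical helper, not part of GitLab.)
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# disable_primary: stop GitLab, then keep it from starting after a reboot.
# The second command runs only if the first one succeeds.
disable_primary() {
  run sudo gitlab-ctl stop &&
  run sudo systemctl disable gitlab-runsvdir
}
```

Calling `DRY_RUN=1 disable_primary` on the primary site prints the commands without running them. Calling `disable_primary` with `DRY_RUN` unset runs them.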
If you plan to update the primary domain DNS record, you may wish to maintain a low TTL to ensure fast propagation of DNS changes.
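To check whether the current TTL is low enough before a failover, you can inspect the answer section of a DNS lookup. The following is a rough sketch, assuming `dig` is installed. The function names, the `gitlab.example.com` hostname, and the 300-second default threshold are illustrative assumptions, not recommendations from this document.

```shell
#!/bin/sh
# answer_ttl: read `dig +noall +answer` lines on stdin and print the
# smallest TTL found (the TTL is the second field of each answer line).
answer_ttl() {
  awk 'NF >= 5 && (!seen || $2 + 0 < min) { min = $2 + 0; seen = 1 }
       END { if (seen) print min }'
}

# ttl_is_low: succeed if TTL $1 (seconds) is at or below threshold $2.
# The default threshold of 300 seconds is an assumption.
ttl_is_low() {
  [ "$1" -le "${2:-300}" ]
}
```

For example, `ttl=$(dig +noall +answer gitlab.example.com A | answer_ttl)` followed by `ttl_is_low "$ttl" || echo "Consider lowering the TTL"` reports a record whose TTL exceeds the threshold. A resolver may return the remaining cached TTL instead of the configured value, so querying the authoritative name server gives a more reliable number.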
Step 3. Promoting a secondary site
Note the following when promoting a secondary:
- If the secondary site has been paused, the promotion performs a point-in-time recovery to the last known state. Data that was created on the primary while the secondary was paused is lost.
- A new secondary should not be added at this time. If you want to add a new secondary, do this after you have completed the entire process of promoting the secondary to the primary.
- If you encounter an "ActiveRecord::RecordInvalid: Validation failed: Name has already been taken" error message during this process, see this troubleshooting advice for more information.
- You should point the primary domain DNS at the newly promoted site. Otherwise, runners must be registered again with the newly promoted site, and all Git remotes, bookmarks, and external integrations must be updated.
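After updating the primary domain DNS record, one way to confirm that it now resolves to the newly promoted site is to compare the resolved addresses with the address you expect. This is a minimal sketch, assuming `dig` is available. The `resolves_to` function, the `gitlab.example.com` hostname, and the `203.0.113.20` address are hypothetical placeholders.

```shell
#!/bin/sh
# resolves_to: succeed if the expected address ($1) appears among the
# addresses read from stdin, one per line (the format of `dig +short`).
# -x matches the whole line, -F treats the address as a literal string.
resolves_to() {
  grep -qxF "$1"
}
```

For example, `dig +short gitlab.example.com A | resolves_to 203.0.113.20 && echo "DNS points at the promoted site"`. Cached records can keep returning the old address until the previous TTL expires.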
Promoting a secondary site running on a single node
- SSH in to your secondary site and execute:
  - To promote the secondary site to primary:
      sudo gitlab-ctl geo promote
  - To promote the secondary site to primary without any further confirmation:
      sudo gitlab-ctl geo promote --force
- Verify you can connect to the newly-promoted primary site using the URL used previously for the secondary site.
- If successful, the secondary site