ClickHouse Backup
This article explains how to use the backup-related playbooks in clickhouse_ansible to establish a standardized backup process for ClickHouse clusters.
The commands in this article assume the portable Ansible distribution by default: setup_portable_ansible.sh has been executed and `source ~/.bashrc` applied, so ansible-playbook can be invoked directly.
1. Scope of application
Backup related entries include:
- playbooks/prepare_backup_disk.yml: configure the backup disk.
- playbooks/backup_cluster.yml: perform a full or incremental backup.
The default documentation example uses NFS as backup storage.
2. Preconditions
- ClickHouse cluster deployment has been completed.
- The NFS server has been configured on 192.168.199.162 or another dedicated host.
- The backup nodes can access the backup mount directory, such as /backup.
- Use a dedicated backup inventory, such as inventory/hosts.backup.ini.
- If you plan to perform off-site recovery later, it is recommended to prepare the NFS mount and backup disk on the recovery target at the same time.
- Tables within the backup scope should preferentially use replicated table engines, to avoid data gaps caused by single-replica backup.
3. Backup inventory minimal example
[clickhouse_backup]
ck-131-1 ansible_host=192.0.2.131 shard=1 replica=1 clickhouse_tcp_port=9000
ck-131-2 ansible_host=192.0.2.131 shard=3 replica=2 clickhouse_tcp_port=9001
ck-132-1 ansible_host=192.0.2.132 shard=1 replica=2 clickhouse_tcp_port=9000
ck-132-2 ansible_host=192.0.2.132 shard=2 replica=1 clickhouse_tcp_port=9001
ck-133-1 ansible_host=192.0.2.133 shard=2 replica=2 clickhouse_tcp_port=9000
ck-133-2 ansible_host=192.0.2.133 shard=3 replica=1 clickhouse_tcp_port=9001
[all:vars]
dbbot_inventory_purpose=backup
ansible_python_interpreter=auto_silent
ansible_user=root
ansible_ssh_pass='<your_ssh_password>'
Note: In backup scenarios, it is recommended to fill in clickhouse_tcp_port explicitly to avoid relying on port derivation logic.
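As a quick sanity check before running any playbook, you can scan the inventory for host lines that omit an explicit port. The snippet below is a sketch run against an inline sample; the file path and the `ck-` host-name prefix are assumptions taken from the example above, so point the `awk` at your real inventory/hosts.backup.ini in practice:

```shell
# Write a small sample inventory for illustration; use your real
# inventory/hosts.backup.ini in practice.
cat > /tmp/hosts.backup.sample.ini <<'EOF'
[clickhouse_backup]
ck-131-1 ansible_host=192.0.2.131 shard=1 replica=1 clickhouse_tcp_port=9000
ck-131-2 ansible_host=192.0.2.131 shard=3 replica=2 clickhouse_tcp_port=9001
EOF

# Print any ck-* host line missing an explicit clickhouse_tcp_port;
# no output means every host sets the port explicitly.
awk '/^ck-/ && !/clickhouse_tcp_port=/' /tmp/hosts.backup.sample.ini
```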
4. Key parameters
Edit playbooks/vars/backup_config.yml and confirm the following parameters first:
- backup_databases / backup_tables
- backup_mode
- backup_base_batch_id
- backup_storage_disk
- backup_mount_dir
- backup_checkpoint_mode
- backup_require_replicated_tables
- backup_allow_partial_cluster
Additional instructions:
- backup_cluster.yml now also checks clickhouse_default_password.
- If you are still using the public default password Dbbot_default@8888, it will be blocked by pre_tasks by default; explicitly setting fcs_allow_dbbot_default_passwd: true is recommended only in experimental environments.
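For orientation, a hypothetical backup_config.yml might look like the fragment below. The exact keys are the ones listed above, but every value shown here is illustrative, not a playbook default; confirm each against your playbooks/vars/backup_config.yml:

```yaml
# Illustrative values only; confirm against playbooks/vars/backup_config.yml.
backup_databases: ["biz_db"]
backup_tables: []                       # empty = all tables in the listed databases
backup_mode: full                       # full | incremental
backup_base_batch_id: ""                # required when backup_mode is incremental
backup_storage_disk: backup_nfs
backup_mount_dir: /backup
# backup_checkpoint_mode: <see the playbook's inline documentation>
backup_require_replicated_tables: true  # fail fast on non-replicated tables
backup_allow_partial_cluster: false     # require every shard to be reachable
```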
5. Configure NFS and backup disk for the first time
5.1 Configure NFS server
cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
-i ../inventory/hosts.nfs_server.ini \
setup_nfs_server.yml
By default /srv/nfs/clickhouse_backup is exported on 192.168.199.162.
5.2 Mount NFS on the source cluster
cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
-i ../inventory/hosts.backup.ini \
setup_nfs_client_mount_rc_local.yml
5.3 Write ClickHouse backup disk configuration for the backup node
cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
-i ../inventory/hosts.backup.ini \
prepare_backup_disk.yml \
-e "backup_storage_disk=backup_nfs backup_mount_dir=/backup"
After execution, you should confirm that the corresponding backup disk is visible in system.disks.
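For example, with backup_storage_disk=backup_nfs as above, a query along these lines should return one row (the exact column set of system.disks varies by ClickHouse version):

```sql
SELECT name, path, type
FROM system.disks
WHERE name = 'backup_nfs';
```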
5.4 If you want to restore to the disaster recovery cluster later, prepare the DR side backup disk in advance
cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
-i ../inventory/hosts.dr_backup.ini \
setup_nfs_client_mount_rc_local.yml
ansible-playbook \
-i ../inventory/hosts.dr_backup.ini \
prepare_backup_disk.yml \
-e "backup_storage_disk=backup_nfs backup_mount_dir=/backup"
6. Perform full backup
cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
-i ../inventory/hosts.backup.ini \
backup_cluster.yml \
-e '{"backup_databases":["biz_db"],"backup_mode":"full"}'
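If the inline JSON gets unwieldy, ansible-playbook also accepts extra vars from a file via `-e @file.json`. The path /tmp/backup_vars.json below is just an example location:

```shell
# Same extra vars as the inline form above, kept in a file.
cat > /tmp/backup_vars.json <<'EOF'
{"backup_databases": ["biz_db"], "backup_mode": "full"}
EOF
# Then run: ansible-playbook -i ../inventory/hosts.backup.ini backup_cluster.yml -e @/tmp/backup_vars.json
```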
If you want to pin the batch id, you can pass backup_batch_id explicitly:
cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
-i ../inventory/hosts.backup.ini \
backup_cluster.yml \
-e '{"backup_databases":["biz_db"],"backup_mode":"full","backup_batch_id":"20260306T210000_CST_bk001"}'
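The example batch id above follows a timestamp-plus-suffix pattern. If you standardize on that convention, a small helper like the following can generate ids; the Asia/Shanghai timezone and the `bk001` tag are assumptions read off the example, not playbook requirements:

```shell
# Generate a batch id shaped like the example 20260306T210000_CST_bk001.
# The timezone and trailing tag are conventions assumed from this document.
backup_batch_id="$(TZ=Asia/Shanghai date +%Y%m%dT%H%M%S)_CST_bk001"
echo "$backup_batch_id"
```

The result can then be passed to the playbook as part of the `-e` extra vars.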
7. Perform incremental backup
cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
-i ../inventory/hosts.backup.ini \
backup_cluster.yml \
-e '{"backup_databases":["biz_db"],"backup_mode":"incremental","backup_base_batch_id":"20260306T210000_CST_bk001"}'
8. Backup artifacts and timestamp semantics
After successful execution, the backup process typically outputs the following information:
- backup_batch_id
- safe_ts
- manifest path
The manifest will be written in two copies by default:
- Backup directory: /backup/<cluster>/<batch_id>/manifest/manifest.json
- Control node: artifacts/manifests/backup/<batch_id>.json
Among them, safe_ts can be used as the time reference for subsequent business backfill or data replay.
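To pull safe_ts out of a manifest with jq, a sketch like the following works; the inline sample document here is hypothetical and carries only the two fields this section relies on, whereas a real manifest lives under the backup directory and contains more:

```shell
# Hypothetical minimal manifest for illustration; a real one lives at
# /backup/<cluster>/<batch_id>/manifest/manifest.json and has more fields.
cat > /tmp/manifest_example.json <<'EOF'
{"batch_id": "20260306T210000_CST_bk001", "safe_ts": "2026-03-06 20:59:58"}
EOF

# Extract safe_ts for use as the backfill / replay time reference.
safe_ts=$(jq -r '.safe_ts' /tmp/manifest_example.json)
echo "$safe_ts"
```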
9. Validation before and after backup
9.1 Replica health check
SELECT database, table, is_readonly, queue_size, absolute_delay
FROM system.replicas
ORDER BY database, table;
9.2 Check backup results
jq '{batch_id, cluster_name, backup_mode, safe_ts, results}' \
/backup/<cluster>/<batch_id>/manifest/manifest.json
10. Risks and Recommendations
- The default policy selects only one replica of each shard for physical backup, to reduce duplicated IO.
- If there are non-replicated local tables in the backup scope, it is recommended to rectify them before running production backups.
- Running the backup playbooks with --check mode is not recommended.
- In the production environment, it is recommended to retain batch ids with a fixed naming convention to facilitate auditing and recovery.
- Do not wait for a real disaster to run the recovery target's NFS mount and prepare_backup_disk.yml for the first time; it is recommended to solidify them during the drill stage.
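To find non-replicated local tables before a production run, a query along these lines can help; adjust the database filter to your environment:

```sql
SELECT database, name, engine
FROM system.tables
WHERE engine LIKE '%MergeTree%'
  AND engine NOT LIKE 'Replicated%'
  AND database NOT IN ('system', 'information_schema');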