ClickHouse Restore

This article explains how to restore a backup batch to a target ClickHouse cluster using the restore playbook in clickhouse_ansible.

The commands in this article assume the portable Ansible distribution: setup_portable_ansible.sh has already been run and ~/.bashrc has been sourced, so ansible-playbook can be invoked directly.

1. Overview of recovery principles

By default, the recovery process takes the backup manifest as input and follows the strategy of “fully restore one primary replica per shard, then restore only the structure on the remaining replicas”:

  1. Select one primary replica per shard and perform a full restore on it.
  2. The remaining replicas perform a structure_only restore to recreate the table structures and replication metadata.
  3. ClickHouse's replication mechanism then backfills the remaining data.

This avoids the duplicate I/O of fully importing every replica from the backup disk at the same time.
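As a hypothetical illustration of this strategy (the real playbook logic may differ), the following Python sketch groups replicas by shard and picks the lowest-numbered replica as the primary for full restore, marking the rest structure_only:

```python
# Hypothetical sketch of the "one full restore per shard, structure_only for
# the rest" strategy described above; the actual playbook may choose the
# primary replica differently.

def plan_restore(hosts):
    """hosts: list of dicts with 'name', 'shard', 'replica' keys.
    Returns {host_name: 'full' | 'structure_only'}."""
    by_shard = {}
    for h in hosts:
        by_shard.setdefault(h["shard"], []).append(h)
    plan = {}
    for shard, replicas in by_shard.items():
        # Pick the lowest replica number as the primary for full restore.
        primary = min(replicas, key=lambda h: h["replica"])
        for h in replicas:
            plan[h["name"]] = "full" if h is primary else "structure_only"
    return plan

hosts = [
    {"name": "ck-141-1", "shard": 1, "replica": 1},
    {"name": "ck-142-1", "shard": 1, "replica": 2},
    {"name": "ck-142-2", "shard": 2, "replica": 1},
    {"name": "ck-143-1", "shard": 2, "replica": 2},
]
print(plan_restore(hosts))
```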

2. Preconditions

  1. A backup batch and its manifest file already exist.
  2. The target cluster has mounted the NFS share, and the backup disk (for example backup_nfs) is configured correctly.
  3. All target nodes can access the backup mount directory, such as /backup.
  4. A dedicated restore inventory is used, such as inventory/hosts.restore.ini.
  5. If the target is disaster recovery of the original cluster, stop writes and clean up the affected objects first.
  6. In the default drill environment, the restore targets are 192.168.199.141/142/143; backup disk preparation uses inventory/hosts.dr_backup.ini, and the restore itself uses inventory/hosts.restore.ini.

3. Minimal example of restoring inventory

[clickhouse_restore]
ck-141-1 ansible_host=198.51.100.141 shard=1 replica=1 clickhouse_tcp_port=9000
ck-141-2 ansible_host=198.51.100.141 shard=3 replica=2 clickhouse_tcp_port=9001
ck-142-1 ansible_host=198.51.100.142 shard=1 replica=2 clickhouse_tcp_port=9000
ck-142-2 ansible_host=198.51.100.142 shard=2 replica=1 clickhouse_tcp_port=9001
ck-143-1 ansible_host=198.51.100.143 shard=2 replica=2 clickhouse_tcp_port=9000
ck-143-2 ansible_host=198.51.100.143 shard=3 replica=1 clickhouse_tcp_port=9001

[all:vars]
dbbot_inventory_purpose=restore
ansible_python_interpreter=auto_silent
ansible_user=root
ansible_ssh_pass="'<your_ssh_password>'"
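Before running the restore, it is worth sanity-checking the inventory's shard/replica layout. A hypothetical helper (not part of the playbook) that parses the host lines above and confirms every shard has the expected replica count with no duplicate (shard, replica) pairs:

```python
# Hypothetical sanity check for the restore inventory shown above; not part
# of clickhouse_ansible itself.

def check_inventory(lines, replicas_per_shard=2):
    seen = set()
    shards = {}
    for line in lines:
        # Each line: "<name> key=value key=value ..."
        fields = dict(kv.split("=", 1) for kv in line.split()[1:])
        key = (int(fields["shard"]), int(fields["replica"]))
        if key in seen:
            raise ValueError(f"duplicate shard/replica pair: {key}")
        seen.add(key)
        shards[key[0]] = shards.get(key[0], 0) + 1
    bad = {s: n for s, n in shards.items() if n != replicas_per_shard}
    if bad:
        raise ValueError(f"shards with wrong replica count: {bad}")
    return sorted(shards)

lines = [
    "ck-141-1 ansible_host=198.51.100.141 shard=1 replica=1 clickhouse_tcp_port=9000",
    "ck-141-2 ansible_host=198.51.100.141 shard=3 replica=2 clickhouse_tcp_port=9001",
    "ck-142-1 ansible_host=198.51.100.142 shard=1 replica=2 clickhouse_tcp_port=9000",
    "ck-142-2 ansible_host=198.51.100.142 shard=2 replica=1 clickhouse_tcp_port=9001",
    "ck-143-1 ansible_host=198.51.100.143 shard=2 replica=2 clickhouse_tcp_port=9000",
    "ck-143-2 ansible_host=198.51.100.143 shard=3 replica=1 clickhouse_tcp_port=9001",
]
print(check_inventory(lines))  # [1, 2, 3]
```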

4. Key parameters

Edit playbooks/vars/restore_config.yml and confirm:

  • restore_batch_id
  • restore_manifest_file
  • restore_to_all_replicas
  • restore_allow_non_empty_tables
  • restore_require_manual_confirm
  • restore_enable_two_phase_mv_compat
  • restore_allow_partial_cluster

Additional notes:

  • restore_cluster.yml also checks clickhouse_default_password.
  • If you are still using the public default password Dbbot_default@8888, the pre_tasks will block execution by default; explicitly setting fcs_allow_dbbot_default_passwd: true is recommended only for experimental environments.

5. Target cluster preparation before recovery

cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
  -i ../inventory/hosts.dr_backup.ini \
  setup_nfs_client_mount_rc_local.yml

ansible-playbook \
  -i ../inventory/hosts.dr_backup.ini \
  prepare_backup_disk.yml \
  -e "backup_storage_disk=backup_nfs backup_mount_dir=/backup"

Both steps above must be completed on the restore target cluster before starting the actual restore.

6. Standard recovery commands

cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
  -i ../inventory/hosts.restore.ini \
  restore_cluster.yml \
  -e "restore_batch_id=20260306T210000_CST_bk001"

By default, you will be prompted to enter a confirmation phrase, for example: RESTORE 20260306T210000_CST_bk001.
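The expected phrase is simply the word RESTORE followed by the batch ID, going by the example above; a trivial sketch of that check (the playbook's actual prompt handling is assumed, not confirmed):

```python
# Sketch of validating the confirmation phrase described above, assuming the
# "RESTORE <batch_id>" format from the example; the playbook's real prompt
# handling may differ.

def confirm_phrase_ok(entered, batch_id):
    return entered.strip() == f"RESTORE {batch_id}"

print(confirm_phrase_ok("RESTORE 20260306T210000_CST_bk001",
                        "20260306T210000_CST_bk001"))  # True
```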

7. Single-replica restore mode

If you only want to restore one replica of each shard, you can temporarily override:

cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
  -i ../inventory/hosts.restore.ini \
  restore_cluster.yml \
  -e "restore_batch_id=20260306T210000_CST_bk001 restore_to_all_replicas=false"

This mode is better suited to drills or special scenarios; before using it in production, first confirm that the table structures and replication strategy meet your requirements.

8. Recommended restore workflow

  1. Stop business writes and freeze the change window.
  2. Select the backup batch and check safe_ts in the manifest.
  3. Point the restore-specific inventory at the target cluster.
  4. Manually clean up the databases or tables to be restored.
  5. Run restore_cluster.yml.
  6. Verify row counts, replica status, and replication health.
  7. Replay or backfill business data starting from safe_ts.
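Steps 2 and 7 both revolve around safe_ts from the manifest. As a sketch only (the manifest schema shown here is an assumption, not the actual format), extracting safe_ts and building a replay filter could look like:

```python
import json

# Hypothetical manifest fragment; the real manifest schema may differ.
manifest_json = """
{
  "batch_id": "20260306T210000_CST_bk001",
  "safe_ts": "2026-03-06 20:55:00"
}
"""

manifest = json.loads(manifest_json)
safe_ts = manifest["safe_ts"]

# Replay/backfill business data at and after safe_ts (step 7 above).
# The column name log_time is illustrative.
replay_filter = f"log_time >= toDateTime('{safe_ts}')"
print(replay_filter)
```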

9. Validation after recovery

9.1 Total row count validation

SELECT count() FROM <db>.<distributed_table>;

9.2 Local row counts per replica

SELECT
  hostName() AS host,
  tcpPort() AS port,
  count() AS rows
FROM clusterAllReplicas('<cluster>', '<db>', '<local_table>')
GROUP BY host, port
ORDER BY host, port;

9.3 Replica health

SELECT table, is_readonly, queue_size, absolute_delay
FROM system.replicas
WHERE database = '<db>'
ORDER BY table;
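Interpreting the system.replicas output can be scripted. A sketch, assuming rows shaped like the query result above; the thresholds are illustrative, not official ClickHouse cutoffs:

```python
# Flag replicas that look unhealthy in the system.replicas output above.
# The queue/delay thresholds below are illustrative examples only.

def unhealthy_replicas(rows, max_queue=100, max_delay=60):
    """rows: (table, is_readonly, queue_size, absolute_delay) tuples."""
    bad = []
    for table, is_readonly, queue_size, absolute_delay in rows:
        if is_readonly or queue_size > max_queue or absolute_delay > max_delay:
            bad.append(table)
    return bad

rows = [
    ("mysql_slowlog_raw_local", 0, 3, 0),
    ("orders_local", 1, 0, 0),       # read-only replica
    ("events_local", 0, 500, 120),   # large queue and replication delay
]
print(unhealthy_replicas(rows))  # ['orders_local', 'events_local']
```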

9.4 Acceptance criteria for TTL tables

If a restored table has a TTL ... DELETE clause, do not use whole-table count() comparisons between the source and target clusters as strict consistency acceptance.

Reason:

  1. RESTORE only restores the data parts that existed in the backup.
  2. TTL DELETE relies on asynchronous background merges for cleanup and will not complete instantly during recovery.
  3. Even if ALTER TABLE ... MATERIALIZE TTL is run manually, it may execute at different times on the source and target clusters, so a small amount of data near the TTL boundary can continue to drift.

For example, mysql_slowlog_raw_local defaults to:

  • Data span: 60 days
  • Table TTL: 30 DAY DELETE

This combination is good for demonstrating TTL behavior, but it makes “the total row counts of all tables must match exactly” unsuitable as a restore acceptance criterion.

For TTL tables, it is recommended to use one of the following two methods:

  1. Compare within a fixed time window, with both source and target using the same literal cutoff:
SELECT count()
FROM lab_ck_biz.mysql_slowlog_raw_local
WHERE log_time >= toDateTime('2026-02-07 00:00:00');
  2. Prefer business tables without TTL for restore consistency acceptance.
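The fixed-window idea can also be demonstrated programmatically. A sketch, with all timestamps illustrative: counting only rows at or after a shared literal cutoff makes TTL deletions near the boundary irrelevant to the comparison:

```python
from datetime import datetime

# Sketch of the fixed-window comparison above: count only rows at or after a
# shared cutoff on both source and target, so TTL deletions near the boundary
# do not affect the result. All data here is illustrative.

CUTOFF = datetime(2026, 2, 7, 0, 0, 0)

def windowed_count(timestamps, cutoff=CUTOFF):
    return sum(1 for ts in timestamps if ts >= cutoff)

source = [datetime(2026, 2, 6, 23, 59), datetime(2026, 2, 7, 0, 0),
          datetime(2026, 2, 8, 12, 0)]
target = [datetime(2026, 2, 7, 0, 0), datetime(2026, 2, 8, 12, 0)]

# The pre-cutoff row on the source may already be TTL-deleted on the target,
# but the windowed counts still match.
print(windowed_count(source), windowed_count(target))  # 2 2
```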

If you just want to speed up TTL cleanup, you can run the following on both the source and the target:

ALTER TABLE lab_ck_biz.mysql_slowlog_raw_local
ON CLUSTER 'example_3shards_2replicas'
MATERIALIZE TTL
SETTINGS mutations_sync = 2;

Note, however, that this command only accelerates TTL convergence; exact whole-table row-count equality on TTL tables should still not be used as the sole acceptance criterion.

10. Precautions

  1. By default, the restore playbook requires that the inventory file name contain restore and that dbbot_inventory_purpose=restore be set.
  2. The playbook will not automatically execute high-risk DROP DATABASE or DROP TABLE statements; these require manual review and execution.
  3. For scenarios involving materialized views, two-phase compatible restore logic is enabled by default.
  4. If the backup contains configuration snapshots, decide whether to replay the configuration based on the change window and an assessment of the target environment.
  5. restore_cluster.yml does not handle the NFS mount or prepare_backup_disk.yml on the restore target; these must be prepared in advance.
  6. For cross-cluster acceptance between the source and target clusters, use the separate ClickHouse Post-Restore Data Validation document and validate_restore_consistency.yml.