ClickHouse 运维与排障

本文整理 clickhouse_ansible 公开文档中最常用的 ClickHouse 运维命令与排障查询,适合作为日常巡检和验收参考。

1. 建议先设置连接变量

export CH_HOST="192.0.2.131"
export CH_PORT="9000"
export CH_USER="default"
export CH_PASS="<clickhouse_default_password>"
export CH_CLUSTER="example_3shards_2replicas"

如果你习惯使用快捷函数,可定义:

chc() {
  clickhouse-client --host "${CH_HOST}" --port "${CH_PORT}" --user "${CH_USER}" --password "${CH_PASS}" "$@"
}

2. 基础登录与版本检查

clickhouse-client --host "${CH_HOST}" --port "${CH_PORT}" --user "${CH_USER}" --password "${CH_PASS}"
SELECT version();

3. 集群健康检查

3.1 集群拓扑

SELECT cluster, shard_num, replica_num, host_name, host_address, port
FROM system.clusters
WHERE cluster = 'example_3shards_2replicas'
ORDER BY shard_num, replica_num;

3.2 副本状态

SELECT database, table, is_leader, is_readonly, queue_size, absolute_delay
FROM system.replicas
ORDER BY database, table;

3.3 协调服务连通性

SELECT count() FROM system.zookeeper WHERE path IN ('/', '/clickhouse');

3.4 所有节点版本

SELECT hostName(), version()
FROM clusterAllReplicas('example_3shards_2replicas', system.one)
ORDER BY hostName();

4. 常用排障 SQL

4.1 当前执行中的查询

SELECT query_id, user, elapsed, read_rows, memory_usage, query
FROM system.processes
ORDER BY elapsed DESC;

4.2 复制队列堆积

SELECT database, table, queue_size, inserts_in_queue, merges_in_queue, log_max_index - log_pointer AS log_lag
FROM system.replicas
ORDER BY queue_size DESC;

4.3 Mutation 状态

SELECT database, table, mutation_id, command, create_time, is_done, latest_failed_part, latest_fail_reason
FROM system.mutations
ORDER BY create_time DESC
LIMIT 20;

4.4 活跃数据分区

SELECT database, table, partition, count() AS parts, sum(rows) AS rows, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY database, table, partition
ORDER BY rows DESC
LIMIT 30;

5. 常见系统命令

systemctl status clickhouse-server --no-pager
systemctl status clickhouse-server-2 --no-pager
ss -lntp | egrep ':8123 |:8124 |:9000 |:9001 |:9009 |:9010 '
journalctl -u clickhouse-server -n 200 --no-pager
journalctl -u clickhouse-server-2 -n 200 --no-pager

6. 与备份恢复文档的关系