ClickHouse Operations and Troubleshooting

This article compiles the most commonly used ClickHouse operations commands and troubleshooting queries in clickhouse_ansible public documents, which is suitable as a reference for daily inspection and acceptance.

export CH_HOST="192.0.2.131"
export CH_PORT="9000"
export CH_USER="default"
export CH_PASS="<clickhouse_default_password>"
export CH_CLUSTER="example_3shards_2replicas"

If you are used to using shortcut functions, you can define:

chc() {
  clickhouse-client --host "${CH_HOST}" --port "${CH_PORT}" --user "${CH_USER}" --password "${CH_PASS}" "$@"
}

2. Basic login and version check

clickhouse-client --host "${CH_HOST}" --port "${CH_PORT}" --user "${CH_USER}" --password "${CH_PASS}"
SELECT version();

3. Cluster health check

3.1 Cluster topology

SELECT cluster, shard_num, replica_num, host_name, host_address, port
FROM system.clusters
WHERE cluster = 'example_3shards_2replicas'
ORDER BY shard_num, replica_num;

3.2 Copy status

SELECT database, table, is_leader, is_readonly, queue_size, absolute_delay
FROM system.replicas
ORDER BY database, table;

3.3 Coordinate service connectivity

SELECT count() FROM system.zookeeper WHERE path IN ('/', '/clickhouse');

3.4 All node versions

SELECT hostName(), version()
FROM clusterAllReplicas('example_3shards_2replicas', system.one)
ORDER BY hostName();

4. Commonly used troubleshooting SQL

4.1 Currently executing query

SELECT query_id, user, elapsed, read_rows, memory_usage, query
FROM system.processes
ORDER BY elapsed DESC;

4.2 Replication queue accumulation

SELECT database, table, queue_size, inserts_in_queue, merges_in_queue, log_max_index - log_pointer AS log_lag
FROM system.replicas
ORDER BY queue_size DESC;

4.3 Mutation status

SELECT database, table, mutation_id, command, create_time, is_done, latest_failed_part, latest_fail_reason
FROM system.mutations
ORDER BY create_time DESC
LIMIT 20;

4.4 Active data partition

SELECT database, table, partition, count() AS parts, sum(rows) AS rows, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY database, table, partition
ORDER BY rows DESC
LIMIT 30;

5. Common system commands

systemctl status clickhouse-server --no-pager
systemctl status clickhouse-server-2 --no-pager
ss -lntp | egrep ':8123 |:8124 |:9000 |:9001 |:9009 |:9010 '
journalctl -u clickhouse-server -n 200 --no-pager
journalctl -u clickhouse-server-2 -n 200 --no-pager

6. Relationship with backup and recovery documents