ClickHouse Operations and Troubleshooting
This article compiles the most commonly used ClickHouse operations commands and troubleshooting queries in clickhouse_ansible public documents, which is suitable as a reference for daily inspection and acceptance.
1. It is recommended to set the connection variables first
export CH_HOST="192.0.2.131"
export CH_PORT="9000"
export CH_USER="default"
export CH_PASS="<clickhouse_default_password>"
export CH_CLUSTER="example_3shards_2replicas"
If you are used to using shortcut functions, you can define:
chc() {
clickhouse-client --host "${CH_HOST}" --port "${CH_PORT}" --user "${CH_USER}" --password "${CH_PASS}" "$@"
}
2. Basic login and version check
clickhouse-client --host "${CH_HOST}" --port "${CH_PORT}" --user "${CH_USER}" --password "${CH_PASS}"
SELECT version();
3. Cluster health check
3.1 Cluster topology
SELECT cluster, shard_num, replica_num, host_name, host_address, port
FROM system.clusters
WHERE cluster = 'example_3shards_2replicas'
ORDER BY shard_num, replica_num;
3.2 Copy status
SELECT database, table, is_leader, is_readonly, queue_size, absolute_delay
FROM system.replicas
ORDER BY database, table;
3.3 Coordinate service connectivity
SELECT count() FROM system.zookeeper WHERE path IN ('/', '/clickhouse');
3.4 All node versions
SELECT hostName(), version()
FROM clusterAllReplicas('example_3shards_2replicas', system.one)
ORDER BY hostName();
4. Commonly used troubleshooting SQL
4.1 Currently executing query
SELECT query_id, user, elapsed, read_rows, memory_usage, query
FROM system.processes
ORDER BY elapsed DESC;
4.2 Replication queue accumulation
SELECT database, table, queue_size, inserts_in_queue, merges_in_queue, log_max_index - log_pointer AS log_lag
FROM system.replicas
ORDER BY queue_size DESC;
4.3 Mutation status
SELECT database, table, mutation_id, command, create_time, is_done, latest_failed_part, latest_fail_reason
FROM system.mutations
ORDER BY create_time DESC
LIMIT 20;
4.4 Active data partition
SELECT database, table, partition, count() AS parts, sum(rows) AS rows, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY database, table, partition
ORDER BY rows DESC
LIMIT 30;
5. Common system commands
systemctl status clickhouse-server --no-pager
systemctl status clickhouse-server-2 --no-pager
ss -lntp | egrep ':8123 |:8124 |:9000 |:9001 |:9009 |:9010 '
journalctl -u clickhouse-server -n 200 --no-pager
journalctl -u clickhouse-server-2 -n 200 --no-pager
6. Relationship with backup and recovery documents
- For the backup process, please see: ClickHouse Backup
- For the recovery process, please see: ClickHouse Recovery
- For design specifications, please see: ClickHouse Deployment, Backup, and Restore Specification