Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ClickHouse server monitoring & Support service hierarchy #11966

Merged
merged 11 commits into from
Mar 7, 2024

Conversation

CzyerChen
Copy link
Contributor

Feature #11904

  • If this is non-trivial feature, paste the links/URLs to the design doc.
  • Update the documentation to include this new feature.
  • Tests(including UT, IT, E2E) are added to verify the new feature.
  • If it's UI related, attach the screenshots below.

ClickHouse root:
image

ClickHouse Service:
image
image

ClickHouse Instance:
image
image
image

Service Hierarchy:
image
image
image
click CLICKHOUSE node and jump to server monitoring dashboard.

Virtual DB:
image
image
image

@wu-sheng wu-sheng requested review from wu-sheng and wankai123 March 2, 2024 00:19
@wu-sheng wu-sheng added this to the 10.0.0 milestone Mar 2, 2024
@wu-sheng wu-sheng added backend OAP backend related. feature New feature labels Mar 2, 2024
@wu-sheng
Copy link
Member

wu-sheng commented Mar 2, 2024

The SWIP part should be updated too.

@wu-sheng
Copy link
Member

wu-sheng commented Mar 2, 2024

@mrproliu Could you check why ebpf failed?

Copy link
Member

@wu-sheng wu-sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e2e seems not to be added into GHA control file?

docs/en/changes/changes.md Outdated Show resolved Hide resolved
Comment on lines 17 to 25
prometheus:
config:
scrape_configs:
- job_name: 'clickhouse-monitoring'
scrape_interval: 15s
static_configs:
- targets: ['clickhouse1:9363','clickhouse2:9363']
labels:
host_name: clickhouse:8123
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By following my question, the point is, is this process making metrics from two targets in one metric report or two?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CzyerChen Could you share how the metrics from clickhouse1 and clickhouse2 would be scraped here? One metric reported per target with the same host_name, or one metric for both targets in the same time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClickHouseMetrics_MySQLConnection -> {SampleFamily@26369} 
"SampleFamily(samples=[Sample(name=ClickHouseMetrics_MySQLConnection, labels={net_host_port=9363, job_name=clickhouse-monitoring, 
node_identifier_host_name=clickhouse1, 
http_scheme=http, 
service_instance_id=clickhouse1:9363,
 host_name=clickhouse:8123}, 
value=0.0, timestamp=1709519785903)], context=SampleFamily.RunningContext(metricName=null, meterSamples={}, defaultHistogramBucketUnit=SECONDS))"

The OTEL will scrape and send the metrics node by node, in other words, OAP only received 1 node's metric at one time, so we can't sum the metrics from separate node's metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to make it from two targets to two hostnames. Prometheus endpoint fetches data for per node.
I will correct otel-collector-config.yaml.

@wu-sheng wu-sheng mentioned this pull request Mar 5, 2024
2 tasks
# Conflicts:
#	docs/en/changes/changes.md
#	docs/en/concepts-and-designs/service-hierarchy.md
#	oap-server/server-starter/src/main/resources/hierarchy-definition.yml
@CzyerChen CzyerChen force-pushed the clickhouse-monitoring branch from eb82717 to 2957524 Compare March 5, 2024 15:57
@CzyerChen
Copy link
Contributor Author

Mainly sum up labeled value by MQE.

ClickHouse-Root  ClickHouse-Service Sw ClickHouse-Root  ClickHouse-Service Sw ClickHouse-Root  ClickHouse-Service  ClickHouse-Instance ClickHouse-Root  ClickHouse-Service  ClickHouse-Instance Sw ClickHouse-Root ClickHouse-Service  ClickHouse-Instance Sw ClickHouse-Root ClickHouse-Service  ClickHouse-Instance

@wu-sheng
Copy link
Member

wu-sheng commented Mar 6, 2024

@wankai123 PTAL

@CzyerChen
Copy link
Contributor Author

Two empty metrics ClickHouseMetrics_KeeperAliveConnections/ClickHouseMetrics_PartMutation answered here: ClickHouse/ClickHouse#60742

meter_clickhouse_instance_connections_alive/meter_clickhouse_connections_alive here will always be 0 because it is from ClickHouse Keeper Prometheus endpoint rather than ClickHouse Prometheus endpoint, and I will remove it.

@wu-sheng
Copy link
Member

wu-sheng commented Mar 6, 2024

Are we fetching metrics from ClickHouse Keeper Prometheus endpoint?

@CzyerChen
Copy link
Contributor Author

ClickHouse Keeper Prometheus endpoint

My Cluster test works with Zookeeper not ClickHouse Keeper. If to test the data from ClickHouse Keeper Prometheus endpoint, we have to try a ClickHouse Keeper cluster.

@wu-sheng
Copy link
Member

wu-sheng commented Mar 6, 2024

I see, then, please be clear about this in the docs if you don't intent to build this mode.

Copy link
Member

@wankai123 wankai123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The others LGTM

@wu-sheng
Copy link
Member

wu-sheng commented Mar 6, 2024

Please update the SWIP according to the latest implementation and add it to the docs.

@CzyerChen
Copy link
Contributor Author

  • connections_alive means Number of alive connections for embedded ClickHouse Keeper
  • add metric of Keeper Outstanding Requets another important metric for embedded ClickHouse Keeper
  • update dashboard layout
SwClickHouse-Root ClickHouse-Service Sw ClickHouse-Root  ClickHouse-Service 2024-03-06 2217 ~ 2024-03-06 2232 UTC+80 SwClickHouse-Root  ClickHouse-Service ClickHous-Instance

@CzyerChen
Copy link
Contributor Author

Please update the SWIP according to the latest implementation and add it to the docs.

ok

@wu-sheng
Copy link
Member

wu-sheng commented Mar 6, 2024

SWIP5 needs to attach on SWIP page.

@CzyerChen
Copy link
Contributor Author

SWIP5 needs to attach on SWIP page.

Ok, linked.

@wu-sheng
Copy link
Member

wu-sheng commented Mar 7, 2024

I mean this page should be updated, https://github.com/apache/skywalking/blob/master/docs/en/swip/readme.md. Note the next SWIP number and the following existing SWIPs.

@wu-sheng
Copy link
Member

wu-sheng commented Mar 7, 2024

BTW, have you read #11992? Is this helping to polish clickhouse monitoring? Or it is good enough for now.

@wu-sheng wu-sheng requested a review from wankai123 March 7, 2024 00:55
@CzyerChen
Copy link
Contributor Author

BTW, have you read #11992? Is this helping to polish clickhouse monitoring? Or it is good enough for now.

It doesn't seem to be necessary on ClickHouse, where the metrics values are all singular values.

The enhancement looks great for multi labels.

@wankai123 wankai123 merged commit 00ccace into apache:master Mar 7, 2024
152 checks passed
@wu-sheng wu-sheng mentioned this pull request Mar 7, 2024
2 tasks
@wu-sheng
Copy link
Member

wu-sheng commented Mar 7, 2024

Would you like to post dual-language blogs for this new feature?

@CzyerChen
Copy link
Contributor Author

Would you like to post dual-language blogs for this new feature?

Ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend OAP backend related. feature New feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants