feat: add aggregated rocksdb metrics #6354
Merged
This patch adds a pattern for computing and reporting metrics that aggregate the
per-store rocksdb metrics added by KIP-607. In addition, this particular PR adds the
following metrics:
- `num-running-compactions-total`: the total number of running compactions
- `estimate-num-keys-total`: an estimate of the total number of rocksdb keys
- `block-cache-usage-total`: the total memory usage of all block caches
- `block-cache-pinned-usage-total`: the total memory used by pinned blocks
- `estimate-table-readers-mem-total`: an estimate of the total memory used by table readers
ksqlDB registers for notification about new rocksdb metrics by creating a
MetricsReporter implementation called RocksDBMetricCollector. The metrics
system calls into MetricsReporter.metricChange whenever a new metric is added.
RocksDBMetricCollector watches for the rocksdb property metrics it cares about
and tracks them under the relevant aggregates. Each aggregate is registered
with the ksql metrics context the first time RocksDBMetricCollector is
instantiated, as sketched below.
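To make the mechanism concrete, here is a minimal sketch (not the actual ksqlDB
class) of a MetricsReporter that intercepts per-store metrics and tracks them
under shared aggregates. The property names are my reading of KIP-607, and the
`TRACKED` map bookkeeping is illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

public class RocksDBMetricCollectorSketch implements MetricsReporter {

  // KIP-607 property names whose per-store values we aggregate (assumed set)
  private static final List<String> PROPERTIES = List.of(
      "num-running-compactions",
      "estimate-num-keys",
      "block-cache-usage",
      "block-cache-pinned-usage",
      "estimate-table-readers-mem"
  );

  // shared across all instances: property name -> per-store metric handles
  private static final Map<String, Map<MetricName, KafkaMetric>> TRACKED =
      new ConcurrentHashMap<>();

  @Override
  public void init(final List<KafkaMetric> metrics) {
    // pick up any metrics that already exist when the reporter is created
    metrics.forEach(this::metricChange);
  }

  @Override
  public void metricChange(final KafkaMetric metric) {
    final MetricName name = metric.metricName();
    if (PROPERTIES.contains(name.name())) {
      TRACKED.computeIfAbsent(name.name(), k -> new ConcurrentHashMap<>())
          .put(name, metric);
    }
  }

  @Override
  public void metricRemoval(final KafkaMetric metric) {
    final Map<MetricName, KafkaMetric> handles =
        TRACKED.get(metric.metricName().name());
    if (handles != null) {
      handles.remove(metric.metricName());
    }
  }

  @Override
  public void close() {
  }

  @Override
  public void configure(final Map<String, ?> configs) {
  }
}
```

In practice the reporter is wired in via the `metric.reporters` config, so the
Streams metrics system invokes metricChange/metricRemoval as state-store
metrics come and go.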
Metrics are computed lazily when read, and recomputation is rate-limited to a
configurable interval. The interval is set using the property
`ksql.rocksdb.metrics.update.interval.seconds`.
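As an illustration of the lazy, rate-limited read path, here is a minimal
sketch of an aggregate gauge; the `IntervalGauge` name and the caller-supplied
`compute` function (which would sum the tracked per-store values) are mine, not
the PR's:

```java
import java.math.BigInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

import org.apache.kafka.common.metrics.Gauge;
import org.apache.kafka.common.metrics.MetricConfig;

// Serves a cached aggregate and recomputes it at most once per update
// interval, so reads stay cheap no matter how often the metric is scraped.
final class IntervalGauge implements Gauge<BigInteger> {

  private final long intervalMs;
  private final Supplier<BigInteger> compute; // e.g. sums the tracked per-store values
  private final AtomicLong lastUpdateMs = new AtomicLong(0); // 0 forces a compute on first read
  private final AtomicReference<BigInteger> cached =
      new AtomicReference<>(BigInteger.ZERO);

  IntervalGauge(final long intervalMs, final Supplier<BigInteger> compute) {
    this.intervalMs = intervalMs;
    this.compute = compute;
  }

  @Override
  public BigInteger value(final MetricConfig config, final long now) {
    final long last = lastUpdateMs.get();
    // lazily recompute on read, rate-limited to the configured interval;
    // compareAndSet ensures only one concurrent reader pays the recompute cost
    if (now - last >= intervalMs && lastUpdateMs.compareAndSet(last, now)) {
      cached.set(compute.get());
    }
    return cached.get();
  }
}
```

Each aggregate would then be registered once, e.g. via
`Metrics.addMetric(metricName, gauge)`, with the interval presumably derived
from `ksql.rocksdb.metrics.update.interval.seconds`.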
One alternative I considered was to dynamically add the metrics as they are sent
to RocksDBMetricCollector.metricChange (rather than hard-coding a static list).
I opted not to do this in case we add metrics in the future that use different
types, or want to compute different aggregates (e.g. for some metrics a max or
average may make more sense).
Testing done
Ran our aggregation benchmark with these metrics collected and 1000 partitions,
and saw no performance regression (processing rate: 39098 records/second).