blobstore: Add `metrics_tag` config field #205

minor-fixes · 2024-05-11T02:21:08Z

This change adds a single string field to the base blobstore config message, the value of which gets propagated to a tag value on exported metrics for the corresponding backend instance.

In particular, this change allows for attaching different tags on separate uses of a single "label" backend in order to collect metrics specific to a particular instantiation. Without this change, all accesses to the referenced backend shared one combined set of metrics; with this change, it's possible to set separate metrics_tag values on each instantiation and collect separate metrics, as well as tagging the underlying backend and grabbing aggregate metrics as well.

TODO before marking non-draft:

Tested: see comment below

minor-fixes · 2024-05-11T02:35:47Z

With the config attached below, I started a bb_storage instance, and issued 7 RPCs:

1x "access through readCaching": grpc_cli call localhost:8980 ContentAddressableStorage.FindMissingBlobs 'instance_name: "" digest_function: 1 blob_digests { hash: "aed662c522d64bb9e73534f4cd9cd0d98c8407e737689f1a591e4409f4d9e0ce" size_bytes: 5 }'
2x "access slow directly": grpc_cli call localhost:8980 ContentAddressableStorage.FindMissingBlobs 'instance_name: "slow" digest_function: 1 blob_digests { hash: "aed662c522d64bb9e73534f4cd9cd0d98c8407e737689f1a591e4409f4d9e0ce" size_bytes: 5 }'
4x "access fast directly": grpc_cli call localhost:8980 ContentAddressableStorage.FindMissingBlobs 'instance_name: "fast" digest_function: 1 blob_digests { hash: "aed662c522d64bb9e73534f4cd9cd0d98c8407e737689f1a591e4409f4d9e0ce" size_bytes: 5 }'

The counts for the respective tags were:

top: 7 (sum of all RPCs)
read_cached: 1 (only one call to the readCaching backend)
slow_underlying: 3 (1 call to readCaching + 2 calls to 'slow' directly via instance name)
fast_underlying: 4 (0 calls to readCaching? + 4 calls to 'fast' directly via instance name)
slow_direct: 2 (counts only direct calls via instance name)
slow_cached: 1 (only one call to the readCaching backend)
fast_direct: 4 (counts only direct calls via instance name)
fast_cached: 0 (presumably; could not find metrics)

I wouldn't mind adding an appropriate test though; any ideas on a better testing approach?

Config:

local SmallCache = function(tag) {
  metricsTag: tag,
  'local': {
    keyLocationMapInMemory: { entries: 1000 },
    keyLocationMapMaximumGetAttempts: 8,
    keyLocationMapMaximumPutAttempts: 32,
    oldBlocks: 2,
    currentBlocks: 1,
    newBlocks: 1,
    blocksInMemory: {
      blockSizeBytes: 10 * 1000 * 1000,
    },
  },
};

{
  executeAuthorizer: { allow: {} },
  grpcServers: [
    {
      listenAddresses: ["127.0.0.1:8980"],
      authenticationPolicy: { allow: {} },
    },
  ],
  global: {
    diagnosticsHttpServer: {
      httpServers: [{ listen_addresses: ["127.0.0.1:9980"], authenticationPolicy: { allow: {} } }],
      enablePrometheus: true,
    },
  },
  contentAddressableStorage: {
    getAuthorizer: { allow: {} },
    putAuthorizer: { allow: {} },
    findMissingAuthorizer: { allow: {} },
    backend: {
      withLabels: {
        backend: {
          metricsTag: 'top',
          demultiplexing: {
            instanceNamePrefixes: {
              '': {
                backend: {
                  metricsTag: 'read_cached',
                  readCaching: {
                    slow: { label: 'slow', metricsTag: 'slow_cached' },
                    fast: { label: 'fast', metricsTag: 'fast_cached' },
                    replicator: { 'local': {} },
                  },
                },
              },
              'slow': {
                backend: {
                  label: 'slow',
                  metricsTag: 'slow_direct',
                },
              },
              'fast': {
                backend: {
                  label: 'fast',
                  metricsTag: 'fast_direct',
                },
              },
            },
          },
        },
        labels: {
          'fast': SmallCache('fast_underlying'),
          'slow': SmallCache('slow_underlying'),
        },
      },
    },
  },
}

EdSchouten · 2024-05-25T13:41:32Z

Instead of actually having a separate metrics_tag, what are your thoughts on reusing the existing with_label / label facility? As in, make it so that using with_label causes a label="..." to be added to any BlobAccess underneath.

minor-fixes · 2024-11-11T18:42:31Z

Instead of actually having a separate metrics_tag, what are your thoughts on reusing the existing with_label / label facility? As in, make it so that using with_label causes a label="..." to be added to any BlobAccess underneath.

I'm trying to make it a bit easier to probe metrics at any layer, so it's a bit more clear when things are misconfigured. The WithLabelsBlobAccess does look like a good candidate for this, but I'm concerned it will complicate things in practice:

in order to instrument a particular layer, it needs to be moved to a labeled backend, which is a moderate configuration change - if one were trying to probe metrics to debug a misconfiguration, such a config change to do so would create more uncertainty.
nesting labeled backends could be challenging to configure, and difficult to reason about (if children of labeled backends inherit parent labels)

With this approach, one only needs to add unique metricsTag specifiers in the config to desired points in the blobstore configuration stack.

As an added bonus, metrics for the same labeled backend can be tracked with different identifiers if the labeled backend is reused, making it possible to track metrics for disparate entrypoints into the same labeled backend separately if desired. In the example above, the labeled backend calls can be measured via the slow_underlying labels, and the calls from the two wrappers can also be recorded separately via the slow_direct and slow_cached labels.

WIP; more description to come

minor-fixes force-pushed the metrics_tag branch 4 times, most recently from 2e717ee to e8ef171 Compare May 13, 2024 00:47

minor-fixes marked this pull request as ready for review November 11, 2024 18:52

blobstore: Add metrics_tag config field

a670343

WIP; more description to come

minor-fixes force-pushed the metrics_tag branch from e8ef171 to a670343 Compare November 11, 2024 18:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

blobstore: Add `metrics_tag` config field #205

blobstore: Add `metrics_tag` config field #205

minor-fixes commented May 11, 2024 •

edited

Loading

minor-fixes commented May 11, 2024

EdSchouten commented May 25, 2024

minor-fixes commented Nov 11, 2024 •

edited

Loading

blobstore: Add metrics_tag config field #205

Are you sure you want to change the base?

blobstore: Add metrics_tag config field #205

Conversation

minor-fixes commented May 11, 2024 • edited Loading

minor-fixes commented May 11, 2024

EdSchouten commented May 25, 2024

minor-fixes commented Nov 11, 2024 • edited Loading

blobstore: Add `metrics_tag` config field #205

blobstore: Add `metrics_tag` config field #205

minor-fixes commented May 11, 2024 •

edited

Loading

minor-fixes commented Nov 11, 2024 •

edited

Loading