Support for dynamic metrics #676

andrzej-k · 2016-01-25T12:24:01Z

Definition:
A dynamic metric is a metric which:

can appear or disappear between the call to GetMetricTypes and the call to CollectMetrics
- example: Linux process, cgroup, VM, container
is not known at the time of GetMetricTypes call (a.k.a. user-defined metrics in the task manifest)
- example: the user defines an SQL query as part of the plugin config in a task manifest, then defines a metric which exposes the query result
can depend on other dynamic metrics
- example: specific metrics for specific Linux processes

Configuration:
Extensions in per metric (in task manifest), per plugin (in task manifest) and global (config file for snapd) configuration.

config per metric shall support more keys than just Version, at least also Source and Tags
- example: specific metrics retrieved only from specific Sources (hosts), Tags allowing for per metric configuration
in single task manifest support having many "collect" sections, to allow publishing one set of metrics to one destination and other metrics to another destination without the need to create a separate task manifest.

Filtering:
Extensions in filtering capabilities.

'*' should not be part of metric namespace, and should not be exposed on a metrics list. It should work on a task manifest level.
- in case user requests all metrics for server (/intel/server) then framework shall trigger metrics gathering from all plugins with namespace starting with /intel/server
- in case user requests filtering on plugin level (/intel/server/disk/*/octets_read) then plugin author shall provide support for it
it should be possible to replace any element of the namespace with * or other wildcard/reg exp
- supported wildcards TBD
it should be possible to use wildcards multiple times in single namespace
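
The wildcard rules above can be illustrated with a minimal sketch. It assumes that `*` matches exactly one namespace element and that no other wildcard is supported (the supported set is still TBD); `namespace_matches` is a hypothetical helper, not part of the snap API.

```python
def namespace_matches(pattern, namespace):
    """Return True if `namespace` matches `pattern` element by element.

    Assumption: '*' stands for exactly one namespace element and may be
    used any number of times in a single pattern.
    """
    pattern_parts = pattern.strip("/").split("/")
    namespace_parts = namespace.strip("/").split("/")
    if len(pattern_parts) != len(namespace_parts):
        return False
    return all(p == "*" or p == n for p, n in zip(pattern_parts, namespace_parts))
```

With this interpretation, `/intel/*/disk/*/octets_read` would match `/intel/server/disk/sda1/octets_read`, while `/intel/server/*` would not, because it is shorter than the namespace.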

Metric listing:
Extensions in metrics listing.

it should be possible for plugin to return more, or different, metrics than exposed in metric catalog
metric listing should support listing all metrics; listing just /namespace_prefix/* is not enough
- possible solution: allow defining namespaces as /intel/procfs/<process_name>/cpu_utilization - then allow to filter on process name and/or return metrics which include names of processes at the time of metric gathering: /intel/procfs/process_A/cpu_utilization, /intel/procfs/process_B/cpu_utilization
for user-defined metrics, return information that the metrics are defined by the user, or enable reading the task manifest in GetMetricTypes()
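
The possible solution with a templated namespace could look roughly like the following sketch. The `expand_template` helper and the hard-coded process list are hypothetical; a real plugin would discover the process names at the time of metric gathering.

```python
def expand_template(template, placeholder, values):
    """Replace one `<placeholder>` element of a namespace template with each value."""
    token = "<{}>".format(placeholder)
    if token not in template.split("/"):
        raise ValueError("{} is not an element of {}".format(token, template))
    return [template.replace(token, value) for value in values]


# Hypothetical list of processes running at the time of metric gathering.
running = ["process_A", "process_B"]
metrics = expand_template(
    "/intel/procfs/<process_name>/cpu_utilization", "process_name", running
)
```

The catalog would then only need to expose the template, while the collected metrics carry the concrete process names.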

Returning metrics:

If a metric's target (for example, a process or cgroup) is no longer available, snap should log a warning, but the task should continue to run.
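
A minimal sketch of this behaviour, assuming a hypothetical `read_metric` callable that raises `LookupError` when its target has disappeared (this is not how snap itself is implemented):

```python
import logging

logger = logging.getLogger("collector")


def collect(targets, read_metric):
    """Collect one value per target, skipping targets that have disappeared.

    `read_metric` is a hypothetical callable that raises LookupError when
    its target (process, cgroup, ...) no longer exists.
    """
    results = {}
    for target in targets:
        try:
            results[target] = read_metric(target)
        except LookupError:
            # The target vanished between listing and collection: warn and go on.
            logger.warning("metric target %s is no longer available", target)
    return results


def fake_read(target):
    """Stand-in reader: only process_A still exists."""
    values = {"process_A": 12.5}
    return values[target]  # KeyError is a subclass of LookupError


collected = collect(["process_A", "process_B"], fake_read)
```

Here `process_B` only produces a warning, and the value for `process_A` is still returned.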


jcooklin · 2016-01-25T23:05:57Z

(In the Definition section)

is not known at the time of GetMetricTypes call

Should be changed to "may not be known at the time of GetMetricTypes" (a.k.a. at load)

(In the Configuration section)

There is a relationship to #652 here.

in single task manifest support having many "collect"

I feel these are best left as separate tasks.

(In the filtering section)

'*' should not be part of metric namespace ...

How do you propose the plugin author communicate that their plugin accepts a wild card at specific locations in the namespace?

Also, do you mean that a '*' should not be part of the general namespace or in other words that part of the namespace that is shared with multiple plugins (prefix)?

it should be possible to replace any element of the namespace with * or other wildcard/reg exp

I'm not sure how the framework can make this guarantee. Accepting a wildcard is largely up to the plugin writer.

(In the Metric listing section)

Item 1 sounds good to me.
Item 2 is related to the plugin supporting a wildcard/regex.
I'm not sure how Item 3 would work.

andrzej-k · 2016-01-26T13:06:06Z

Hey @jcooklin, thanks for the comments.
Regarding:

Filtering.1&2
Since namespace is kind of a tree:

/intel
     |---> /server
                |---> /disk
                         |---> /sda1
                                  |---> /metric_1
                                  |---> /metric_2
                         |---> /sda2
                                  |---> /metric_1
                                  |---> /metric_2

Then the following should be possible:
(framework level) /intel/* would return all metrics from all intel plugins (/intel/server, /intel/docker, etc)
(framework level) /intel/server/* would return all metrics from all intel server plugins (/intel/server/disk, /intel/server/cpu, etc)
(plugin level) /intel/server/disk/* would return all metrics for all disks (sda1 and sda2)
(plugin level) /intel/server/disk/sda1/* would return all metrics for sda1 (metric_1 and metric_2)
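
A sketch of how a trailing `/*` could be resolved against the tree above. It treats the query as a prefix lookup over a flat list of known namespaces; the catalog contents and the `select` helper are illustrative assumptions, and the split between framework-level and plugin-level handling is not modelled.

```python
CATALOG = [
    "/intel/server/disk/sda1/metric_1",
    "/intel/server/disk/sda1/metric_2",
    "/intel/server/disk/sda2/metric_1",
    "/intel/server/disk/sda2/metric_2",
]


def select(query, catalog):
    """Return every namespace that lies under the prefix of a trailing-'/*' query."""
    if not query.endswith("/*"):
        # No wildcard: only an exact match qualifies.
        return [ns for ns in catalog if ns == query]
    prefix = query[:-1]  # keep the trailing '/', so '/intel/serv/*' cannot match '/intel/server/...'
    return [ns for ns in catalog if ns.startswith(prefix)]
```

Under these assumptions `select("/intel/server/disk/sda1/*", CATALOG)` yields the two sda1 metrics, and `select("/intel/*", CATALOG)` yields the whole catalog.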

Metric listing.3
So, assuming that the user has defined some metrics at the task manifest level like this:

    "workflow": {
        "collect": {
            "metrics": {
                "/intel/mock/foo": {"tags" : {"metric" : "users_count"} },
                "/intel/mock/bar": {"tags" : {"metric" : "items_count"} },
                "/intel/mock/*/baz": {}
            },
            "config": {
                "/intel/mock": {
                    "db_user": "root",
                    "db_password": "secret",
                    "users_count": "SELECT COUNT(users) FROM some_db;",
                    "items_count": "SELECT COUNT(items) FROM some_db;"
                }
            },

Then, since GetMetricTypes() doesn't have access to the task manifest JSON, it can only return something like, for example: /intel/server/<user_defined>
but maybe, once the task is loaded, GetMetricTypes() could return:
/intel/mock/foo
/intel/mock/bar
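
To illustrate, here is a sketch of how user-defined metrics could be resolved once the task manifest is available. The manifest in the code is a simplified, completed stand-in for the fragment quoted earlier (an assumption, since the original is only an excerpt), it assumes that the `metric` tag names a key in the plugin config, and `user_defined_metrics` is a hypothetical helper rather than an existing snap function.

```python
import json

# Simplified, hypothetical manifest modelled on the fragment in this comment.
MANIFEST = json.loads("""
{
  "workflow": {
    "collect": {
      "metrics": {
        "/intel/mock/foo": {"tags": {"metric": "users_count"}},
        "/intel/mock/bar": {"tags": {"metric": "items_count"}}
      },
      "config": {
        "/intel/mock": {
          "users_count": "SELECT COUNT(users) FROM some_db;",
          "items_count": "SELECT COUNT(items) FROM some_db;"
        }
      }
    }
  }
}
""")


def user_defined_metrics(manifest, plugin_namespace):
    """Map each requested namespace to the query named by its 'metric' tag."""
    collect = manifest["workflow"]["collect"]
    queries = collect["config"].get(plugin_namespace, {})
    resolved = {}
    for namespace, options in collect["metrics"].items():
        tag = options.get("tags", {}).get("metric")
        if tag in queries:
            resolved[namespace] = queries[tag]
    return resolved


resolved = user_defined_metrics(MANIFEST, "/intel/mock")
```

The keys of `resolved` are the namespaces that could be listed once the task is loaded.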

sandlbn · 2016-01-26T13:54:50Z

For the implementation, my proposal is to extend the client interface with a method which can "flush" the plugin and return a new list of metrics. Then, when a user has a task with a wildcard at any level (e.g. /intel/docker/*/memory/free), a new instance/container will be monitored from that point on. This method can be triggered when a container is created or removed (we can keep a buffer of container IDs; counting containers is not enough). The same applies to libvirt.
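
A rough sketch of the container ID buffer, to show why counting containers is not enough. `ContainerWatcher` and `diff_targets` are hypothetical names, and how the "flush" would actually be wired into the client interface is left open.

```python
def diff_targets(known_ids, current_ids):
    """Return (added, removed) container IDs relative to the buffered set."""
    known, current = set(known_ids), set(current_ids)
    return current - known, known - current


class ContainerWatcher:
    """Buffers container IDs and reports when the metric list must be refreshed."""

    def __init__(self):
        self.known = set()

    def needs_flush(self, current_ids):
        """Compare the current IDs with the buffer, then update the buffer."""
        added, removed = diff_targets(self.known, current_ids)
        self.known = set(current_ids)
        return bool(added or removed)
```

If container `b` is replaced by container `c`, the count stays at two, but `needs_flush(["a", "c"])` still reports a change because the IDs differ.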

geauxvirtual · 2016-01-26T16:04:57Z

Bug #654 is related.

jcooklin · 2016-01-27T02:03:30Z

@sandlbn @andrzej-k: will you comment on #679?

mbbroberg added the type/rfc label Mar 8, 2016

IzabellaRaulin mentioned this issue Mar 24, 2016

First implementation of dynamic query support #803

Merged

jcooklin closed this as completed Jun 2, 2016
