This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

Support for dynamic metrics #676

Closed
andrzej-k opened this issue Jan 25, 2016 · 5 comments

@andrzej-k
Contributor

Definition:
A dynamic metric is a metric which:

  1. can appear or disappear between the call to GetMetricTypes and the call to CollectMetrics
    • example: Linux process, cgroup, VM, container
  2. is not known at the time of the GetMetricTypes call (a.k.a. user-defined metrics in the task manifest)
    • example: the user defines a SQL query as part of the plugin config in a task manifest, then defines a metric which exposes the query result
  3. can depend on other dynamic metrics
    • example: specific metrics for specific Linux processes

Configuration:
Extensions in per metric (in task manifest), per plugin (in task manifest) and global (config file for snapd) configuration.

  1. config per metric shall support more keys than just Version, at least also Source and Tags
    • example: specific metrics retrieved only from specific Sources (hosts), Tags allowing for per metric configuration
  2. a single task manifest should support multiple "collect" sections, so that one set of metrics can be published to one destination and another set to a different destination without creating a separate task manifest.

Filtering:
Extensions in filtering capabilities.

  1. '*' should not be part of metric namespace, and should not be exposed on a metrics list. It should work on a task manifest level.
    • in case user requests all metrics for server (/intel/server) then framework shall trigger metrics gathering from all plugins with namespace starting with /intel/server
    • in case user requests filtering on plugin level (/intel/server/disk/*/octets_read) then plugin author shall provide support for it
  2. it should be possible to replace any element of the namespace with * or other wildcard/reg exp
    • supported wildcards TBD
  3. it should be possible to use wildcards multiple times in single namespace
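
To make the filtering items above concrete, here is a minimal sketch of element-wise namespace matching. It is only an illustration: `matchNamespace` and `splitNamespace` are hypothetical helpers, not part of snap, and it assumes `*` is the only wildcard (the supported set is still TBD) and that each `*` stands for exactly one namespace element.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNamespace turns "/intel/server/disk" into ["intel", "server", "disk"].
func splitNamespace(ns string) []string {
	return strings.Split(strings.Trim(ns, "/"), "/")
}

// matchNamespace reports whether a concrete namespace matches a pattern in
// which any element may be "*". Assumption: each "*" matches exactly one
// element, and it may be used several times in one pattern.
func matchNamespace(pattern, ns string) bool {
	p, n := splitNamespace(pattern), splitNamespace(ns)
	if len(p) != len(n) {
		return false
	}
	for i := range p {
		if p[i] != "*" && p[i] != n[i] {
			return false
		}
	}
	return true
}

func main() {
	// One wildcard in the middle of the namespace.
	fmt.Println(matchNamespace("/intel/server/disk/*/octets_read", "/intel/server/disk/sda1/octets_read"))
	// Several wildcards in a single namespace.
	fmt.Println(matchNamespace("/intel/*/disk/*/octets_read", "/intel/server/disk/sda2/octets_read"))
	// The last element differs, so this one does not match.
	fmt.Println(matchNamespace("/intel/server/disk/*/octets_read", "/intel/server/disk/sda1/octets_written"))
}
```

Matching element by element keeps `*` out of the metric namespace itself: the pattern lives only in the task manifest and is compared against whatever concrete namespaces exist at collection time.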

Metric listing:
Extensions in metrics listing.

  1. it should be possible for plugin to return more, or different, metrics than exposed in metric catalog
  2. metric listing should support listing all metrics, listing just /namespace_prefix/* is not enough
    • possible solution: allow defining namespaces as /intel/procfs/<process_name>/cpu_utilization - then allow to filter on process name and/or return metrics which include names of processes at the time of metric gathering: /intel/procfs/process_A/cpu_utilization, /intel/procfs/process_B/cpu_utilization
  3. for user-defined metrics, return information that the metrics are defined by the user, or enable reading the task manifest in GetMetricTypes()
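
The "possible solution" in item 2 could look roughly like the sketch below: the catalog exposes a namespace template, and the concrete namespaces are produced at gathering time. `expandTemplate` is a hypothetical helper, and the process names are just the ones from the example above, not real data.

```go
package main

import (
	"fmt"
	"strings"
)

// expandTemplate replaces the placeholder element (for example
// "<process_name>") with each name observed at metric gathering time.
// Hypothetical helper, not part of the snap plugin API.
func expandTemplate(template, placeholder string, names []string) []string {
	out := make([]string, 0, len(names))
	for _, name := range names {
		out = append(out, strings.Replace(template, placeholder, name, 1))
	}
	return out
}

func main() {
	tmpl := "/intel/procfs/<process_name>/cpu_utilization"
	// Assumed to be the processes running when the metrics are gathered.
	running := []string{"process_A", "process_B"}
	for _, ns := range expandTemplate(tmpl, "<process_name>", running) {
		fmt.Println(ns)
	}
}
```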

Returning metrics:

  1. If a metric's target (for example, a process or cgroup) is no longer available, snap should log a warning but the task should continue to run
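
A minimal sketch of that behaviour, assuming a collector that reads one value per process: a vanished target produces a warning and is skipped, and the rest of the collection still succeeds. `readCPU`, `collect` and `errTargetGone` are hypothetical names, and the `live` map stands in for whatever the plugin would really query.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errTargetGone marks a target (process, cgroup, ...) that disappeared.
var errTargetGone = errors.New("target no longer available")

// readCPU is a stand-in for a real per-process read; "live" plays the role
// of the set of processes that still exist.
func readCPU(live map[string]float64, name string) (float64, error) {
	v, ok := live[name]
	if !ok {
		return 0, fmt.Errorf("%s: %w", name, errTargetGone)
	}
	return v, nil
}

// collect gathers what it can. A failed read is logged as a warning and
// skipped, so one missing target never fails the whole run.
func collect(live map[string]float64, requested []string) map[string]float64 {
	out := make(map[string]float64)
	for _, name := range requested {
		v, err := readCPU(live, name)
		if err != nil {
			log.Printf("WARN: skipping metric: %v", err)
			continue
		}
		out["/intel/procfs/"+name+"/cpu_utilization"] = v
	}
	return out
}

func main() {
	live := map[string]float64{"process_A": 12.5}
	// process_B was requested but no longer exists.
	got := collect(live, []string{"process_A", "process_B"})
	fmt.Println(len(got), got["/intel/procfs/process_A/cpu_utilization"])
}
```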
@jcooklin
Collaborator

(In the Definition section)

  1. is not known at the time of GetMetricTypes call

Should be changed to "may not be known at the time of GetMetricTypes" (a.k.a. at load)

(In the Configuration section)

There is a relationship to #652 here.

in single task manifest support having many "collect"

I feel these are best left as separate tasks.

(In the filtering section)

'*' should not be part of metric namespace ...

How do you propose the plugin author communicate that their plugin accepts a wild card at specific locations in the namespace?

Also, do you mean that a '*' should not be part of the general namespace or in other words that part of the namespace that is shared with multiple plugins (prefix)?

it should be possible to replace any element of the namespace with * or other wildcard/reg exp

I'm not sure how the framework can make this guarantee. Accepting a wildcard is largely up to the plugin writer.

(In the Metric listing section)

Item 1 sounds good to me.
Item 2 is related to the plugin supporting a wildcard/regex.
I'm not sure how Item 3 would work.

@andrzej-k
Contributor Author

Hey @jcooklin, thanks for the comments.
Regarding:

Filtering.1&2
Since namespace is kind of a tree:

/intel
     |---> /server
                |---> /disk
                         |---> /sda1
                                  |---> /metric_1
                                  |---> /metric_2
                         |---> /sda2
                                  |---> /metric_1
                                  |---> /metric_2         

Then the following should be possible:
(framework level) /intel/* would return all metrics from all intel plugins (/intel/server, /intel/docker, etc)
(framework level) /intel/server/* would return all metrics from all intel server plugins (/intel/server/disk, /intel/server/cpu, etc)
(plugin level) /intel/server/disk/* would return all metrics for all disks (sda1 and sda2)
(plugin level) /intel/server/disk/sda1/* would return all metrics for sda1 (metric_1 and metric_2)
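
One way to picture the split between the two levels is the sketch below. It assumes the framework knows the root namespace of every loaded plugin and that a request ends in `/*`; `route` is a hypothetical function and the plugin roots are the ones named above. The framework picks the plugins, and whatever is left of the request is handed to the plugin to resolve.

```go
package main

import (
	"fmt"
	"strings"
)

// route maps every plugin root touched by the request to the remainder of
// the request that the plugin itself would have to resolve.
// Hypothetical sketch, not how snap is implemented.
func route(request string, pluginRoots []string) map[string]string {
	prefix := strings.TrimSuffix(request, "/*")
	out := make(map[string]string)
	for _, root := range pluginRoots {
		switch {
		case root == prefix || strings.HasPrefix(root, prefix+"/"):
			// Framework level: the whole plugin lies under the prefix.
			out[root] = "/*"
		case strings.HasPrefix(prefix, root+"/"):
			// Plugin level: the request reaches inside the plugin's subtree.
			out[root] = strings.TrimPrefix(request, root)
		}
	}
	return out
}

func main() {
	roots := []string{"/intel/server/disk", "/intel/server/cpu", "/intel/docker"}
	requests := []string{"/intel/*", "/intel/server/*", "/intel/server/disk/sda1/*"}
	for _, req := range requests {
		routed := route(req, roots)
		for _, root := range roots {
			if rest, ok := routed[root]; ok {
				fmt.Printf("%s -> plugin %s resolves %q\n", req, root, rest)
			}
		}
	}
}
```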

Metric listing.3
So, assuming that the user has defined some metrics at the task manifest level like this:

    "workflow": {
        "collect": {
            "metrics": {
                "/intel/mock/foo": {"tags" : {"metric" : "users_count"} },
                "/intel/mock/bar": {"tags" : {"metric" : "items_count"} },
                "/intel/mock/*/baz": {}
            },
            "config": {
                "/intel/mock": {
                    "db_user": "root",
                    "db_password": "secret",
                    "users_count": "SELECT COUNT(users) FROM some_db;",
                    "items_count": "SELECT COUNT(items) FROM some_db;"
                }
            },

Then, since GetMetricTypes() doesn't have access to the task manifest JSON, it can only return something like, for example: /intel/server/<user_defined>
but maybe, once the task is loaded, GetMetricTypes() could return:
/intel/mock/foo
/intel/mock/bar
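
To illustrate what reading the manifest would make possible, here is a rough sketch that parses a manifest fragment and pairs each tagged namespace with its query. It rests on several assumptions: `userDefinedMetrics`, `manifest` and `metricSpec` are hypothetical, the `sample` fragment is a shortened copy of the manifest idea above, and the value of the `metric` tag is assumed to name a key in the plugin config.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// sample is an illustrative manifest fragment, not a complete task manifest.
const sample = `{
  "workflow": {
    "collect": {
      "metrics": {
        "/intel/mock/foo": {"tags": {"metric": "users_count"}},
        "/intel/mock/bar": {"tags": {"metric": "items_count"}},
        "/intel/mock/*/baz": {}
      },
      "config": {
        "/intel/mock": {
          "users_count": "SELECT COUNT(users) FROM some_db;",
          "items_count": "SELECT COUNT(items) FROM some_db;"
        }
      }
    }
  }
}`

type metricSpec struct {
	Tags map[string]string `json:"tags"`
}

type manifest struct {
	Workflow struct {
		Collect struct {
			Metrics map[string]metricSpec        `json:"metrics"`
			Config  map[string]map[string]string `json:"config"`
		} `json:"collect"`
	} `json:"workflow"`
}

// userDefinedMetrics returns namespace -> query for every metric whose
// "metric" tag names a key in the given plugin's config section.
func userDefinedMetrics(raw []byte, plugin string) (map[string]string, error) {
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	cfg := m.Workflow.Collect.Config[plugin]
	out := make(map[string]string)
	for ns, spec := range m.Workflow.Collect.Metrics {
		if query, ok := cfg[spec.Tags["metric"]]; ok {
			out[ns] = query
		}
	}
	return out, nil
}

func main() {
	metrics, err := userDefinedMetrics([]byte(sample), "/intel/mock")
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(metrics))
	for ns := range metrics {
		names = append(names, ns)
	}
	sort.Strings(names) // map order is random, so sort for stable output
	for _, ns := range names {
		fmt.Println(ns, "=>", metrics[ns])
	}
}
```

The wildcard entry has no `metric` tag, so it is left out of the result and would still have to be resolved by the plugin at collection time.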

@sandlbn
Contributor

sandlbn commented Jan 26, 2016

For the implementation, my proposal is to extend the client interface with a method that can "flush" the plugin and return a new list of metrics. Then, when a user has a task with a wildcard at any level (e.g. /intel/docker/*/memory/free), a new instance/container will be monitored from that point on. This method can be triggered when a container is created or removed (we can keep a buffer of container IDs; counting containers is not enough). The same applies to libvirt.
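
A small sketch of the buffer idea, assuming the plugin can list the current container IDs: comparing the buffered IDs with the current ones shows which containers appeared or went away, even when the count stays the same. `diffIDs` and the IDs `c1`..`c3` are hypothetical; this is not an existing snap client method.

```go
package main

import (
	"fmt"
	"sort"
)

// diffIDs compares the buffered container IDs with the current ones and
// returns the IDs that were added and removed, each sorted.
// Assumption: IDs are unique within each slice.
func diffIDs(previous, current []string) (added, removed []string) {
	prev := make(map[string]bool, len(previous))
	for _, id := range previous {
		prev[id] = true
	}
	cur := make(map[string]bool, len(current))
	for _, id := range current {
		cur[id] = true
		if !prev[id] {
			added = append(added, id)
		}
	}
	for _, id := range previous {
		if !cur[id] {
			removed = append(removed, id)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	// Same number of containers before and after, but a different set.
	added, removed := diffIDs([]string{"c1", "c2"}, []string{"c2", "c3"})
	for _, id := range added {
		fmt.Println("start monitoring /intel/docker/" + id + "/memory/free")
	}
	for _, id := range removed {
		fmt.Println("stop monitoring /intel/docker/" + id + "/memory/free")
	}
}
```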

@geauxvirtual
Contributor

Bug #654 is related.

@jcooklin
Collaborator

@sandlbn @andrzej-k: will you comment on #679?


5 participants