diff --git a/content/en/docs/components/katib/user-guides/metrics-collector.md b/content/en/docs/components/katib/user-guides/metrics-collector.md
index a18b7c1dcd..9e59f41fae 100644
--- a/content/en/docs/components/katib/user-guides/metrics-collector.md
+++ b/content/en/docs/components/katib/user-guides/metrics-collector.md
@@ -6,16 +6,23 @@ weight = 40
 
 This guide describes how Katib metrics collector works.
 
-## Metrics Collector
+## Overview
+
+There are two ways to collect metrics:
+
+1. Pull-based: Katib collects the metrics using a _sidecar_ container. A sidecar is a utility container that supports
+   the main container in the Kubernetes Pod.
+
+2. Push-based: the training script pushes the metrics directly to Katib DB.
 
 In the `metricsCollectorSpec` section of the Experiment YAML configuration file, you can
 define how Katib should collect the metrics from each Trial, such as the accuracy and loss metrics.
 
-Your training code can record the metrics into `StdOut` or into arbitrary output files. Katib
-collects the metrics using a _sidecar_ container. A sidecar is a utility container that supports
-the main container in the Kubernetes Pod.
+## Pull-based Metrics Collector
 
-To define the metrics collector for your Experiment:
+Your training code can record the metrics into `StdOut` or into arbitrary output files.
+
+To define the pull-based metrics collector for your Experiment:
 
 1. Specify the collector type in the `.collector.kind` field.
    Katib's metrics collector supports the following collector types:
@@ -29,7 +36,7 @@ To define the metrics collector for your Experiment:
      metrics must be line-separated by `epoch` or `step` as follows, and the key for timestamp must
      be `timestamp`:
 
-     ```
+     ```json
      {"epoch": 0, "foo": "bar", "fizz": "buzz", "timestamp": "2021-12-02T14:27:51"}
      {"epoch": 1, "foo": "bar", "fizz": "buzz", "timestamp": "2021-12-02T14:27:52"}
      {"epoch": 2, "foo": "bar", "fizz": "buzz", "timestamp": "2021-12-02T14:27:53"}
@@ -51,9 +58,6 @@ To define the metrics collector for your Experiment:
      in the `.collector.customCollector` field. Check the
      [custom metrics collector example](https://github.com/kubeflow/katib/blob/ea46a7f2b73b2d316b6b7619f99eb440ede1909b/examples/v1beta1/metrics-collector/custom-metrics-collector.yaml#L14-L36).
 
-   - `None`: Specify this value if you don't need to use Katib's metrics collector. For example,
-     your training code may handle the persistent storage of its own metrics.
-
 2. Write code in your training container to print or save to the file metrics in the format
    specified in the `.source.filter.metricsFormat` field. The default metrics format value is:
 
@@ -79,3 +83,46 @@ To define the metrics collector for your Experiment:
    recall=0.55
    precision=.5
    ```
+
+## Push-based Metrics Collector
+
+Your training code needs to call the [`report_metrics()`](https://github.com/kubeflow/katib/blob/e251a07cb9491e2d892db306d925dddf51cb0930/sdk/python/v1beta1/kubeflow/katib/api/report_metrics.py#L26) function in the Python SDK to record metrics.
+The `report_metrics()` function parses the metrics in the `metrics` field into a gRPC request, automatically adds the current timestamp, and sends the request to Katib DB Manager.
+
+Before that, the `kubeflow-katib` package must be installed in your training container.
+
+To define the push-based metrics collector for your Experiment, you have two options:
+
+- YAML file
+
+  1. Specify the collector type `Push` in the `.collector.kind` field.
+
+  2. In your training container, call `report_metrics()` to report the metrics.
+
+- [`tune`](https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py#L166) function
+
+  Use the `tune` function and specify the `metrics_collector_config` argument. You can refer to the following example:
+
+  ```python
+  import kubeflow.katib as katib
+
+  def objective(parameters):
+      import time
+      import kubeflow.katib as katib
+      time.sleep(5)
+      result = 4 * int(parameters["a"])
+      # Push metrics to Katib DB.
+      katib.report_metrics({"result": result})
+
+  katib.KatibClient(namespace="kubeflow").tune(
+      name="push-metrics-exp",
+      objective=objective,
+      parameters={"a": katib.search.int(min=10, max=20)},
+      objective_metric_name="result",
+      max_trial_count=2,
+      metrics_collector_config={"kind": "Push"},
+      # When the SDK is released, replace this with packages_to_install=["kubeflow-katib==0.18.0"].
+      # Currently, the training container must have the `git` package to install this SDK.
+      packages_to_install=["git+https://github.com/kubeflow/katib.git@master#subdirectory=sdk/python/v1beta1"],
+  )
+  ```
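
For the pull-based `File` collector, the JSON format described in this diff (one object per line, keyed by `epoch`, with a `timestamp` key) could be produced from a training script roughly as follows. This is a minimal sketch using only the Python standard library. The helper names `format_metric_line` and `write_metrics` and the metric names are illustrative assumptions, not part of Katib, and the timestamp layout simply mirrors the one in the JSON example.

```python
import json
from datetime import datetime, timezone


def format_metric_line(epoch, metrics, now=None):
    """Return one JSON line holding `epoch`, the metrics, and a `timestamp` key."""
    # Hypothetical helper: the name and signature are not part of Katib.
    now = now or datetime.now(timezone.utc)
    record = {"epoch": epoch, **metrics}
    # Second-resolution layout copied from the JSON example in the guide.
    record["timestamp"] = now.strftime("%Y-%m-%dT%H:%M:%S")
    return json.dumps(record)


def write_metrics(path, history):
    """Append one JSON object per epoch to `path`, one object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for epoch, metrics in enumerate(history):
            f.write(format_metric_line(epoch, metrics) + "\n")
```

A training loop would call `write_metrics` with the path that the Experiment sets in `.source.fileSystemPath.path`, for example `write_metrics("/tmp/metrics.json", [{"accuracy": 0.91}])`, where that path is a placeholder.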