Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use consistent identity not per-request UUID for worker IDs #3801

Conversation

agrski
Copy link
Contributor

@agrski agrski commented Dec 10, 2021

What this PR does / why we need it:
All the metrics from the alibi-detect-server are unable to be used as-is for monitoring and alerting across the span of multiple events.

The use of a UUID means every request is treated as a new timeseries by Prometheus because the labels are different every time.
Furthermore, it means metrics, whether gauges or counters, can never be reset, which is painful for alerting.

The default worker ID function--the process ID for the worker--will be consistent across requests, which mitigates both of the aforementioned issues.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fixed alibi-detect-server metrics to allow effective alerting by providing a consistent set of labels across every event.

The use of a UUID means every request is treated as a new timeseries
by Prometheus, because the labels are different every time.
Furthermore, it means metrics, whether gauges or counters,
can never be reset, which is painful for alerting.
The default worker ID function--the process ID for the worker--
will be consistent across requests, which mitigates both the
aforementioned issues.
@seldondev
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign axsaucedo
You can assign the PR to them by writing /assign @axsaucedo in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@axsaucedo
Copy link
Contributor

Nice one
/test integration

@axsaucedo axsaucedo merged commit 34021ee into SeldonIO:master Dec 14, 2021
@seldondev
Copy link
Collaborator

@agrski: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
integration e999345 link /test integration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository. I understand the commands that are listed here.

@agrski
Copy link
Contributor Author

agrski commented Dec 16, 2021

/test integration

@agrski
Copy link
Contributor Author

agrski commented Dec 16, 2021

From testing locally, I can't see anything wrong so not sure why the integration tests complained before:

$ make test
mypy --ignore-missing-imports adserver
Success: no issues found in 29 source files
pytest -W ignore adserver
==================================================================== test session starts ====================================================================
platform linux -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: <...>/seldon-core/components/alibi-detect-server
plugins: tornasync-0.6.0.post2, requests-mock-1.9.3
collected 14 items                                                                                                                                          

adserver/tests/test_ad_model.py ..                                                                                                                    [ 14%]
adserver/tests/test_cd_model.py ...                                                                                                                   [ 35%]
adserver/tests/test_cm_model.py ...                                                                                                                   [ 57%]
adserver/tests/test_od_model.py ..                                                                                                                    [ 71%]
adserver/tests/test_server.py ....                                                                                                                    [100%]

============================================================== 14 passed in 107.74s (0:01:47) ===============================================================

@agrski agrski deleted the fix-alibi-detect-server-metrics-not-resetting branch December 16, 2021 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants