This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

How to auto monitor the status of tasks. #3602

Closed
IamSunGuangzhi opened this issue Sep 10, 2019 · 7 comments

Comments

IamSunGuangzhi commented Sep 10, 2019

Short summary about the issue/question:
I want to automatically monitor the status of tasks. When a task hits an error, I want to be notified right away, without having to add an alert manually each time I submit a job. For example, PAI could support DingTalk (dingding) alerts.

OpenPAI Environment:

  • OpenPAI version: v0.14.0
  • OS (e.g. from /etc/os-release): ubuntu 16.04
  • Hardware (e.g. core number, memory size, storage size, GPU type etc.): titan Xp

Anything else we need to know:
NULL

@IamSunGuangzhi IamSunGuangzhi changed the title How use grafana to monitor the status of tasks. How use grafana to auto monitor the status of tasks. Sep 10, 2019
@IamSunGuangzhi
Author

How can task error alerts be supported?
Through Grafana? Alertmanager?

@IamSunGuangzhi IamSunGuangzhi changed the title How use grafana to auto monitor the status of tasks. How to auto monitor the status of tasks. Sep 12, 2019
@scarlett2018
Member

@IamSunGuangzhi - thanks for raising the feature request. Is this request for a PAI end user's daily training jobs or for a PAI admin?

@IamSunGuangzhi
Author

Thanks for your reply, @scarlett2018. This request is for a PAI end user's daily training jobs. Since PAI is a training platform, having it automatically monitor task status would make task debugging easier.

@scarlett2018
Member

OpenPAI does not plan to support DingTalk-style IM integration. However, we could consider providing status-change subscriptions by email, a status feed, etc. Adding this to the backlog for feature design and discussion first.

@yqwang-ms
Member

Reasonable feature request, but it is not in our plans yet.
We may leverage https://www.elastic.co/what-is/elasticsearch-alerting, alertmanager, or something like https://github.com/bitnami-labs/kubewatch.

For now, you can achieve this yourself, for example by polling the RestServer and sending an alert when certain conditions are met.
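
A minimal sketch of the polling approach described above, not an official OpenPAI client. The URL, the bearer-token header, and the `name`/`state` fields with a `FAILED` value are assumptions made for illustration; check them against the REST API documentation for your OpenPAI version. `send_alert` is a hypothetical placeholder to be replaced with email or an IM webhook.

```python
import json
import time
import urllib.request

# Assumption: illustrative endpoint; adjust host and path to your cluster.
JOBS_URL = "http://pai-master.example/rest-server/api/v1/jobs"


def fetch_jobs(url=JOBS_URL, token=None):
    """Fetch the job list with one HTTP GET and return the parsed JSON."""
    req = urllib.request.Request(url)
    if token:
        # Assumption: the server accepts a bearer token.
        req.add_header("Authorization", "Bearer " + token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def find_new_failures(jobs, seen, failed_states=("FAILED",)):
    """Return jobs in a failed state that have not been alerted on yet.

    `seen` is a set of job names that is updated in place, so each
    failed job triggers at most one alert.
    """
    new = []
    for job in jobs:
        name = job.get("name")
        if job.get("state") in failed_states and name not in seen:
            seen.add(name)
            new.append(job)
    return new


def send_alert(job):
    """Hypothetical placeholder: swap in email or a webhook call."""
    print("ALERT: job %s is %s" % (job.get("name"), job.get("state")))


def poll_forever(interval_seconds=60, fetch=fetch_jobs, alert=send_alert):
    """Poll the job list at a fixed interval and alert on new failures."""
    seen = set()
    while True:
        try:
            for job in find_new_failures(fetch(), seen):
                alert(job)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print("poll failed:", exc)
        time.sleep(interval_seconds)
```

Calling `poll_forever()` from a small script or a cron-started process would run the loop. Keeping the set of already-reported job names avoids sending the same alert on every poll.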

@IamSunGuangzhi
Author

OK, thanks @scarlett2018 @yqwang-ms. I will try.

@scarlett2018
Member

Thanks, closing the issue as answered. Please feel free to reopen it if you run into any problems while applying the suggestions.
