Describe the solution you'd like
The following metrics should be exposed over an HTTP endpoint in Prometheus metrics format:
Report Katib metrics:
- Katib-controller metrics, such as the total number of reconciliation errors per controller, the length of the reconcile queue per controller, and reconciliation latency
- Usual resource metrics, such as CPU usage, memory usage, and file descriptor usage
- Go runtime metrics, such as the number of goroutines and GC duration (default metrics support was added in controller-runtime 0.1.8+)

Enable prometheus metrics for katib-controller #717
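To make the target format concrete, here is a minimal sketch of how the per-controller metrics listed above could look when served in the Prometheus text exposition format. It is written in Python with only the standard library, purely for illustration: katib-controller itself is written in Go and would presumably rely on the metrics support in controller-runtime instead. The metric names (`katib_reconcile_errors_total`, `katib_reconcile_queue_length`, `katib_reconcile_duration_seconds`), the `ControllerMetrics` class, and the `serve` helper are all hypothetical and not taken from Katib or controller-runtime.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def _family(name, metric_type, help_text, samples):
    """Format one metric family; samples maps controller name -> value."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for controller in sorted(samples):
        # Assumes controller names need no label escaping (no quotes or backslashes).
        lines.append(f'{name}{{controller="{controller}"}} {samples[controller]}')
    return lines


class ControllerMetrics:
    """Hypothetical in-memory store for per-controller reconcile metrics."""

    def __init__(self):
        self._lock = threading.Lock()
        self.errors = {}         # controller -> total reconcile errors (counter)
        self.queue_length = {}   # controller -> current queue length (gauge)
        self.latency_sum = {}    # controller -> summed reconcile time in seconds
        self.latency_count = {}  # controller -> number of observed reconciles

    def observe_reconcile(self, controller, seconds, failed=False):
        """Record one reconcile call, its duration, and whether it failed."""
        with self._lock:
            self.latency_sum[controller] = self.latency_sum.get(controller, 0.0) + seconds
            self.latency_count[controller] = self.latency_count.get(controller, 0) + 1
            self.errors[controller] = self.errors.get(controller, 0) + (1 if failed else 0)

    def set_queue_length(self, controller, length):
        with self._lock:
            self.queue_length[controller] = length

    def render(self):
        """Return all metrics in the Prometheus text exposition format."""
        with self._lock:
            lines = _family("katib_reconcile_errors_total", "counter",
                            "Total number of reconciliation errors per controller.",
                            self.errors)
            lines += _family("katib_reconcile_queue_length", "gauge",
                             "Length of the reconcile queue per controller.",
                             self.queue_length)
            # Latency is exposed as a summary carrying only _sum and _count.
            name = "katib_reconcile_duration_seconds"
            lines.append(f"# HELP {name} Reconciliation latency per controller.")
            lines.append(f"# TYPE {name} summary")
            for controller in sorted(self.latency_count):
                label = f'{{controller="{controller}"}}'
                lines.append(f"{name}_sum{label} {self.latency_sum[controller]}")
                lines.append(f"{name}_count{label} {self.latency_count[controller]}")
        return "\n".join(lines) + "\n"


METRICS = ControllerMetrics()


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the shared METRICS store at /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block forever, serving metrics on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` would start the endpoint, and a Prometheus scrape job pointed at `/metrics` on that port could then collect the three families. Resource and Go runtime metrics are left out of the sketch, since those would come from the runtime or client library rather than from hand-written counters.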
controller-runtime v0.1.18 introduced Prometheus metrics for its internal controller, and it depends on the kubernetes-1.12.3 API packages, as here.
For now, pytorch-operator and tf-operator depend on kubernetes-1.11.2.
Upgrading controller-runtime to v0.1.18 or above in katib would greatly reduce the effort needed for this feature. However, controller-runtime v0.1.18 has a dependency conflict with pytorch-operator and tf-operator, both of which katib depends on.
/kind feature