-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support partial Process Stats responses when the stats service (log-cache or metric-proxy) is unavailable #2250
Conversation
lib/cloud_controller/diego/reporters/instances_stats_reporter.rb
Outdated
Show resolved
Hide resolved
so far looks good to me. Is the non-idiomatic part of this you mentioned the multiple return values? If so I don't have any qualms with it. |
Yeah, that was mostly it. I don't think we've done much of it in the codebase so wanted opinions. I think it's better than trying to juggle the exceptions all the way up to the controller in this case, though. |
@sethboyles do you think we should go with the warning header instead of the |
I think for now that might be the better option. I am uncertain about the We are going to discuss warnings in general as a team tomorrow, but I don't want to block this PR--how about we go with the header warning for now, and if we decide later on to use the JSON field, we can put an issue for it in our backlog? |
Sounds good. I'll make that change. 👍 |
Validating StepsHere's how I tested this out manually. CF for VMs
$ cf restart dora
Restarting app dora in org org / space space as admin...
Stopping app...
Waiting for app to start...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
Stats server temporarily unavailable.
Stats server temporarily unavailable.
Stats server temporarily unavailable.
Stats server temporarily unavailable.
Stats server temporarily unavailable.
name: dora
requested state: started
routes: dora.diamond-legend.capi.land
last uploaded: Thu 29 Apr 09:20:20 PDT 2021
stack: cflinuxfs3
buildpacks:
name version detect output buildpack name
ruby_buildpack 1.8.37 ruby ruby
type: web
sidecars:
instances: 1/1
memory usage: 256M
state since cpu memory disk details
#0 running 2021-04-29T16:22:54Z 0.0% 0 of 0 0 of 0
type: worker
sidecars:
instances: 0/0
memory usage: 256M
There are no running instances of this process. Directly hitting the endpoint returns: cf curl /v3/processes/8f79dd94-2f93-427c-b66c-9290b81db79d/stats
{
"resources": [
{
"type": "web",
"index": 0,
"state": "RUNNING",
"host": "10.244.0.138",
"uptime": 258,
"mem_quota": null,
"disk_quota": null,
"fds_quota": 16384,
"isolation_segment": null,
"details": null,
"instance_ports": [
{
"external": 61004,
"internal": 8080,
"external_tls_proxy_port": 61006,
"internal_tls_proxy_port": 61001
},
{
"external": 61005,
"internal": 2222,
"external_tls_proxy_port": 61007,
"internal_tls_proxy_port": 61002
}
],
"usage": {}
}
]
}
Even though the API is returning CF for KubernetesOn Kubernetes the CLI/API experience is the same, just the repro steps are a little different.
|
- this change rescues errors from log-cache and metric-proxy and passes them up as warnings to callers instead of exceptions so that the stats endpoint can continue to function on a best-effort basis - on /v3/processes/:guid/stats these warnings are included in the 'X-Cf-Warnings' header - addresses #2227 Authored-by: Tim Downey <[email protected]>
- this temporary config flag was used to ignore server unavailable errors coming back from log-cache (or metric-proxy in cf-for-k8s) - given the change proposed in #2227 to rescue more errors and serve partial stats responses this config is no longer necessary Authored-by: Tim Downey <[email protected]>
In #2227 we updated the Process Stats endpoint to be able to return partial responses (with a warning) when the stats server is unavailable. Authored-by: Tim Downey <[email protected]>
87be0c7
to
ed75cf6
Compare
@sethboyles I rebased the original commit to use the I'm gonna take this PR out of "draft" state now, I think it's ready. 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I don't quite understand why we ignore the errors in v2, although I'm guessing this is because v2 endpoints should not be modified and they don't support that warnings header?
**mem_quota** | _integer_ | The current maximum memory allocated for the instance | ||
**disk_quota** | _integer_ | The current maximum disk allocated for the instance | ||
**mem_quota** | _integer_ | The current maximum memory allocated for the instance; the value is `null` when memory quota data is unavailable | ||
**disk_quota** | _integer_ | The current maximum disk allocated for the instance; the value is `null` when disk quota data is unavailable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏 thank you for updating the docs!
LGTM!
@tcdowney Let me know if/when you add the warnings to v2, then I'll go ahead and merge this in. Thank you for the PR! |
Authored-by: Tim Downey <[email protected]>
19:04 $ cf curl /v2/apps/$(cf app dora --guid)/stats
{
"0": {
"state": "RUNNING",
"isolation_segment": null,
"stats": {
"name": "dora",
"uris": [
"dora.diamond-legend.capi.land"
],
"host": "10.244.0.138",
"port": 61004,
"uptime": 62,
"fds_quota": 16384,
"mem_quota": null,
"disk_quota": null,
"usage": {}
}
}
}
Stats server temporarily unavailable. Done! 👍 |
This change rescues errors from log-cache and metric-proxy and passes them up as warnings to callers instead of exceptions so that the stats endpoint can continue to function on a best-effort basis. On /v3/processes/:guid/stats these warnings are included in a new "warnings" key on the response, on v2 they are ignored. Alternatively/additionally we could add these to the
'X-Cf-Warnings'
header as @sethboyles mentioned on the issue.The related issue has more information: #2227
I have reviewed the contributing guide
I have viewed, signed, and submitted the Contributor License Agreement
I have made this pull request to the
main
branchI have run all the unit tests using
bundle exec rake
I have run CF Acceptance Tests