Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creation of healthz service for gNOI #48

Merged
merged 5 commits into from
Apr 21, 2021
Merged

Creation of healthz service for gNOI #48

merged 5 commits into from
Apr 21, 2021

Conversation

marcushines
Copy link
Contributor

No description provided.

healthz/BUILD Outdated Show resolved Hide resolved
healthz/healthz.proto Outdated Show resolved Hide resolved
@marcushines marcushines force-pushed the hines-healthz branch 6 times, most recently from da425e8 to 133037a Compare April 2, 2021 17:27
service Healthz {
// Get will get health status for a gNMI path. If no status is available for
// the requested path an error will be returned.
rpc Get(GetRequest) returns (GetResponse) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see this. This service seems like something we've really wanted better defined for the optical space for a while. One of our main use cases is collecting a state-dump from the device. Essentially, most optical vendors offer an ability in which the device internally compiles a large file (i.e. 100MB) either for the entire device or a specific component and then we send that to the vendor.

Today, we are using gNOI.File.TransferToRemote in a bit of a hacky way (using a magic path essentially from the vendor) to initiate the compiling of the state-dump and transferring it. We are planning to switch this to use gNOI.File.Get in upcoming releases so we can stream the data instead.

With that background, would it make sense to have this RPC be more specific so that we could expand this service with additional use cases? One thought would be to switch Get --> Component, then add a new RPC which could stream a file back (perhaps in a future PR).

service Healthz {
rpc Component(ComponentRequest) returns (ComponentResponse) {}
rpc DebugFile(DebugFileRequest) returns (stream DebugFileResponse) {}
}

If this use case doesn't fit well, no problem, just wanted to see. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the proto.Any definition allows for the vendor to decide what types of information are relevant - this could be "debug files" or any other data.

as to if we wanted to allow for "chunking" of the data seems like chunking it by component vs. providing a streaming solution would be easier but we could iterate on that approach if you really feel building a single "response" is too big for a general use case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of just providing a large file back, I think wrapping it in a proto.Any might not be ideal here as we or a vendor would have to define a new proto definition which really only needs to contain one field that would be of type bytes. Certainly do-able, but it's potentially an avoidable extra layer. I think a streaming solution that streams bytes directly (and concludes with a hash for verification) could be a nice simple solution. In my case, these are just large files that we have no insight into what the details are (we are just getting a file to pass along to vendor developers).

I don't mean to hold up this PR, so feel free to proceed as is, just wanted to share as this has been a use case I've been wanting to see get addressed and this service seems like it might be the right place :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to define the point would the vendor can decide what data is relevant and then package it however they want.

Streaming doesn't really "fix anything" as you still have to have the data locally and then send it over the rpc.. the only "optimization" is in the memory needed to serialize the data in memory then write it - we already have a file operation for streaming a file off a box. The proto they define could very well be to be file pointer that can be used to then fetch the file over the file api.

The other implications would be a component by component health check, would be you can say only get me the diagnostic information relevant to the component vs. "show tech" which dumps state about everything.

service Healthz {
// Get will get health status for a gNMI path. If no status is available for
// the requested path an error will be returned.
rpc Get(GetRequest) returns (GetResponse) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golang gRPC implementation has a problem with multiple services using the same name of a RPC call. It would help if it were changed to, say, GetHelthz().

Copy link
Contributor Author

@marcushines marcushines Apr 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you point me at an example of that issue?
It seems fundamentally broken and I would like someone on the gRPC / go teams to comment
If I register two services on to the mux - they better be namespaced by service

Obviously, trying to register two services with the same name should fail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

talked internally this is really just a design possible issue where registering multiple services to the server struct may lead to having two service rpc methods trying to registered against that single struct which cannot provide overloading in go - the pattern should be that services are implemented via concrete structs and the server should be a composite of those service structs

Copy link
Contributor

@robshakir robshakir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the work on this!

@robshakir robshakir merged commit 3e9a99f into master Apr 21, 2021
@marcushines marcushines deleted the hines-healthz branch October 14, 2021 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants