Look into changing how the shipper receives gRPC connection config #225
I think you are missing a reason, and it is that status is reported per unit. The current set of units provided to the shipper is perhaps awkward from the perspective of configuring the shipper, but seems ideal for reporting status back to the agent and user. We can report the state of the output (ES, LS, Kafka) separately, which we always need, but we could also report the connection state of each individual input back to the shipper. This allows us to report only the unit whose connection has timed out as failed, rather than reporting the shipper gRPC server as a whole as failed.

For status reporting reasons I think we always want at least two units: one for the input side of the shipper and one for the output side. As you suggested we could combine each of the input units together, but I would only do this if we are convinced it is not valuable or possible to report the status of each expected input connection back to the agent. I at least am not convinced there is no value in this, because quickly scanning the state.yaml file in the diagnostics or the output of
The shippers' only job is to be a queue and publish events to an output, so the majority of the time failures in the output should be transient and we should let the queue fill up and report the output unit as FAILED or DEGRADED.
In my opinion the gRPC state is only about connectivity. Is an input successfully authenticated, connected, and trying to publish events? If yes then the gRPC interface is healthy regardless of if the events are accepted into the queue, and the output unit can capture the rest.
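To make the two-unit split concrete, here is a minimal Go sketch of how the input-side and output-side states could be derived independently. The `UnitState` values and both helper functions are hypothetical illustrations for this discussion, not the actual control-protocol types:

```go
package main

import "fmt"

// UnitState mirrors the coarse per-unit states the agent reports.
// Illustrative only; not the real control-protocol enum.
type UnitState string

const (
	Healthy  UnitState = "HEALTHY"
	Degraded UnitState = "DEGRADED"
	Failed   UnitState = "FAILED"
)

// inputUnitState reflects only gRPC connectivity: an input that is
// authenticated, connected, and trying to publish is healthy even if
// the queue is currently rejecting its events.
func inputUnitState(authenticated, connected bool) UnitState {
	if authenticated && connected {
		return Healthy
	}
	return Failed
}

// outputUnitState captures the rest of the pipeline: a transient
// output failure lets the queue fill up and surfaces as DEGRADED,
// while a down output with room left in the queue is FAILED.
func outputUnitState(outputUp, queueFull bool) UnitState {
	switch {
	case outputUp && !queueFull:
		return Healthy
	case queueFull:
		return Degraded
	default:
		return Failed
	}
}

func main() {
	fmt.Println(inputUnitState(true, true))   // HEALTHY
	fmt.Println(outputUnitState(false, true)) // DEGRADED
}
```

The point of the separation is that a queue or output problem never flips the input unit's state, so a user can tell connectivity failures apart from delivery failures at a glance.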
As developers I believe we will. The fine-grained status reporting makes the state of the agent much easier to debug, as you can scan a single file and see what the current error condition is.
The team spoke today and our current conclusion is that we want to simplify to a single input unit that represents the shipper's gRPC server. That would leave the shipper with one input unit (for the gRPC server) and one output unit (for ES/LS/Kafka).
@cmacknz / @leehinman could we close this one with the new shipper plan?
This can be closed. The new design works with the existing configuration strategy. elastic/beats#35135 (comment) We can open a follow up if this needs to change later.
Right now, the shipper receives its initial config in two pieces:
This is a bit of an awkward process for two reasons:
There's two ways to make this a little cleaner:
This will probably generate some discussion about whether the shipper should receive pipeline config across multiple units. As far as I can tell, there's only one use case for this: restarting the output while the server and queue remain up. Right now the shipper server isn't designed to operate like this, and I'm not sure how we should prioritize such a feature. Such a feature doesn't necessarily require multiple units, as the shipper could just diff the subsections of the config to see if only the output has changed.

Having the shipper pipeline config across two units also creates a number of state questions: should we shut down the queue and output while the gRPC connection remains up? Does a failure of the queue/output also imply that gRPC is in a failed state, as no events can be ACK'ed? From the context of error reporting, will a user find this distinction meaningful?
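The config-diffing alternative could look roughly like the Go sketch below. The `Config` struct and `onlyOutputChanged` helper are hypothetical simplifications of the shipper's actual config handling, just to show that detecting an output-only change doesn't require separate units:

```go
package main

import (
	"fmt"
	"reflect"
)

// Config is a hypothetical, simplified view of the shipper's pipeline
// config: a gRPC server section, a queue section, and an output section.
type Config struct {
	Server map[string]interface{}
	Queue  map[string]interface{}
	Output map[string]interface{}
}

// onlyOutputChanged reports whether the output subsection is the only
// part of the config that differs. On a true result the shipper could
// restart just the output while the gRPC server and queue stay up.
func onlyOutputChanged(oldCfg, newCfg Config) bool {
	return reflect.DeepEqual(oldCfg.Server, newCfg.Server) &&
		reflect.DeepEqual(oldCfg.Queue, newCfg.Queue) &&
		!reflect.DeepEqual(oldCfg.Output, newCfg.Output)
}

func main() {
	oldCfg := Config{
		Server: map[string]interface{}{"port": 50051},
		Queue:  map[string]interface{}{"mem.events": 4096},
		Output: map[string]interface{}{"type": "elasticsearch"},
	}
	newCfg := Config{
		Server: oldCfg.Server,
		Queue:  oldCfg.Queue,
		Output: map[string]interface{}{"type": "kafka"},
	}
	fmt.Println(onlyOutputChanged(oldCfg, newCfg)) // true: only the output differs
}
```

A real implementation would diff the decoded config tree from the control protocol rather than ad-hoc maps, but the shape of the check is the same.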