Instrument sink batching #9719
This would be a very welcome feature for our team, especially coupled with the recent instrumentation for buffers. For batches, these metrics should probably at least be considered: […]

Due to concurrency, and each HTTP request having its own batch to send, maybe they could be histograms instead of gauges, but I'm not sure what you think is better. I don't think batches have the notion of […]. In addition, it would be very useful to also export the configured […].
Porting over some of the details from a duplicate ticket that I listed, these are the metrics I would want to see come out of any work to add metrics to the batching process: […]

(Some of these overlap with @hhromic's comment, obviously.)
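For context, the settings being discussed are the per-sink batch options. A sketch of what "the configured" batch limits look like, assuming an `http` sink purely for illustration (exact option names and defaults vary by sink; see the Vector sink reference):

```toml
[sinks.my_sink]          # hypothetical sink name
type = "http"
inputs = ["my_source"]   # hypothetical source name
uri = "https://example.com/ingest"
encoding.codec = "json"

# The configured limits that would be useful to export alongside batch metrics.
# A batch is flushed when ANY of these conditions is reached:
batch.max_events = 1000    # this many events accumulated...
batch.max_bytes = 1048576  # ...or roughly this many bytes...
batch.timeout_secs = 1     # ...or this much time elapsed since the batch opened
```

Exporting these configured values as metrics would let operators compare actual batch sizes (from histograms) against the configured limits to see which flush condition dominates.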
Hi! Any updates here? This type of metric would be super useful to us.
No, we have not yet prioritized this work.
Suggestions from a user here: #20284
I think the docs for buffering and batching should also be extended, as it is very difficult to understand the relation between buffer and batch limits:

- If a user configures the max batch size to be larger than the buffer size, what will happen? If there is a direct relation, then Vector should throw a warning when the buffer size is lower than the batch limit, and should recommend a buffer size at least equal to the batch size, or a multiple of it (e.g. batch size * 2, to have some read-ahead from sources).
- How does this work when ARC or concurrency > 1 is used? (Then it should presumably be batch size * expected max concurrency.)
- Lastly, if a user has one source (e.g. Kafka) and multiple sinks (ES, S3) with different buffer and batch sizes, will the sink with the smaller buffer throttle the other one?
Agreed, the docs could be expanded. Putting some responses here in the meantime.
Sink buffers are decoupled from batching. That is: the buffer just feeds events into the sink as it gets them, and as the sink fetches them. The sink then batches those events in-memory. There is this diagram that might help: https://vector.dev/docs/reference/configuration/sinks/vector/#buffers-and-batches
Again, the buffers are decoupled from the in-memory batching, so the buffer size doesn't need to be related to the batch size. You can expect one batch to be created per concurrency, though.

If the smaller buffer is full, yes, it will apply back-pressure before the larger buffer does. Again, though, batching is done in memory and is decoupled from buffering.
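To make the decoupling concrete, here is a sketch of the two independent sets of options side by side (sink type, names, and values are just examples; option names follow the Vector sink docs):

```toml
[sinks.es]                 # hypothetical sink name
type = "elasticsearch"
inputs = ["kafka_in"]      # hypothetical source name
endpoints = ["http://localhost:9200"]

# Buffer: sits in FRONT of the sink. It absorbs bursts and applies
# back-pressure upstream when full. It knows nothing about batches.
buffer.type = "memory"
buffer.max_events = 10000
buffer.when_full = "block"

# Batch: assembled IN-MEMORY by the sink itself as it pulls events out
# of the buffer. Its limits are independent of the buffer's limits.
batch.max_events = 500
batch.timeout_secs = 1
```

Because the two are decoupled, `batch.max_events > buffer.max_events` is not an error: the sink simply drains the buffer repeatedly until a flush condition is met.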
@jszwedko thank you, that matches what I vaguely remember from a Discord discussion some time ago. Still, I think it might be beneficial for the buffer to be larger than the batch size, to have some read-ahead depending on the source, correct?
Thinking about it a bit more: for in-memory buffers, I could see it being beneficial to have the buffer be at least as big as the batch size multiplied by the concurrency, so that the next set of requests can be buffered in memory while the current set is in flight. For disk buffers, I think having it be 2x that would be beneficial, since data isn't "deleted" from disk buffers until the sink delivers it.
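Applying that rule of thumb as a sketch (the sink, names, and numbers are illustrative; `request.concurrency` accepts a fixed number or "adaptive", and disk buffers are sized in bytes rather than events):

```toml
[sinks.s3]                   # hypothetical sink name
type = "aws_s3"
inputs = ["kafka_in"]        # hypothetical source name
bucket = "my-bucket"         # example bucket
region = "us-east-1"
encoding.codec = "json"

batch.max_events = 1000
request.concurrency = 4      # fixed concurrency to keep the arithmetic simple

# In-memory buffer: at least batch size * concurrency = 1000 * 4 = 4000 events,
# so the next set of requests can be buffered while the current set is in flight.
buffer.type = "memory"
buffer.max_events = 4000

# Disk buffer alternative: roughly 2x that budget, since events stay in the
# disk buffer until the sink acknowledges delivery. Disk buffers are sized in
# bytes, so translate via your expected average event size.
# buffer.type = "disk"
# buffer.max_size = 536870912  # bytes; illustrative value only
```

With adaptive concurrency (ARC) there is no single fixed multiplier, so sizing against the expected maximum concurrency, as discussed above, is the conservative choice.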
Is there any news on the current status of this feature? This metric would also be really beneficial for us.
Following on from our buffer instrumentation, we should also look to instrument batches. Given that batches are flushed on a number of conditions, additional insight is helpful for operators optimizing their pipelines: batch sizing, number of batches in flight, etc.