Datadog Logs sink sacrifices 750KB on payload size for throughput and we'd like to avoid that sacrifice. #9202

Closed
Tracked by #9383
blt opened this issue Sep 16, 2021 · 2 comments
Labels
domain: performance (Anything related to Vector's performance) · sink: datadog_logs (Anything `datadog_logs` sink related) · type: enhancement (A value-adding code change that enhances its existing functionality)

Comments

@blt
Contributor

blt commented Sep 16, 2021

In the Datadog Logs sink we have to obey the constraint imposed by the Datadog Logs API that uncompressed, serialized payloads must not be larger than 5MB. The payloads are serialized as JSON, and while we know how large our in-memory Event instances are in bytes and, via the Batcher

let batcher = Batcher::new(
    input,
    EventPartitioner::default(),
    self.timeout,
    NonZeroUsize::new(MAX_PAYLOAD_ARRAY).unwrap(),
    NonZeroUsize::new(BATCH_GOAL_BYTES),
)
.map(|(maybe_key, batch)| {
    let key = maybe_key.unwrap_or_else(|| Arc::clone(&default_api_key));
    let request_builder = RequestBuilder::new(encoding.clone(), compression, log_schema);
    tokio::spawn(async move { request_builder.build(key, batch) })
})
.buffer_unordered(io_bandwidth);
know that we'll only ever move BATCH_GOAL_BYTES
const BATCH_GOAL_BYTES: usize = 4_250_000;
worth of Event instances down through our code, we don't rightly know what the serialized size will be. To avoid producing payloads that are too large, we have set BATCH_GOAL_BYTES well below the 5MB limit.

We'd like to avoid that, if possible. 750KB is not nothing.
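To make the gap concrete, here is a small illustration (not from the issue) using a hypothetical DemoEvent type: the in-memory footprint of an event and the length of its JSON serialization are two different numbers, and only the latter counts against the API limit.

use serde::Serialize;

// Hypothetical event shape for illustration; Vector's real Event type is richer.
#[derive(Serialize)]
struct DemoEvent {
    message: String,
    host: String,
    status: u16,
}

fn main() {
    let event = DemoEvent {
        message: "connection reset by peer".to_string(),
        host: "web-01".to_string(),
        status: 502,
    };

    // Rough in-memory footprint: struct size plus heap-allocated string bytes.
    let in_memory = std::mem::size_of::<DemoEvent>()
        + event.message.capacity()
        + event.host.capacity();

    // The serialized size is only known after actually serializing.
    let serialized = serde_json::to_vec(&event).unwrap().len();

    println!("in-memory ~{} bytes, serialized {} bytes", in_memory, serialized);
}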

Previously this sink used a partitioning scheme for incoming Event instances that called to_raw_value on every incoming Event. That was very useful because we knew exactly how large the JSON serialization was -- we had it in hand -- but it was CPU intensive. We have considered solutions that would allow us to stream Events into a JSON writer, but our implementation is currently based on serde_json, which doesn't offer that kind of interface with feedback on bytes written. We also have the slight difficulty that our tower-based service stack here operates in a request/response fashion, meaning we can't easily stream Event instances up to some threshold and push back otherwise. A serde RawValue is likewise not minified, leaving some bytes on the table there as well. The Datadog Logs API accepts logs out of order, so maintaining order in the sink is a non-goal.
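For reference, the kind of "feedback on bytes written" discussed above could in principle be approximated by wrapping a writer that counts outgoing bytes. The sketch below is illustrative only: CountingWriter and fill_payload are made-up names, this is not the sink's implementation nor an interface serde_json provides out of the box, and it still writes the event that overshoots before noticing, which is one face of the difficulty the issue describes.

use std::io::{self, Write};
use serde::Serialize;

// Illustrative wrapper: reports how many bytes have gone out so far.
struct CountingWriter<W: Write> {
    inner: W,
    written: usize,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

// Stream events into a JSON array until the byte count reaches `limit`.
// Returns the payload bytes and how many events were taken; the event that
// pushes the count over the limit has already been written by the time we see it.
fn fill_payload<T: Serialize>(events: &[T], limit: usize) -> io::Result<(Vec<u8>, usize)> {
    let mut w = CountingWriter { inner: Vec::new(), written: 0 };
    let mut taken = 0;
    w.write_all(b"[")?;
    for (i, event) in events.iter().enumerate() {
        if i > 0 {
            w.write_all(b",")?;
        }
        serde_json::to_writer(&mut w, event).map_err(io::Error::from)?;
        taken = i + 1;
        if w.written >= limit {
            break;
        }
    }
    w.write_all(b"]")?;
    Ok((w.inner, taken))
}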

What we would like to achieve, all this said, are Datadog Logs payloads that come closer to the 5MB limit without going over than we presently achieve, without lowering throughput relative to the current method. Any solutions we've come up with while shooting the breeze are essentially variants on bin-packing, but with the serious limitation that we can't know the true size of an item to be binned without doing an expensive computation on it (the serialization).
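As a rough sketch of what such a bin-packing variant might look like, here is a first-fit pass over estimated sizes. The names and the assumption that a cheap size estimator exists are illustrative, not a proposal from the issue; the hard part remains that the estimate can be wrong, so a bin may still serialize to more than the limit and need splitting after the fact.

// Illustrative first-fit packing over estimated sizes.
const MAX_PAYLOAD_BYTES: usize = 5_000_000;

struct Bin<T> {
    items: Vec<T>,
    estimated_bytes: usize,
}

fn first_fit<T>(items: Vec<T>, estimated_size: impl Fn(&T) -> usize) -> Vec<Bin<T>> {
    let mut bins: Vec<Bin<T>> = Vec::new();
    for item in items {
        let size = estimated_size(&item);
        // Place the item in the first bin with room for its estimated size,
        // or open a new bin if none fits.
        match bins
            .iter_mut()
            .find(|b| b.estimated_bytes + size <= MAX_PAYLOAD_BYTES)
        {
            Some(bin) => {
                bin.estimated_bytes += size;
                bin.items.push(item);
            }
            None => bins.push(Bin {
                items: vec![item],
                estimated_bytes: size,
            }),
        }
    }
    bins
}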

@blt added the type: enhancement, sink: datadog_logs, and domain: performance labels on Sep 16, 2021
@jszwedko
Member

Closing this since performance seems sufficient even with the sacrifice; we can let future profiling surface it again if needed.

@lukesteensen
Member

I was looking into this a bit today as part of #10020 and very quickly ran into the API limitation of 1000 events per payload. With that in mind, the 5MB limit really only comes into play if your log messages are approaching 5KB on average, which seems to me quite a bit larger than would be common in the wild. We can certainly test with logs of that size, but it does make me question how big an impact any work here would really have. It seems likely that we're hitting the event limit far more often than the byte limit.
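The back-of-the-envelope arithmetic behind that observation, written out as a sketch:

// When does the 5MB byte cap bind before the 1000-events-per-payload cap?
fn main() {
    let max_payload_bytes = 5_000_000usize;
    let max_events_per_payload = 1_000usize;
    // The byte limit only matters once the average event size exceeds this:
    let crossover = max_payload_bytes / max_events_per_payload;
    println!("byte limit binds first above ~{} bytes per event", crossover); // ~5000, i.e. ~5KB
}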

lukesteensen added a commit that referenced this issue Jan 24, 2024
Closes #9202

Since #19189 was merged, this heuristic to try to avoid oversized requests is
no longer necessary from a correctness point of view. The only potential reason
to keep it would be if we expected oversized batches to be common, which could
mean a performance impact if the new batch-splitting code is triggered more
often to avoid oversized requests.

Another option would be to simply reduce the buffer we leave ourselves between the
goal and the max, but any analysis of the best value would be entirely
dependent on the format of the event data.

Signed-off-by: Luke Steensen <[email protected]>