Datadog Logs sink sacrifices 750KB on payload size for throughput and we'd like to avoid that sacrifice. #9202
Labels: `domain: performance`, `sink: datadog_logs`, `type: enhancement`
In the Datadog logs sink we have to obey the constraint imposed by the Datadog Logs API that uncompressed, serialized payloads must not be larger than 5MB. The payloads are serialized as JSON and, while we know how large our in-memory `Event` instances are in bytes and, via the `Batcher` (vector/src/sinks/datadog/logs/sink.rs, lines 280 to 292 at ee912a2), batch them up to `BATCH_GOAL_BYTES` (vector/src/sinks/datadog/logs/sink.rs, line 46 at ee912a2) as they flow down through our code, we don't rightly know what the serialized size will be. To avoid making payloads that are too large we have set `BATCH_GOAL_BYTES` well below the 5MB limit. We'd like to avoid that, if possible: 750KB is not nothing.
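To make the gap concrete, here is a minimal sketch (an invented stand-in for `Event`, assuming `serde` with the `derive` feature and `serde_json`) of why the in-memory byte count the `Batcher` works with cannot stand in for the serialized length:

```rust
use serde::Serialize;

// Hypothetical stand-in for Vector's `Event`; fields are invented.
#[derive(Serialize)]
struct FakeEvent {
    message: String,
    status: u64,
}

fn main() {
    let event = FakeEvent {
        message: "connection reset by peer".to_string(),
        status: 502,
    };

    // Roughly what an in-memory accounting sees: the struct plus its heap payload.
    let in_memory = std::mem::size_of::<FakeEvent>() + event.message.len();

    // What the Datadog Logs API actually counts: serialized JSON bytes,
    // including field names, quotes, and punctuation.
    let serialized = serde_json::to_vec(&event).unwrap().len();

    // The two differ, and the ratio varies per event, which is why a fixed
    // batch goal has to leave a fat safety margin under the 5MB cap.
    println!("in-memory ~= {in_memory} bytes, serialized = {serialized} bytes");
}
```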
Previously this sink used a partitioning scheme for incoming `Event` instances that called `to_raw_value` on every incoming `Event`. Very useful, because we knew exactly how large the JSON serialization was -- we had it -- but CPU intensive.
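For reference, that old scheme looked roughly like the following (a simplified sketch, not the actual sink code; `to_raw_value` requires `serde_json`'s `raw_value` feature). Every event is serialized eagerly, so packing is exact, but the full serialization cost lands on the hot path:

```rust
use serde_json::value::{to_raw_value, RawValue};

const MAX_PAYLOAD_BYTES: usize = 5_000_000; // Datadog Logs API cap

fn pack_exact<T: serde::Serialize>(events: &[T]) -> Vec<Vec<Box<RawValue>>> {
    let mut batches = Vec::new();
    let mut current: Vec<Box<RawValue>> = Vec::new();
    let mut current_len = 2; // the enclosing `[` and `]`

    for event in events {
        // The expensive step: a full serialization per event, done in part
        // just to learn the exact number of bytes it contributes.
        let raw = to_raw_value(event).expect("serializable event");
        let sep = if current.is_empty() { 0 } else { 1 }; // the `,` separator
        if !current.is_empty() && current_len + sep + raw.get().len() > MAX_PAYLOAD_BYTES {
            batches.push(std::mem::take(&mut current));
            current_len = 2;
        }
        let sep = if current.is_empty() { 0 } else { 1 };
        current_len += sep + raw.get().len();
        current.push(raw);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```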
We have considered solutions that would allow us to stream `Event`s into a JSON writer, but our implementation is currently based on `serde_json`, which doesn't offer that kind of interface with feedback on bytes written. We also have the slight difficulty that our tower-based service stack here operates in a request/response fashion, meaning we can't easily stream `Event` instances up to some threshold and push back otherwise. A serde `RawValue` is likewise not minified, leaving some bytes on the table there as well.
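That said, since `serde_json::to_writer` accepts any `io::Write`, we can at least get after-the-fact feedback per event by serializing straight into the payload buffer and rolling back on overshoot. A sketch of that workaround (invented names, not a committed design; the cost is occasionally serializing one event twice):

```rust
const MAX_PAYLOAD_BYTES: usize = 5_000_000;

/// Append events to `buf` as a JSON array, returning the index of the first
/// event that did not fit, i.e. where the next payload should start.
fn fill_payload<T: serde::Serialize>(events: &[T], buf: &mut Vec<u8>) -> usize {
    buf.push(b'[');
    for (i, event) in events.iter().enumerate() {
        let rollback = buf.len(); // everything past here belongs to this event
        if i > 0 {
            buf.push(b',');
        }
        // serde_json writes compact JSON straight into the Vec.
        serde_json::to_writer(&mut *buf, event).expect("serializable event");
        if buf.len() + 1 > MAX_PAYLOAD_BYTES {
            // Overshot: drop the last event (and its comma), close the array.
            // NB: a real implementation must handle a single event larger
            // than the cap, or this will return 0 forever.
            buf.truncate(rollback);
            buf.push(b']');
            return i;
        }
    }
    buf.push(b']');
    events.len()
}
```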
The Datadog Logs API accepts logs out-of-order, so maintaining order in the sink is a non-goal.

What we would like to achieve, all this said, are Datadog Logs payloads that come closer to 5MB without going over than we presently manage, without lowering throughput relative to the current method. Any solutions we've come up with while shooting the breeze are essentially variants on bin-packing, but with the serious limitation that we can't know the true size of an item to be binned without doing an expensive computation on it (the serialization).
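For the record, one shape such a variant could take (purely speculative; all names invented): keep a running estimate of the serialized-to-in-memory size ratio, pack against the 5MB cap using the estimate plus a safety margin, and correct the estimate whenever a payload actually gets serialized. Serialization then happens once per payload at send time rather than once per event during packing:

```rust
/// Running estimate of serialized bytes per in-memory byte, corrected after
/// each real serialization. Purely illustrative.
struct SizeEstimator {
    ratio: f64,  // serialized bytes / in-memory bytes, smoothed
    margin: f64, // safety factor applied on top of the estimate
}

impl SizeEstimator {
    fn new() -> Self {
        // Start pessimistic so early payloads stay safely under the cap.
        Self { ratio: 2.0, margin: 1.10 }
    }

    /// Estimated serialized size for an event of `in_memory` bytes.
    fn estimate(&self, in_memory: usize) -> usize {
        (in_memory as f64 * self.ratio * self.margin).ceil() as usize
    }

    /// Fold an observed (in-memory, serialized) pair into the running ratio
    /// via an exponential moving average.
    fn observe(&mut self, in_memory: usize, serialized: usize) {
        let observed = serialized as f64 / in_memory as f64;
        self.ratio = 0.9 * self.ratio + 0.1 * observed;
    }
}
```

A packer built on this would still need to re-split the rare payload that overshoots once it is actually serialized, but the margin could adapt per workload instead of being a fixed 750KB for everyone.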