Datadog Logs sink sacrifices 750KB on payload size for throughput and we'd like to avoid that sacrifice. #9202

Closed
Tracked by #9383
blt opened this issue Sep 16, 2021 · 2 comments
Labels
domain: performance (Anything related to Vector's performance) · sink: datadog_logs (Anything `datadog_logs` sink related) · type: enhancement (A value-adding code change that enhances its existing functionality)

Comments

@blt
Contributor

blt commented Sep 16, 2021

In the Datadog Logs sink we have to obey the constraint imposed by the Datadog Logs API that uncompressed, serialized payloads must not be larger than 5MB. The payloads are serialized as JSON, and while we know how large our in-memory Event instances are in bytes and, via the Batcher

let batcher = Batcher::new(
    input,
    EventPartitioner::default(),
    self.timeout,
    NonZeroUsize::new(MAX_PAYLOAD_ARRAY).unwrap(),
    NonZeroUsize::new(BATCH_GOAL_BYTES),
)
.map(|(maybe_key, batch)| {
    let key = maybe_key.unwrap_or_else(|| Arc::clone(&default_api_key));
    let request_builder = RequestBuilder::new(encoding.clone(), compression, log_schema);
    tokio::spawn(async move { request_builder.build(key, batch) })
})
.buffer_unordered(io_bandwidth);
know that we'll only ever move BATCH_GOAL_BYTES
const BATCH_GOAL_BYTES: usize = 4_250_000;
worth of Event instances down through our code, we don't rightly know what the serialized size will be. To avoid producing payloads that are too large, we have set BATCH_GOAL_BYTES well below the 5MB limit.

We'd like to avoid that, if possible. 750KB is not nothing.
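To make the gap concrete, here is a small illustration (not from the issue) using a hypothetical DemoEvent type: the in-memory footprint of an event and the length of its JSON serialization are two different numbers, and only the latter counts against the API limit.

use serde::Serialize;

// Hypothetical event shape for illustration; Vector's real Event type is richer.
#[derive(Serialize)]
struct DemoEvent {
    message: String,
    host: String,
    status: u16,
}

fn main() {
    let event = DemoEvent {
        message: "connection reset by peer".to_string(),
        host: "web-01".to_string(),
        status: 502,
    };

    // Rough in-memory footprint: struct size plus heap-allocated string bytes.
    let in_memory = std::mem::size_of::<DemoEvent>()
        + event.message.capacity()
        + event.host.capacity();

    // The serialized size is only known after actually serializing.
    let serialized = serde_json::to_vec(&event).unwrap().len();

    println!("in-memory ~{} bytes, serialized {} bytes", in_memory, serialized);
}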

Previously this sink used a partitioning scheme for incoming Event instances that called to_raw_value on every incoming Event. That was very useful because we knew exactly how large the JSON serialization was -- we had it in hand -- but it was CPU intensive. We have considered solutions that would allow us to stream Events into a JSON writer, but our implementation is currently based on serde_json, which doesn't offer that kind of interface with feedback on bytes written. We also have the slight difficulty that our tower-based service stack here operates in a request/response fashion, meaning we can't easily stream Event instances up to some threshold and push back otherwise. A serde RawValue is likewise not minified, leaving some bytes on the table there as well. The Datadog Logs API accepts logs out of order, so maintaining order in the sink is a non-goal.
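For reference, the kind of "feedback on bytes written" discussed above could in principle be approximated by wrapping a writer that counts outgoing bytes. The sketch below is illustrative only: CountingWriter and fill_payload are made-up names, this is not the sink's implementation nor an interface serde_json provides out of the box, and it still writes the event that overshoots before noticing, which is one face of the difficulty the issue describes.

use std::io::{self, Write};
use serde::Serialize;

// Illustrative wrapper: reports how many bytes have gone out so far.
struct CountingWriter<W: Write> {
    inner: W,
    written: usize,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

// Stream events into a JSON array until the byte count reaches `limit`.
// Returns the payload bytes and how many events were taken; the event that
// pushes the count over the limit has already been written by the time we see it.
fn fill_payload<T: Serialize>(events: &[T], limit: usize) -> io::Result<(Vec<u8>, usize)> {
    let mut w = CountingWriter { inner: Vec::new(), written: 0 };
    let mut taken = 0;
    w.write_all(b"[")?;
    for (i, event) in events.iter().enumerate() {
        if i > 0 {
            w.write_all(b",")?;
        }
        serde_json::to_writer(&mut w, event).map_err(io::Error::from)?;
        taken = i + 1;
        if w.written >= limit {
            break;
        }
    }
    w.write_all(b"]")?;
    Ok((w.inner, taken))
}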

What we would like to achieve, all this said, are Datadog Logs payloads that come closer to the 5MB limit without going over than we presently achieve, without lowering throughput relative to the current method. Any solutions we've come up with while shooting the breeze are essentially variants on bin-packing, but with the serious limitation that we can't know the true size of an item to be binned without doing an expensive computation on it (the serialization).
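As a rough sketch of what such a bin-packing variant might look like, here is a first-fit pass over estimated sizes. The names and the assumption that a cheap size estimator exists are illustrative, not a proposal from the issue; the hard part remains that the estimate can be wrong, so a bin may still serialize to more than the limit and need splitting after the fact.

// Illustrative first-fit packing over estimated sizes.
const MAX_PAYLOAD_BYTES: usize = 5_000_000;

struct Bin<T> {
    items: Vec<T>,
    estimated_bytes: usize,
}

fn first_fit<T>(items: Vec<T>, estimated_size: impl Fn(&T) -> usize) -> Vec<Bin<T>> {
    let mut bins: Vec<Bin<T>> = Vec::new();
    for item in items {
        let size = estimated_size(&item);
        // Place the item in the first bin with room for its estimated size,
        // or open a new bin if none fits.
        match bins
            .iter_mut()
            .find(|b| b.estimated_bytes + size <= MAX_PAYLOAD_BYTES)
        {
            Some(bin) => {
                bin.estimated_bytes += size;
                bin.items.push(item);
            }
            None => bins.push(Bin {
                items: vec![item],
                estimated_bytes: size,
            }),
        }
    }
    bins
}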

@blt added the type: enhancement, sink: datadog_logs, and domain: performance labels on Sep 16, 2021
@jszwedko
Member

Closing this since performance seems sufficient even with the sacrifice; we can let future profiling surface it again if needed.

@lukesteensen
Member

I was looking into this a bit today as part of #10020 and very quickly ran into the API limitation of 1000 events per payload. With that in mind, the 5MB limit really only comes into play if your log messages are approaching 5KB on average, which seems to me quite a bit larger than would be common in the wild. We can certainly test with logs of that size, but it does make me question how big an impact any work here would really have. It seems likely that we're hitting the event limit far more often than the byte limit.
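The back-of-the-envelope arithmetic behind that observation, written out as a sketch:

// When does the 5MB byte cap bind before the 1000-events-per-payload cap?
fn main() {
    let max_payload_bytes = 5_000_000usize;
    let max_events_per_payload = 1_000usize;
    // The byte limit only matters once the average event size exceeds this:
    let crossover = max_payload_bytes / max_events_per_payload;
    println!("byte limit binds first above ~{} bytes per event", crossover); // ~5000, i.e. ~5KB
}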

lukesteensen added a commit that referenced this issue Jan 24, 2024
Closes #9202

Since #19189 was merged, this heuristic to try to avoid oversized requests is
no longer necessary from a correctness point of view. The only potential reason
to keep it would be if we expected oversized batches to be common, which could
mean a performance impact if the new batch-splitting code is triggered more
often to avoid oversized requests.

Another option would be to simply reduce the buffer we leave ourselves between the
goal and the max, but any analysis of the best value would be entirely
dependent on the format of the event data.

Signed-off-by: Luke Steensen <[email protected]>