Metrics sent via sendDistributionMetric delayed by several minutes #581

Open
ribaptista opened this issue Mar 7, 2025 · 5 comments

ribaptista commented Mar 7, 2025

Expected Behavior

I expect the metric values sent via sendDistributionMetric to be immediately reflected in Datadog, without delays. The sum of the values should match what is sent during the Lambda invocations in real time.

Actual Behavior

The metric values are delayed in Datadog and do not appear immediately after they are sent. For example, after 100 invocations of my Lambda function, the sum should be 500 (5 per invocation). However, the sum is roughly 475, and the remaining 25 appear several minutes later in Datadog, associated with a later timestamp rather than the time the metrics were actually emitted.

Is there a way to force the Lambda function to flush metrics immediately after each invocation?

Steps to Reproduce the Problem

1. Set up a Lambda function with serverless-plugin-datadog version 5.83.0 and configure the following environment variables:

   DD_TRACE_OTEL_ENABLED: false
   DD_PROFILING_ENABLED: false
   DD_SERVERLESS_APPSEC_ENABLED: false

2. Import datadog-lambda-js in your Lambda function and call sendDistributionMetric after each invocation.
3. Observe that the sum of the metrics in Datadog does not immediately match the values sent; the missing values appear several minutes later.
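
A minimal sketch of the kind of handler these steps describe may help. The metric name and tag are hypothetical, and the metric sender is injected so the sketch runs without datadog-lambda-js installed. In a real function you would presumably pass `sendDistributionMetric` from datadog-lambda-js, assuming its `(name, value, ...tags)` signature.

```javascript
// Hypothetical repro handler. `sendMetric` stands in for
// sendDistributionMetric from datadog-lambda-js (assumed signature:
// name, value, ...tags).
function makeHandler(sendMetric) {
  return async function handler(event) {
    // Emit a fixed value of 5 per invocation, as in the report above.
    // Metric name and tag are made up for illustration.
    sendMetric("example.invocation.value", 5, "source:repro");
    return { statusCode: 200 };
  };
}

// Local stand-in that records calls instead of sending them anywhere.
const recorded = [];
const handler = makeHandler((name, value, ...tags) => {
  recorded.push({ name, value, tags });
});
```

Invoking `handler` 100 times against the recording stand-in yields 100 points of value 5, which is the 500 total the report expects to see in Datadog.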

Specifications

Serverless Framework version: 3.39.0
Datadog Serverless Plugin version: 5.83.0
Lambda function runtime: nodejs20.x

astuyve commented Mar 7, 2025

Hi @ribaptista thanks for reaching out!

Historically, the Lambda extension didn't support timestamped metrics. We added this feature just last week, but it will take some time before it makes it into datadog-lambda-js and subsequently this library.

To answer your specific question: the timestamp for metrics has historically been set to the time when aggregation and flushing are performed. And yes, you can specify DD_SERVERLESS_FLUSH_STRATEGY: end, which will flush metrics after every invocation.

That said, we're about to roll out a change in v73 which aggregates metrics into timestamped buckets based on when they are written to the extension (or the timestamp sent to the extension). This should fix the issue you're seeing without necessarily adjusting the flush strategy.
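
The difference between stamping points at flush time and bucketing them by write time can be illustrated with a small model. This is only a sketch: the 10-second bucket width, the field names, and the sample timings are assumptions, and none of it is the extension's actual code.

```javascript
// Illustrative model only; not the extension's implementation.
const BUCKET_MS = 10_000; // assumed bucket width

// Start of the bucket that a millisecond timestamp falls into.
function bucketStart(tsMs) {
  return Math.floor(tsMs / BUCKET_MS) * BUCKET_MS;
}

// Sum point values per bucket, keyed by whichever timestamp `pickTs` selects.
function aggregate(points, pickTs) {
  const sums = new Map();
  for (const p of points) {
    const key = bucketStart(pickTs(p));
    sums.set(key, (sums.get(key) ?? 0) + p.value);
  }
  return sums;
}

// Three points written at t=1s, 2s, 3s; the last is not flushed until t=125s.
const points = [
  { value: 5, writtenAt: 1_000, flushedAt: 4_000 },
  { value: 5, writtenAt: 2_000, flushedAt: 4_000 },
  { value: 5, writtenAt: 3_000, flushedAt: 125_000 },
];

// Old behavior in this model: the late point lands in a later bucket.
const byFlushTime = aggregate(points, (p) => p.flushedAt);
// New behavior in this model: every point stays in the bucket it was written in.
const byWriteTime = aggregate(points, (p) => p.writtenAt);

console.log([...byFlushTime]);
console.log([...byWriteTime]);
```

In this model, keying by flush time splits the total across two buckets, while keying by write time keeps all 15 in the first bucket even though one point arrived late.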

I hope this is helpful!

lym953 commented Mar 7, 2025

After @astuyve released extension v73, I just released plugin v5.84.0. @ribaptista did this solve the problem?

ribaptista commented

Hi @astuyve!

Thanks for the quick response.
Looking forward to the new feature!
Regarding DD_SERVERLESS_FLUSH_STRATEGY, I tested with end and my function started crashing with:


thread 'main' panicked at library/std/src/time.rs:417:33:
overflow when adding duration to instant
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
RequestId: 0f7041ce-b69f-51e2-b4c0-21887f5870fb Error: signal: aborted
Extension.Crash

Then I tested with periodically,500 and although the metrics appeared a bit more evenly spaced than before I set this environment variable, there were still 10 values that were only flushed at the function spindown (see below).

Image

Could it be that the library only takes this strategy configuration as a "hint" and does not follow it at all times?

ribaptista commented

Hi @lym953!
Wow, that's great! Let me test it right now

ribaptista commented Mar 8, 2025

> After @astuyve released extension v73, I just released plugin v5.84.0. @ribaptista did this solve the problem?

I upgraded to v5.84.0 and tested with DD_SERVERLESS_FLUSH_STRATEGY=periodically,500.
Although the flush behavior was the same as the test with the previous version (it kept 5 values and only flushed them at the function spindown), it honored the original timestamp of these 5 remaining values!!! See screenshot below with cumsum enabled. The last bar only summed to 500 when the function reached spindown (it was 495 before).

Image

This is more than enough for me! I don't need realtime flushing at all times (it is ok if some values are retained until the function spindown, as long as they arrive with the correct timestamp).
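
The cumulative-sum observation can be reproduced with a toy calculation. The per-bucket split below is hypothetical; only the 495 and 500 totals come from the test described above.

```javascript
// Running total over an array of per-bucket sums.
function cumsum(values) {
  let running = 0;
  return values.map((v) => (running += v));
}

// Hypothetical per-bucket sums for 100 invocations of 5 each.
const beforeSpindown = [150, 200, 145]; // 5 still held back in the last bucket
const afterSpindown = [150, 200, 150]; // the held 5 arrives with its original timestamp

console.log(cumsum(beforeSpindown)); // [ 150, 350, 495 ]
console.log(cumsum(afterSpindown)); // [ 150, 350, 500 ]
```

Because the late value keeps its original timestamp, it fills in the existing last bar instead of creating a new one at spindown time.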

Thank you very much @astuyve @lym953!
