Memory leak #402
Were you getting the same memory leak before upgrading to 0.6.1? Which version were you using before, if any? Do you do any custom tracing, or do you rely only on the out-of-the-box integrations? Heap dumps would definitely be useful; I would recommend opening a support ticket and submitting the heap dumps there. Also, can you try updating to 0.7.2 to see if it fixes the issue? That version also has more logging to detect spans that are not properly finished. As for outstanding memory leaks, the only one I'm aware of is a bug in Node that affects HTTP clients with keep-alive when used with |
No custom tracing, just out of the box integration for now.
This actually might be the issue. I just took a look at the heapdump diffs again, and it's definitely a list of Any light you can shed on this known issue? Any mitigations? |
Is this the issue you're referencing: nodejs/node#19859? Curious why this only affects 2 of our services, and why it's not more widespread. |
Yes, that's the issue I was referencing. It depends on quite a few factors to trigger the leak, so it's possible that only a few services are affected. Of course, this is only a theory at this point. A fix was released for this specific bug in Node. Can you try to redeploy with an updated version of Node? |
Thanks! I updated one of the affected services with
Could you elaborate on that if you have some info? I can try to generate a new heapdump diff with this new version of node, but I'm expecting to see similar behavior. |
I meant to trigger the leak from the Node bug, but since you updated, it means this is not the issue. Were you using a version of If you could share the actual heap dump, it would be very helpful. You can send it to me on our public Slack, but I would suggest opening a support ticket as well (you can give me the ticket number on Slack too). One thing that could be interesting to try, in order to narrow down the issue, would be disabling some of the plugins to see if it makes a difference. |
I found the same issue. In my case, I updated dd-trace from |
@tomonari-t Are you saying you didn't have any issue before 0.7.2? Most reports so far seem to indicate the issue was present in at least 0.6.0. |
@rochdev Yes. When I used |
@tomonari-t Can you try disabling plugins for promise libraries? So far it seems the leak is caused by the scope manager, which could have an impact on these plugins specifically if that's the case. Since these plugins were added in 0.7.0, it would explain why you are just now getting the issue. The plugins that would have to be disabled are |
@terryma @tomonari-t Can you try with 0.9.0? We have rewritten some core components that were prone to memory leaks. |
Just want to add that we're also seeing clear memory leaks with 0.8.0. Will try with 0.9.0 this week and report back. |
Unfortunately we're still seeing memory issues in one of our applications with 0.9.3. I'll try again with some of the plugins disabled later this week. |
In general the most probable causes of leaks are:
@Chris911 Are you doing manual instrumentation? |
One other thing to keep in mind is that some plugins are required to avoid memory leaks. If you are using |
We do use |
@Chris911 Can you try without Another option would be to enable these plugins if any of the corresponding modules are used, either directly or transitively: |
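To make the allowlist approach above concrete, here is a minimal sketch assuming the standard dd-trace-js plugins: false init option together with tracer.use() calls; the plugin names shown are illustrative examples, not the elided list from the comment.

// A sketch of the allowlist approach: disable automatic plugin loading and
// enable only the integrations the service actually uses, one at a time,
// to narrow down which one leaks. Plugin names here are examples.
const tracer = require('dd-trace').init({
  service: 'my-service', // hypothetical service name
  plugins: false         // do not auto-enable every integration
})

tracer.use('http')
tracer.use('express')
tracer.use('mongodb-core')

module.exports = tracer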
@Chris911 Also, we are thinking of deprecating the |
When we did our initial testing of the APM a few weeks back, we were getting memory leaks in our dev environments. They went away when we disabled all plugins and only enabled 1-2. I'll try with Also, to answer your previous question, we're not doing any custom instrumentation. |
We enabled |
@freewil Which version were you using before? Also, which supported modules are you using? Could you provide a snippet of the tracer initialization? |
We weren't using dd-trace before. Non-native modules in use:
const ddTrace = require("dd-trace")
// https://datadog.github.io/dd-trace-js/#tracer-settings
ddTrace.init({
enabled: true,
debug: false,
service: 'REDACTED',
hostname: 'REDACTED',
port: 'REDACTED',
env: 'REDACTED'
}) |
I will try to add an option in tomorrow's release so you can disable individual plugins. This will let you run without specific plugins to help pinpoint the issue. I'll keep you posted when it's available. |
Cool, thanks! |
Any ETA on this? Was this an option that was removed in a recent release? It appears you can/could do this in the code examples above with |
@freewil The release that was planned for last Friday was pushed back and I don't have an ETA right away, but I did release a beta as The reason I prefer not to recommend going the I would say to try |
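The exact option shipped in that beta isn't preserved in the comment above, so the following is only a hedged sketch of what disabling individual integrations looks like; it assumes the per-plugin enabled flag documented in later dd-trace-js releases, and the promise-library plugin names come from the earlier suggestion in this thread.

// A sketch of turning off individual integrations while keeping the rest of
// the automatic instrumentation on. Whether the beta used exactly this
// { enabled: false } shape is an assumption.
const tracer = require('dd-trace').init()

// Disable only the integrations suspected of leaking (the promise libraries
// mentioned earlier in the thread).
tracer.use('bluebird', { enabled: false })
tracer.use('q', { enabled: false })
tracer.use('when', { enabled: false })

module.exports = tracer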
Ok - thanks for the update and info! |
It would help. We will also release a new version next week that will provide a lot of insight into what is happening in the runtime. I'll keep this thread posted; it should help determine what is causing the leak. |
I encountered the memory leak quickly; it only took a few days before Node.js processes crashed out of memory. I'm not running the mysql integration, though, just the mongodb integration. |
Any updates on this memory leak? Just deployed dd-trace 0.10.2 on Node 8 and I'm still seeing a memory leak. |
It looks like there is not a single cause of the memory leak but multiple. I'm currently working on fixing some of them. This is what we have found so far:
We have also added a new Runtime Metrics feature which can help us a lot in pinpointing the source of the leak. The feature is in beta and has to be enabled by us for your account. If you want me to investigate your account specifically using Runtime Metrics, please let me know on our public Slack; my handle is rochdev. The above is also valid for everyone in this thread. Finding the source(s) of the memory leak is a high priority for us, but it's very difficult without working directly with users who can give us access to their account, so definitely reach out on Slack if you want to work with us on this. |
A memory leak in the TCP integration that would impact other integrations has been fixed in 0.11.2, and we have also added workarounds in 0.12.0 for known issues in Node that would cause memory leaks. I would recommend that anyone in this thread upgrade the tracer to 0.12.0 and report back whether the memory leak is still present. |
@rochdev I just enabled the Node.js tracer again in our application running I do, however, see recent activity in APM => Trace list. |
@nodesocket If you haven't been using the tracer for a while, the service list was reset. New services might take a few minutes to appear. |
@rochdev been over 15 minutes, should I wait longer or is something wrong? |
@nodesocket Usually it takes around 5-10 minutes, but we've been experiencing unusual delays today, so it's possible it will take more time. In general, if you can see the traces and they have the correct service on them, the services should eventually appear. This delay only exists when a new service is added, which in your case means every service, since you weren't previously using the tracer. |
I can confirm a pretty significant memory leak is present in 0.12.1. We deployed a portion of our cluster with the agent disabled, with the following results. Using the following code:

if (process.env.DATADOG_AGENT_HOST) {
  // Randomly enable the tracer on roughly half the instances to compare memory usage
  const enabled = Math.random() > 0.5 ? true : false;
  tracer.init({
    hostname: process.env.DATADOG_AGENT_HOST,
    enabled: enabled,
    runtimeMetrics: enabled,
    env: process.env.ENVIRONMENT,
    tags: ['app:name', `environment:${process.env.ENVIRONMENT}`],
    sampleRate: 0.1,
    plugins: false,
  });
}

We're using Node 11.15.0 for this trace, but the issue persists on 10.16.0 also. We've determined that using the noop scope prevents the leak. We also tried the async-listener scope, which exhibited the same leak as the default async-hooks scope. |
@xzyfer At this point I would say there are 2 likely causes for leaks:
In both cases, we would need to take a look at the runtime metrics in your account to confirm what is happening. I would recommend direct messaging me on our public Slack (I'm @rochdev) or opening a support ticket. You can find the documentation to enable runtime metrics here. |
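For anyone following along, the application-side setup for runtime metrics is roughly the sketch below. It assumes the runtimeMetrics init option already shown in the snippet above, and that the Datadog agent is reachable with DogStatsD enabled (UDP port 8125 by default); treat those details as assumptions rather than a definitive setup.

// A minimal sketch for turning on runtime metrics in the tracer.
const tracer = require('dd-trace').init({
  hostname: process.env.DATADOG_AGENT_HOST, // agent host, as in the other snippets in this thread
  runtimeMetrics: true                      // report heap, event loop and async resource metrics
})

module.exports = tracer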
@rochdev interestingly it was the runtime metrics that pointed us to looking at the APM library itself. Note that the async resource counter is never decremented until the instance is OOM killed.
We thought this also, so we disabled all plugins and runtime metrics just in case. |
After digging deeper into |
This is a very high number of async resources. I have 2 questions that could help narrow down the issue:
1. Which async resources have the highest count?
2. Is there a high count of unfinished spans on the heap, or are they almost all finished?
|
1. The async resources are predominantly timeouts.
2. Not sure how exactly to confirm this? |
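For others trying to answer the first question, one way to see which async resource types keep growing is a small standalone counter built on async_hooks, sketched below. It is not part of dd-trace, and the 30-second dump interval and top-10 cutoff are arbitrary choices.

// count-async-resources.js: tally live async resources by type so the type
// with an ever-growing count stands out.
'use strict'

const async_hooks = require('async_hooks')
const fs = require('fs')

const liveByType = new Map() // type -> count of resources not yet destroyed
const typeById = new Map()   // asyncId -> type, so destroy() can decrement the right bucket

async_hooks.createHook({
  init (asyncId, type) {
    typeById.set(asyncId, type)
    liveByType.set(type, (liveByType.get(type) || 0) + 1)
  },
  destroy (asyncId) {
    const type = typeById.get(asyncId)
    if (type !== undefined) {
      typeById.delete(asyncId)
      liveByType.set(type, liveByType.get(type) - 1)
    }
  }
}).enable()

// Periodically dump the ten types with the most live resources. fs.writeSync
// writes synchronously, so the reporting itself does not add async resources
// to the counts it prints.
setInterval(() => {
  const top = [...liveByType.entries()].sort((a, b) => b[1] - a[1]).slice(0, 10)
  fs.writeSync(1, `live async resources: ${JSON.stringify(top)}\n`)
}, 30000).unref()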
I think the Timeout async resource refers to network request timeouts to downstream |
I've been continuing to augment our reporting. The current candidate looks like node-fetch's timeout handling |
Ultimately it looks like pm2 is the root cause. There is a known memory leak that was recently patched in https://github.com/keymetrics/pm2-io-apm/releases/tag/4.2.2. Even with this patch applied, we're experiencing a leak, but much slower than previously. Graph legend, left: @pm2/[email protected] |
Perfect, this means that it's not spans holding async resources but the other way around.
Does it still seem to be caused by ever increasing async resources? |
Actually I just noticed that the middle line is stable without PM2. This seems to indicate that there is still a memory leak in the latest release of PM2 and not in |
I noticed that the async hooks are being enabled but never disabled. Is this intentional? https://github.com/DataDog/dd-trace-js/blob/master/packages/dd-trace/src/scope/async_hooks.js#L31 I'm seeing memory leaks happen only in my jest test environment (node 8.16, dd-trace 0.13.1, jest 23) when watching for changes; it seems that restarting the tests repeatedly (including dd-trace setup) will eventually cause |
@brian-l This might be because of the way Jest works. For example, if it clears the require cache while the tracer is still active, it will just keep adding new tracers constantly without cleaning up the old ones, taking up more and more memory. This could be addressed by adding a way to enable/disable the tracer at runtime, which could then be called after each run, for example. This will be available in the coming weeks. It's possible that it's caused by something else, but in a test environment I would say this is the most likely cause. We will be able to validate this once the functionality to disable lands. In the meantime, I would recommend simply disabling the tracer in your tests by using the |
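The option being recommended is cut off above, but one low-risk way to keep the tracer out of a Jest run is to gate initialization on the environment, as in this sketch. The NODE_ENV check relies on Jest setting NODE_ENV to 'test' by default, the service name is hypothetical, and the enabled init option is the same one used in the configuration snippets earlier in this thread.

// tracer.js: skip tracing entirely when running under Jest.
const tracer = require('dd-trace').init({
  enabled: process.env.NODE_ENV !== 'test', // Jest sets NODE_ENV to 'test' by default
  service: 'my-service'                     // hypothetical service name
})

module.exports = tracer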
That didn't change anything, unfortunately. However, this investigation has revealed other memory leaks that need to be fixed as well, so our issues are not caused by this library as far as I can tell. Thanks. |
We've had several reports that the issue is fixed completely, and no new reports of memory leaks for several months at this point, so I'm closing this issue. |
We're running dd-trace 0.16.1, and can confirm that there is definitely a memory leak. We tried disabling plugins, but disabling tracing completely is the only thing that stopped our services from running out of memory and crashing. Should I log a new, separate issue? |
@devillexio Please open a new issue. It will make it easier to keep any underlying causes separated. |
We've deployed 0.6.1 across our production service fleets. Out of over a dozen services, we're seeing memory leaks in 2 of them. All services are more or less Node.js/Express based, with the following configuration for dd-trace:
Are you aware of any outstanding memory leak issues? I've taken two separate heapdumps with a delta containing the leak, showing a lot of DatadogSpans being retained. I can provide the heapdumps to help troubleshoot.