Memory leak in resolver.cache for subscribe().once() #95

Closed
efurmantt opened this issue Dec 9, 2014 · 19 comments

@efurmantt
Contributor

This is a very nice library. I have noticed a small issue, however. When publishing in tandem with subscribe().once(), there appears to be a memory leak:

var postal = require('postal');
postal.subscribe({ channel: 'ch', topic: 't', callback: function() {}}).once();
postal.publish({ channel: 'ch', topic: 't', data: 'data' });
var cachedKeys = Object.keys(postal.configuration.resolver.cache).length;
console.log(cachedKeys);

In this example, one key is cached, even though no subscriptions remain.

Alternatively, consider this code:

var postal = require('postal');
for(var i = 0; i < 100; i++) {
    postal.subscribe({ channel: 'c', topic: 't' + i, callback: function() {} }).once();
    postal.publish({ channel: 'c', topic: 't' + i, data: 'data' });
}
var cachedKeys = Object.keys(postal.configuration.resolver.cache).length;
console.log(cachedKeys);

In this example, a hundred keys are cached for subscriptions that are no longer active.

@ifandelse
Contributor

Hi @efurmantt - the resolver.cache is just a cache for matching topics to bindings. It's not a memory leak in the technical sense of subscribers being held from garbage collection (the resolver's cache doesn't hold a ref to the subscription), but I agree that it's not ideal from a memory footprint perspective. This is a pretty simple fix, so I should have a patch out sometime tonight. Thanks so much for bringing it to my attention! 😄
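
To illustrate, here's roughly what that cache holds after your first example runs - comparison results keyed by 'publishedTopic|binding' (the exact value shape here is approximate), not references to the subscriber:

var postal = require('postal');
postal.subscribe({ channel: 'ch', topic: 't', callback: function() {} }).once();
postal.publish({ channel: 'ch', topic: 't', data: 'data' });
// the subscription is gone, but the comparison result remains cached:
console.log(postal.configuration.resolver.cache); // => roughly { 't|t': true }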

@ifandelse
Contributor

@efurmantt - I wanted to follow up on this. I understand the point of the example you provided, but I'm curious if you have a real-world use case where this has been an issue? The resolver's cache is part of how postal is optimized for publish speed. Saving a topic/binding comparison result prevents the need for a regex or string match check - and this is useful even in cases where no subscribers exist (e.g. - if it bails on further comparison checks b/c of a cached false result, etc.). Purging these too aggressively begins to undermine the publishing optimizations, since a normal channel/topic "topology" isn't comprised of topics created dynamically in a for loop (instead, you'd have a common set of channels/topics that tend to be longer lived). At this point I'm looking at conditionally purging every n-number of unsubscribes, or purging when the resolver cache exceeds a certain length - and also allowing it to be done manually (rather than automated). A real-world scenario where this is an issue could better inform which approach should be implemented.
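
To sketch the length-based idea (illustrative only - none of this is postal API yet, and the threshold is arbitrary):

var postal = require('postal');
var MAX_CACHE_ENTRIES = 1000; // assumed threshold

function compactIfOversized() {
    var cache = postal.configuration.resolver.cache;
    var keys = Object.keys(cache);
    if (keys.length > MAX_CACHE_ENTRIES) {
        // drop all cached comparison results; they get rebuilt on demand
        keys.forEach(function(key) { delete cache[key]; });
    }
}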

@efurmantt
Contributor Author

@ifandelse Thank you for your prompt reply. I am working on a codebase that is using postal in an idiosyncratic way. Essentially request/response has been built on top of the publishing mechanism. A requestor publishes to a well-known topic. Then a unique topic is created for the responder to send its response to the requestor. Over time, the cache grows as these unique topics are never removed.
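
Here is a minimal sketch of the pattern, in case it helps - the channel/topic names and the request helper are illustrative, not our actual code:

var postal = require('postal');

// Responder: listens on a well-known topic and replies to the unique
// topic carried in each request.
postal.subscribe({
    channel: 'c',
    topic: 'requests',
    callback: function(data) {
        postal.publish({ channel: 'c', topic: data.replyTopic, data: 'response' });
    }
});

// Requestor: subscribes once to a unique reply topic, then publishes.
function request(payload, onReply) {
    var replyTopic = 'reply.' + Math.random().toString(36).slice(2);
    postal.subscribe({ channel: 'c', topic: replyTopic, callback: onReply }).once();
    postal.publish({ channel: 'c', topic: 'requests', data: { payload: payload, replyTopic: replyTopic } });
}

request('hello', function(response) {
    console.log(response); // each request leaves unique reply-topic entries in resolver.cache
});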

I realize that postal is written with traditional pub/sub in mind and implementing a request/response mechanism over pub/sub is not what it was designed for.

There are other cases in this particular app where postal is used as designed. Particular topics (for market data) are subscribed to and unsubscribed from. This is dynamic and changes according to whatever the user is looking at. Over time this can cause the cache to grow (though not as markedly as in the request/response case in this app).

For me, a simple solution would be to set a cache size limit. I'm not sure what's best, though, and I'm curious what you think about my particular case.

As you say, there are multiple ways to approach this problem. I would have opened a PR myself but I'm not sure what implementation you'd prefer.

@ifandelse
Contributor

@efurmantt That's very helpful - I suspected if you were running into an issue like this in a real-world scenario, it was something along these lines. The code I have slated to go live soon in v0.12 will include the ability to 1.) manually purge the resolver cache (default), 2.) "compact" it on every unsubscribe or 3.) compact it on every 'n' number of unsubscribes. It's at least a start towards something that could help keep this from growing out of hand in memory in these kinds of dynamic scenarios.
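
In code, the plan is roughly this (treat the exact names and forms as tentative until the release lands):

var postal = require('postal');

// 1.) manual purge - the default; nothing happens automatically
postal.configuration.resolver.purge();

// 2.) compact on every unsubscribe
postal.configuration.autoCompactResolver = true;

// 3.) compact every n unsubscribes, e.g. every 5
postal.configuration.autoCompactResolver = 5;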

Also - postal was intended to handle request/response - but I chose to put the behavior in an add-on, as opposed to the core lib, since it's a rare use-case compared to typical usage. You may or may not have checked this out, but in case any of it is helpful to you: postal.request-response.

Thanks!

@efurmantt
Contributor Author

@ifandelse Cool, thanks for your help. I appreciate it very much. Also, I'll take a look at the request-response plugin - I didn't know about it actually.

@efurmantt
Contributor Author

@ifandelse I want to thank you again for your help. I am still running into an issue related to this and wonder if you would like to share your opinion.

I see how these latest changes help keep the cache compact. However, I notice something of a performance problem during compaction when there are a fair number of long-lived subscriptions.

Here is some sample code that demonstrates the issue:

var postal = require('postal');

postal.configuration.autoCompactResolver = true;

// Create some long-lived subscriptions
for(var i = 0; i < 100; i++) {
    postal.subscribe({ channel: 'eternal', topic: 't' + i, callback: function() {} });
    postal.publish({ channel: 'eternal', topic: 't' + i, data: 'data' });
}

// Create transient subscriptions
for(var i = 0; i < 100; i++) {
    console.time('compaction');
    postal.subscribe({ channel: 'transient', topic: 't' + i, callback: function() {} }).once();
    postal.publish({ channel: 'transient', topic: 't' + i, data: 'data' });
    console.timeEnd('compaction');
}

Each compaction takes about 1.5 seconds in my environment, and the delay grows sharply with the number of subscriptions: with 150 subscriptions, for example, I see 5-second delays.

I realize it's possible to limit the frequency of compaction, but long delays could still be a problem in that case.

I've been looking at the code and trying to come up with a good, clean solution with low impact on the codebase. I don't have something yet, but if I come up with something, I will submit a PR.

Also, if you like, I would be happy to close this issue and move the performance problem into its own issue if you think that would be better.

@ifandelse
Contributor

@efurmantt Wow - 1.5 to 5 seconds! OUCH. My guess is the real bottleneck there is the getSubscribersFor check going on under the hood, since a quick look at compacting alone shows it's much faster (milliseconds): http://jsfiddle.net/ifandelse/gwc4855k/.

When I made the v0.12 update to postal, I added a 3rd optional arg to the resolver's compare method - which can be used to instruct the resolver to not cache the match. My line of thought was this: If the envelope includes a particular header (something like resolverNoCache), then it won't cache the topic/binding comparison result. Then, the postal.request-response add-on could add this header to its req/res envelopes so that the one-off-dynamically-generated topics don't bloat the resolver cache (and your request/response implementation could do the same, etc.).
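
Usage would look like this (the channel/topic names are just illustrative):

var postal = require('postal');
postal.publish({
    channel: 'responses',
    topic: 'reply.abc123', // one-off, dynamically generated topic
    data: 'result',
    headers: { resolverNoCache: true } // don't cache this topic/binding comparison
});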

Between the cache compacting and the above option of opting out of the resolver cache altogether, we have the "book-end" solutions. Once that's in place, we can keep looking into ways to more efficiently purge the resolver cache. (Sorry that's so intensive, btw. @arobson has been playing with some ideas that store the hierarchical topic structures in a trie - it could be an option if we can get certain aspects of it to perform like we expect.)

@ifandelse
Contributor

I've updated postal to v0.12.2 and postal.request-response to v0.3.1 - both allow you to leverage the resolverNoCache envelope header. Postal.request-response's reply envelopes (the ones that use the auto-generated topic) bypass the resolver cache altogether.

@efurmantt
Contributor Author

@ifandelse I took a look at the resolver cache and I agree that changing how items are stored would make it easier to purge. Currently items are stored using keys { 'x|y': true } whereas storing them as a nested object { x: { y: true } } would make it easier to optimize the compaction code.
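
A sketch of the difference (the shapes here are illustrative, not postal's actual internals):

// Flat (current): purging the 'temp' binding means parsing every key.
var flat = { 'permanent|temp': true, 'other|temp': true, 'permanent|keep': true };
Object.keys(flat).forEach(function(key) {
    if (key.split('|')[1] === 'temp') { delete flat[key]; }
});

// Nested (proposed): the same purge is plain property deletion, no string work.
var nested = { permanent: { temp: true, keep: true }, other: { temp: true } };
Object.keys(nested).forEach(function(topic) {
    delete nested[topic].temp;
});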

I just looked at the request-response code and ran a test on the pattern used there. Unfortunately, I am noticing a problem with this approach as well. (Sorry if you're getting too tired of hearing this from me.) Here is code which represents the case I'm looking at:

var postal = require('postal');

postal.subscribe({ channel: 'c', topic: 'temp', callback: function() {} }).once();
postal.publish({ channel: 'c', topic: 'permanent', data: 'data' });
postal.publish({ channel: 'c', topic: 'temp', data: 'data', headers: { resolverNoCache: true } });

console.log(Object.keys(postal.configuration.resolver.cache));

In this case the console spits out [ 'permanent|temp' ]. Because it's request-response, we don't desire any caching for the temp topic, but any publication that happens while the temp subscription is active will create a persistent entry in the cache for the temp subscription.

@efurmantt
Contributor Author

I realized it's possible to disable caching altogether by monkey-patching the compare function. I tested the performance impact in the application I'm working on - it's minimal in my case. For my purposes this is a sufficient solution.
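
For reference, the patch is along these lines - I'm assuming the shape of compare's third argument based on the header name discussed above:

var postal = require('postal');
var resolver = postal.configuration.resolver;
var originalCompare = resolver.compare;
resolver.compare = function(binding, topic) {
    // always pass the no-cache option so comparison results are never stored
    return originalCompare.call(resolver, binding, topic, { resolverNoCache: true });
};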

@ifandelse
Contributor

@efurmantt not tired of you talking about this at all 😄. These kinds of challenges are important to work through. Regarding the caching structure, I think I used to have a nested object structure, but I went with the current approach to save one level of property lookup. However, a lot has changed since then in how I cache, so a nested structure might make more sense. I'll just have to evaluate it once I get some time (and once I get a chance to peek at @arobson's trie approach).

Regarding the snippet you gave as an example - the reason permanent|temp is cached is that they're on the same channel (channel c). As a result, when the first publish occurs, postal compares the permanent topic to all subscriptions on that channel, including temp, and caches the result. Your options here are:

  • have permanent and temp separated onto different channels (FWIW - postal.request-response sends replies on a different channel than requests by default; see the sketch after this list)

  • monkey patch like you're doing (hey, it's JavaScript, that's cool! - Although this makes me want to add an option to turn it off altogether, if that would help)

  • use auto-compacting (which I know isn't a good solution from what you've described, given the perf hit)

  • Manually do a targeted purge once you're ready to retire specific bindings/topics via something like:

    postal.configuration.resolver.purge({ binding: 'temp' });
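
Here's what the first option looks like in practice - since postal only compares a published topic against subscriptions on that same channel, permanent never gets compared to temp:

var postal = require('postal');
postal.subscribe({ channel: 'replies', topic: 'temp', callback: function() {} }).once();
postal.publish({ channel: 'main', topic: 'permanent', data: 'data' }); // no 'permanent|temp' cache entry
postal.publish({ channel: 'replies', topic: 'temp', data: 'data', headers: { resolverNoCache: true } });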

@efurmantt
Contributor Author

@ifandelse I appreciate all your help. If there were an option to just disable the compare cache, that would probably be the best fit in my case.

@ifandelse
Contributor

I think we're good to close this, given that we merged in #104. Just ping me if you need anything else, thanks!

@khamasaki

Hi, I am running into this as well, and it is still occurring in 1.0.2. I actually tried upgrading from 0.10.3 to 1.0.2 after we noticed a slow memory leak on our production servers, and it has actually gotten worse in my testing.

For some background, we are heavily using postal.request-response as part of our server side architecture. I also upgraded postal.request-response to 0.3.1 to take advantage of the fix there.

I think I traced the problem to the publish method, particularly this line:

https://github.com/postaljs/postal.js/blob/master/lib/postal.js#L469

It seems that even though postal.request-response is sending the header that tells postal to ignore the cache, the publish method is still putting something into the cache. When I perform a memory dump of the server after it has been running for a while, there are a lot of strings being kept in memory that look like this: "e8ad6cd0-7149-4dff-813d-8383ff1d9e1a|3eb0eb18-7eca-43b0-a73d-8dd7c7bdf902", which seems to correspond to the cacheKey being used in the publish method. After several thousand requests this really starts to eat up a lot of memory.

If I replace line 469 with the following:

            cache = [];

the memory problems go away. Clearly, this is not the correct solution to the problem, but it does indicate that I am looking at the right thing.

Would it be possible to have the publish method not put anything in the cache when the resolverNoCache flag is set? The logic involved there with the cacherFn seems a little complex, and I'm hesitant to attempt a fix myself - so if it would be easy for you to implement, I'd appreciate it. If not, I can take a crack at it.
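
The guard I have in mind is something like this - the names come from my reading of the publish code, so treat them as approximate:

// inside publish, before writing the comparison result:
function maybeCache(cache, cacheKey, result, envelope) {
    if (!(envelope.headers && envelope.headers.resolverNoCache)) {
        cache[cacheKey] = result; // skip the write entirely when the header is set
    }
    return result;
}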

@ifandelse
Contributor

Thank you for the detailed explanation! I will definitely take a look and get back with you ASAP.

Sent from my iPhone, so just assume that auto-correct mangled my words...

On May 6, 2015, at 2:09 AM, khamasaki [email protected] wrote:

Hi, I am running into this as well, and it is still occurring in 1.0.2. I
actually tried upgrading from 0.10.3 to 1.0.2 after we noticed a slow
memory leak on our production servers, and it has actually gotten worse in
my testing.

For some background, we are heavily using postal.request-response as part
of our server side architecture. I also upgraded postal.request-response to
0.3.1 to take advantage of the fix there.

I think I traced the problem to the publish method, particularly this line:

https://github.com/postaljs/postal.js/blob/master/lib/postal.js#L469

It seems that even though postal.request-response is sending the header
that tells postal to ignore the cache, the publish method is still putting
something into the cache. When I perform a memory dump of the server after
it has been running for a while, there are a lot of strings being kept in
memory that look like this:
"e8ad6cd0-7149-4dff-813d-8383ff1d9e1a|3eb0eb18-7eca-43b0-a73d-8dd7c7bdf902",
which seems to correspond to the cacheKey being used in the publish method.
After several thousand requests this really starts to eat up a lot of
memory.

If I replace line 469 with the following:

        cache = [];

the memory problems go away. Clearly, this is not the correct solution to
the problem, but it does indicate that I am looking at the right thing.

Would it be possible to have the publish method not put anything in the
cache when the resolverNoCache flag is set? The logic involved there with
the cacherFn seems a little complex and I am a little hesitant to attempt a
fix myself, so if it would be easy for you to implement I'd appreciate it.
If not, I can take a crack at it.


Reply to this email directly or view it on GitHub
#95 (comment).

@ifandelse
Contributor

@khamasaki - I think you're right - I'll get an immediate fix together where postal will check for the resolverNoCache flag, and then longer-term I may refactor that envelope header name to noCache, etc. (since we're beyond just the resolver's cache at this point). Barring any unforeseen kids waking up, meteorites landing on the house, or me crying that the season of Person of Interest is over, I should have this done sometime tonight....

@khamasaki

@ifandelse - Thanks so much for looking into this so quickly, really appreciate it. Also, fun show, Person of Interest - I'm behind a couple of episodes, so no time to mourn for me yet!

ifandelse added a commit to ifandelse/postal.js that referenced this issue May 7, 2015
ifandelse added a commit that referenced this issue May 7, 2015
Fixed #95. Added JSCS for formatting. Bumped version to v1.0.3
@ifandelse
Contributor

@khamasaki see if that works out for you...thanks!

@khamasaki

Works great, from what I can tell - both the slow leak from 0.10.3 and the somewhat faster leak from 1.0.2 seem to be gone in 1.0.3. Thanks again!
