-
-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON::Util::URI.parse memory leak #329
Comments
Nice work, thanks a lot! I agree with your comment wrt the performance vs. memory tradeoff. |
Yes, this is an issue. For performance reasons we cache the result of
parsing a string to an addressable URI. The problem is, we never empty out
that cache, and it's global, so it never gets GC'd (this is separate to the
schema cache, which can be cleared).
Ideally what I'd like to do is centralise all the caches, and make sure
that when you set the clear_cache: true all of the caches (including the
URI parsing cache) get cleared.
|
Actually, scrap that, what I'd really like to do is remove caching
entirely(!!!) but that's a pipe dream right now
|
@iainbeeston In my case the cache was just a couple of dozen keys; the problem seemed to be the use of |
There are two main caches in json-schema. One is for schemas, one is for parsed uris. Previously, setting `:clear_cache` as a validation option would clear the schema cache but not the uri cache. This has been reported to cause a memory leak (see issue voxpupuli#329) as over time the uri cache gets bigger and bigger and is never cleared. This change makes sure that whenever the schema cache is cleared, the uri cache is also cleared. Incidentally this change is not thread safe, and multithreaded applications that clear the cache may see issues. But until we can refactor to either disable caching or get rid of class variables, I don't believe there's much we can (realistically) do about that. This is still an improvement on what's gone before.
I've made a potential fix for this in #361 - it's not a long-term solution but it solves the immediate issue. |
There are two main caches in json-schema. One is for schemas, one is for parsed uris. Previously, setting `:clear_cache` as a validation option would clear the schema cache but not the uri cache. This has been reported to cause a memory leak (see issue voxpupuli#329) as over time the uri cache gets bigger and bigger and is never cleared. This change makes sure that whenever the schema cache is cleared, the uri cache is also cleared. Incidentally this change is not thread safe, and multithreaded applications that clear the cache may see issues. But until we can refactor to either disable caching or get rid of class variables, I don't believe there's much we can (realistically) do about that. This is still an improvement on what's gone before.
There are two main caches in json-schema. One is for schemas, one is for parsed uris. Previously, setting `:clear_cache` as a validation option would clear the schema cache but not the uri cache. This has been reported to cause a memory leak (see issue voxpupuli#329) as over time the uri cache gets bigger and bigger and is never cleared. This change makes sure that whenever the schema cache is cleared, the uri cache is also cleared. Incidentally this change is not thread safe, and multithreaded applications that clear the cache may see issues. But until we can refactor to either disable caching or get rid of class variables, I don't believe there's much we can (realistically) do about that. This is still an improvement on what's gone before.
I'm closing this now #361 has been merged. Should be available in the next release |
thanks! |
hey folks,
I benchmarked my app, which does JSON Schema validation in a tight loop (10ks of validations per minute per thread, 10m of validations per run), and I noticed the following:
Using the (damn wonderful)
stackprof
gem, almost all the Addressable URI objects are created by this helper: https://github.com/ruby-json-schema/json-schema/blob/master/lib/json-schema/util/uri.rb#L26. 500m+ Addressable URI objects are created per run, as my json-schemas use$ref
s quite a bit.But what is odd is none of these objects are being GC'ed. At a cursory glance of the code, it seems that they should be as
#dup
is used. However, if I remove the class-level cache and the use of#dup
, like so:the memory leak is gone. My process went from growing to ~ 1.5GB per run to staying at a consistent ~ 250MB.
I suspect the issue is the
@parse_cache
combined with the dup. The#dup
creates a new object, and withinAddressable::URI
it calls#dup
on the strings. However my understanding is that this is a copy-on-write process (see http://www.justskins.com/forums/memory-behavior-of-string-8195.html), which means that a reference will be kept around to the shared parent object until it is modified. All it would take is one string reference from the parent object (which is never deleted as it is stored within@parse_cache
) to stick around to prevent any of the dupped object from being GC'ed. I haven't dived into this a ton or into the C code (outside of benchmarking and patching) but this is what I suspect is going on.The above patch is marginally slower as it has to parse the strings with regexs (which is slow, plus the regexs used by
Addressable::URI
are not class variables but created every time, which is slower and less efficient). BUT I think the slight speed hit (which I only benchmarked within my app, so while it wasn't in isolation, within my app it caused less than a 0.01% decrease in speed) is more than acceptable in exchange for plugging a memory leak.Thoughts?
The text was updated successfully, but these errors were encountered: