-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
finalize() should not be overridden #83
Comments
Looks like a good improvement to have. I don't have much experience with the Cleaner-s. Any help here will be very welcome. |
Are cleaners needed in this scenario? The finalize method calls |
I have done what's Gzip streams are doing - if you forgot to close the stream the underlying file descriptor will be closed on the next GC cycle when the steam object is collected. I had a bug (#26) that leaked memory and FDs by not closing the streams automatically and users were expecting it. |
BTW, calling .close() is recommended - I have noticed an edge case that may lead to leaked file descriptors: if the file was meanwhile deleted, the fd is not released. It's not specific to the library and can be observed without any compression on top if InputStream |
Gzip stream use Cleaners as mentioned earlier in this post. Finalizers are deprecated. Maybe it'd just be similar to add a shutdown hook to close the resources on closure or similar. |
Yes, it makes sense. May be we were looking before the Inflater was ported to Cleaners. |
Just looked, and Zip/Gzip streams rely on them for closing. BTW, I am open to PR moving us to |
The deprecation of finalizers and cleaners happened after JDK8. |
Yes, but we still support back to JDK6. Is there a way to implement bi-modal behavior depending on which JDK we are loaded in? |
If you look at the Java 9 source code for cleaner it's a Weak/Soft/Phantom list of objects + a background thread or similar that executes the action. You can create a simple daemon thread that does something similar. Make it run on shutdown and/or after quite a long time (so it doesn't cause performance impact). |
Is it possible too use the fork/join pool to do the cleanup? Probably not, WeakReferenceQueue blocks. There are some other ways to do the bi-modal behavior, you can have both implementations in their own private static inner classes that hold static members of an interface you need to use to clean, and try to touch each of these values from static context in turn until one of them works, catching the appropriate class/method/security error. |
@scottcarey this is pretty much what does the code linked in the first message in this thread do. It just probes "sun.misc.Cleaner" and "jdk.internal.ref.Cleaner" directly. "java.lang.ref.Cleaner" can be added to the mix too. |
@leventov Agree with you. I have encountered this problem with zstd-jni. Finalizers are inherently problematic and their use can lead to performance issues, Furthermore the timing of finalization is unpredictable with no guarantee that a finalizer will be called. See https://bugs.openjdk.java.net/browse/JDK-8165641 Use of finalizers is yet another source of potential memory leak issues. Whenever a class’ finalize() method is overridden, then objects of that class aren’t instantly garbage collected. Instead, the GC queues them for finalization, which occurs at a later point in time. Additionally, if the code written in finalize() method is not optimal and if the finalizer queue cannot keep up with the Java garbage collector, then sooner or later, our application is destined to meet an OutOfMemoryError. |
Was hit by this issue in production, so it's not a theoretic problem. We use zstd as a compression format for a large batch of data files using Apache Flink (version 1.9 running on adoptopenjdk 8u222). Sometimes on a job restart (due to crash somewhere in our java code) we experienced JDK crashes with the following backtrace:
Flink creates a separate ClassLoader for each job on a separate thread. When job is done, the classloader is closed and all classes (and JNI libraries) are unloaded. The issue is that (I guess) on jvm8 there is no guarantee that finalizers will be run before the JNI library unload happening. Reproducer code (sorry for scala), which tries to do something similar to Flink: import java.io.{ByteArrayInputStream,ByteArrayOutputStream,File,FileInputStream,InputStream}
import java.net.URLClassLoader
import org.apache.commons.io.IOUtils
object Main {
val url = new File("/home/shutty/Downloads/zstd-jni-1.4.3-1.jar").toURL
val in = IOUtils.toByteArray(new FileInputStream(new File("/home/shutty/Downloads/demo.zst")))
def main(args: Array[String]): Unit = {
(0 to 100000).toList.par.foreach(openclose)
}
def openclose(id: Int) = {
var cl = new URLClassLoader(Array(url))
val result = read(cl)
cl = null
System.gc()
}
def read(cl: ClassLoader): String = {
val zstdClass = cl.loadClass("com.github.luben.zstd.ZstdInputStream")
val constructor = zstdClass.getConstructor(classOf[InputStream])
val zstd = constructor.newInstance(new ByteArrayInputStream(in)).asInstanceOf[InputStream]
val buf2 = new String(IOUtils.toByteArray(zstd))
// yes, we did not close the ZstdInputStream here intentionally
buf2
}
}
On jvm11 everything works fine. The hack we did to solve this issue is to wrap the ZstdInputStream and override its finalizer with empty method. |
Scala is fine. @shuttie, what if we add constructor param (or setter) to make the finalizer noop, something similar with the wrapper you created? In your production code, are you closing the |
They may not interact well with separate class loaders as noted in #83, and also bump the patch version.
@shuttie , in v1.4.4-3 I have added options to all classes no make finalizers noop, e.g. |
If out.write(dst, 0, (int) dstPos); will throw an exception due to underlying OutputStream which happens frequently , |
Finalizers (and other GC magics like phantom refs) still have GC overhead even for users that don't depend on them for cleanup, i.e users that always close their streams. An alternative proposal: factor out the existing streams behavior minus finalize() into new base classes, and leave the finalize in the existing streams classes which extend the new ones. This lets users who always close and want to avoid finalizer overhead opt-in, and doesn't break existing users relying on GC magics for cleanup. |
@raifo , that seems sane approach. I can put it in place shortly. |
Just pushed v1.4.8-4 with the new classes without finalizers. |
FWIW, by having ZstdOutputStream extend ZstdOutputStreamNoFinalizer a number of breaking changes were introduced. Code that compiled on 1.4.8-3 no longer compiles with 1.4.8-4. e.g.
Results in this compiler error:
This means some runtime issues could be introduced as well. If new code is running with 1.4.8-4 any dependencies compiled with 1.4.8-3 will likely break. |
@Logic-32 , that's a good point. There is pro/cons of keeping the API stability vs tying us to upstream versions. Regarding the runtime issue, most likely they would no break but will not run finalizers. I will try some F-bounded generics magic this weekend to see if it applies and solves the problem, but if you have any other suggestion how to remove the finalizers in backward compatible way, please chime. |
In part, it'd be nice to at least see a major version bump if you pursue the current change. That way, consumers know to expect some breakage. Semantic versioning aside, a good solution would depend on how backwards compatible you want to be. If you don't care about JRE's < 9 then you could remove the That way, all existing implementations continue to work (thanks to proper inheritance) and then the constructor for ZstdOutputStreamNoFinalizer can just skip the Cleaner call. |
This would be my preferred solution. It will not even need 2 classes. But we still care about JVM8 and even compile and work on JVM6. Regarding major version bump - it will stop corresponding to the upstream Zstd version and I am a little but hesitant - I would prefer backwards compatible solution. |
Actually looking in the runtime case - it will not be a problem: the vtable is attached to the instance and we don't change it when we modify it with the |
Thanks for the quick turn around! Just reviewing the latest commit I believe you still need to add generics to ZstdOutputStream before that will work. Otherwise, But, I wouldn't do that yet as it makes working with ZstdOutputStreamNoFinalizer a bit awkward. A good static-code analyzer (or even IDE) would recommend declaring any instance of ZstdOutputStreamNoFinalizer as However, I think you can get away with what you're going for, and still be backwards compatible, by overriding each setter in ZstdOutputStream and changing the return type to ZstdOutputStream without relying on the generics. It's not as pleasant to maintain but better from a consumer standpoint. e.g.
Also, while we're on the topic of better for consumers, here's something to debate: how would you feel about moving ZstdOutputStreamNoFinalizer to a different package than ZstdOutputStream and keeping the name ZstdOutputStream? Deprecating the current ZstdOutputStream in favor of the new one should be considered important as relying on the GC to close resources is strongly discouraged since you never know when GC will occur. Since most developers are lazy however, they'll be more inclined to use ZstdOutputStream over ZstdOutputStreamNoFinalizer simply because the name is shorter. The only problem with this approach is that you not only have to come up with a name for the new package but also hope that no one complains about requiring a package name to disambiguate the two. Though, why you'd ever import both in any case is beyond me. I'll try and check in over the weekend but will otherwise probably not be available until next week. Let me know what your thoughts are! |
Hmm, that's interesting idea to use the packages for disambiguation. Though I could not come with better name than Yes, working with the Edit: now reflecting on why I did caught this issue, there are a few circumstances:
|
I went ahead with creating *Base classes that have all the logic and 2 facades - with or without the finalizer that fill the generic param - this should be compatible with both 1.4.8-3 and previous, and with 1.4.8-4. I also made the Base classes package-private so we can remove them if we find better strategy of we migrate fully to Java-9. |
Thanks for the updates @luben. I think those should work out nicely for us. FWIW, we're a java shop that likes to be on the latest version of things. |
@Logic-32 , I have pushed v1.4.8-5 with the latest changes. |
I will keep the issue open to track adding Cleaners and further changes. |
@Logic-32, as you suspected there are also runtime API compatibility problems with v1.4.8-4 and 5. The Avro/Parquet/Spark ecosystem noticed when trying to update dependencies (#161). I have released v1.4.8-6 that achieves the same result but keeping the runtime API compatibility - now the Edit: There is also v1.4.8-7 as I forgot to make the NoFinalizer classes public. This will not affect the previos uses of the library, but may break users of sub-version 4 and 5 that started using them. |
Hi @luben, we saw some finalizer issue from ZstdDecompressCtx. I noticed there's "setFinalize" to disable finalizer for AutoCloseBase. But from our code path
there seems no way to set it without touching
|
Yes, |
The idea of the Compress/Decompress Ctx is that you can create an instance and use them to compress/decompress multiple independent messages. This lowers the overhead. The particular commit forwards the Zstd methods to the new classes as we don't want to maintain duplicate code-paths. |
Gotcha! Thanks for the explanation @luben! |
BTW, is it feasible to add an option like "decompressByteArrayWithoutCtx" in Zstd class if we want to avoid Ctx? Thanks! @luben |
Internally it will create and discard the context - either we do it on the java side, or it will happen on the C side. |
i was thinking more of having an extra java api to directly access the old C code
|
Do we have any estimation on perf penalty for not using ZSTD_decompressDCtx? |
I think they will perform the same. The penalty is maintaining duplicate code. |
I see. It might be cleaner to just add Ctx without finalizer then. BTW, the issue we have is on presto (prestodb/presto#15028). |
Yes, I am thinking we may not even need the finalizer. I just have to make sure we manually close the contexts everywhere we use them. |
Thanks! Let me know if something I could help! |
@lic9, try v1.4.9-2 : it does not have finalizers in the Base class |
Thanks! Just tested it and the issue seemed gone. Will run more tests. |
@luben we have upgraded to v1.4.9-2 in presto: prestodb/presto#15966 |
Just do let you know, finalization will likely be deprecated for removal, see JEP 421. |
Following the pattern set by ZstdInputStreamNoFinalizer and ZstdOutputStreamNoFinalizer, add NoFinalizer variants for the direct buffer compressing and decompressing streams. The original classes with finalizers are reimplemented as simple wrappers over the NoFinalizer variants. luben#83
Following the pattern set by ZstdInputStreamNoFinalizer and ZstdOutputStreamNoFinalizer, add NoFinalizer variants for the direct buffer compressing and decompressing streams. The original classes with finalizers are reimplemented as simple wrappers over the NoFinalizer variants. #83
yes, we will need some kind of major version bump when that comes in effect. And we can simplify a bit |
Overriding
Object.finalize()
method is very bad for GC. Instead,sun.misc.Cleaner
, or intermediate that automatically detects the presence ofjava.lang.ref.Cleaner
and uses it if present (something like this: https://github.com/OpenHFT/Chronicle-Core/blob/067e963a0b79775d9dee33cb13b5fa5b7e49683e/src/main/java/net/openhft/chronicle/core/cleaner/impl/reflect/ReflectionBasedByteBufferCleanerService.java, for example) should be used.The text was updated successfully, but these errors were encountered: