Increase in native image RSS after upgrade to ce 19.3.0 #1984
Comments
The 49172 KByte block could be part of the normal Java heap, but I have my doubts because the size is unusual (not a multiple of the 1024 KByte aligned chunk size). It could be a single large array allocation though. One way to find out whether it is part of the Java heap is to use the Enterprise Edition, which by default uses a contiguous address space for the Java heap (so there is only one large memory mapping for the entire heap). If it is not a Java object, then either Netty or some other included framework is doing a large native memory allocation. Netty by default allocates large buffers in native memory. Older versions of Netty were missing a Native Image configuration and therefore sized buffers based on the heap size of the image generator rather than the Java heap size of the image at run time.
FYI, the netty issue has been addressed in netty/netty#9515, which was included in netty 4.1.
@christianwimmer: The netty version in Quarkus 1.1.0.CR1 is
Thanks for the thoughts. So netty/netty#9515 was included in
@johnaohara The following commit should also be checked, as it was introduced into Quarkus to fix a netty issue with GraalVM 19.3.0: quarkusio/quarkus@3a0599d. For the record, it was introduced as a workaround to delay the netty update to
@gwenneg I reverted that patch and upgraded netty to
Quick update on where I am with this issue. I traced native memory allocation, and I do not see netty allocating large chunks of memory at startup. I built the application with debug symbols, and can see there is a call to
The instruction being executed is:
with registers:
So it looks like a chunk of memory (30121KB in this case) is being allocated and zeroed with
Hey @johnaohara, can you add to your tests (if not already) and report whether setting an aggressive
Setting -Xmx has no effect; the memory is allocated and written to during the initialization phase of the process, before the main() entry point for the application. In the executable created with 19.3.0, there is now a new
This is the backtrace showing where the allocations are coming from:
And this is the instruction that fills the memory region:
I am writing up how to reproduce this now.
Interesting that the allocation happens before Java code starts running. So it must be something in the C code of the JDK. One possible cause: http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/a2154c771de1/src/solaris/native/java/net/linux_close.c#l88 allocates a table based on the number of file descriptors. Can you try running the executable with a low number of file descriptors and see if that changes the memory allocation behavior? That seems like the easiest way to confirm. The file descriptor table size was fixed in JDK 9: https://bugs.openjdk.java.net/browse/JDK-8150460
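To make the suspected mechanism concrete, here is a deliberately simplified sketch (a paraphrase, not the verbatim OpenJDK source; the struct layout is assumed) of how the pre-JDK-9 linux_close.c behaves: a constructor function runs before main(), reads RLIMIT_NOFILE, and eagerly calloc()s one entry per possible file descriptor, so the table size scales directly with the fd limit.

```c
/* Simplified sketch of the pre-JDK-9 behaviour in linux_close.c
 * (paraphrased, not the verbatim OpenJDK source). */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/resource.h>

typedef struct {
    pthread_mutex_t lock;    /* per-fd lock                        */
    void           *threads; /* threads blocked on this fd         */
} fdEntry_t;

static fdEntry_t *fdTable;
static int        fdLimit;

/* Runs before main(), like the init() function linked in from libnet. */
static void __attribute__((constructor)) init(void) {
    struct rlimit nbr_files;
    if (getrlimit(RLIMIT_NOFILE, &nbr_files) == 0 &&
        nbr_files.rlim_max != RLIM_INFINITY) {
        fdLimit = (int) nbr_files.rlim_max;
        /* With rlim_max = 655360 this is tens of MB, allocated eagerly. */
        fdTable = calloc(fdLimit, sizeof(fdEntry_t));
    }
}

int main(void) {
    printf("fd limit %d -> fd table of %zu KB\n",
           fdLimit, fdLimit * sizeof(fdEntry_t) / 1024);
    return 0;
}
```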
If the RSS issue comes from a bug fixed in JDK 9, then it might be interesting to build the application using JDK 11 and see the impact on the memory used. I know it's a very incomplete test but it could give some useful information at a very low cost.
@christianwimmer right, this makes sense. So I noticed that there was a class containing a HashMap of file descriptors [1] and a JDK11 substitution that referenced a global singleton PosixJavaNetClose [2] for cleanly closing file descriptors. Now the backtrace is:
That backtrace fits with your theory. After reducing the max number of file descriptors (from 655360 to 2048), the RSS for a simple app dropped from 51.6MB to 21.2MB. What I don't understand atm is:
1 - https://github.com/oracle/graal/blob/vm-19.3.0/substratevm/src/com.oracle.svm.core.posix/src/com/oracle/svm/core/posix/PosixJavaNetClose.java
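As a rough cross-check of those numbers (assuming an fdEntry_t of a pthread_mutex_t plus one pointer, about 48 bytes on x86-64 glibc — an assumption, not something stated in the thread), an eagerly allocated table for 655360 descriptors comes to roughly 30 MB, which lines up with both the ~30 MB block in the traces and the ~30 MB RSS drop reported above:

```c
/* Back-of-the-envelope check; entry_size is an assumed sizeof(fdEntry_t)
 * (pthread_mutex_t = 40 bytes + one pointer = 8 bytes on x86-64 glibc). */
#include <stdio.h>

int main(void) {
    const unsigned long entry_size = 40 + 8;   /* assumed bytes per fdEntry_t */
    const unsigned long high_fds   = 655360;   /* RLIMIT_NOFILE in the report */
    const unsigned long low_fds    = 2048;     /* reduced limit in the report */

    printf("table @ %lu fds: %lu KB\n", high_fds, high_fds * entry_size / 1024);
    printf("table @ %lu fds: %lu KB\n", low_fds,  low_fds  * entry_size / 1024);
    /* Prints ~30720 KB vs 96 KB, i.e. a ~30 MB difference, in line with the
     * observed RSS drop from 51.6MB to 21.2MB. */
    return 0;
}
```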
@johnaohara the 19.2 vs 19.3 behavior difference is explained by the fact that GraalVM native image has rebased its low-level Java support on the JDK's C libraries (sorry for the layman explanation). All in all, it's a massive increase for GraalVM native image even with a small file descriptor limit, while being a "drop in the bucket" for OpenJDK. Would be good to try and find some alternative strategies for these different environments. CC @dmlloyd for ref.
@christianwimmer Yes, that looks very much like it is the culprit. The code that John found running before JVM startup (__libc_csu_init) is introduced by the linker as part of the premain execution. It runs static initializations for global (static) data. The premain code hands it a table of pointers to snippets of compiled C code to run. The table includes pointers to functions marked with __attribute__((constructor)) -- like the init() function you highlighted in linux_close.c. Table entries are collected by the linker from all the various libraries and objects linked into the final program. Since 19.3.0 includes static OpenJDK libraries, one of them (libnet.a I guess?) will include a global init table entry for that init function. So, if you want to use OpenJDK libs then you either have to build them so as to do this static init differently or else configure the deployment environment to make it less greedy. Luckily, this appears to be the only native constructor function in the OpenJDK code base (I grepped for '__attribute(' in the source tree and that is all I could find). @johnaohara in jdk8 it was the case that for many of the java native code subtrees the solaris code was the prototypical version that was used for linux and other unices. Note that in jdk11 the path to the file linux_close.c is src/java.base/linux/native/libnet.
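For readers unfamiliar with that mechanism, here is a minimal standalone illustration (demo code only, not JDK source): any function marked with the constructor attribute in an object or static library that gets linked in ends up in the init array and runs before main() is entered, so whatever it allocates already exists when the application's own code starts.

```c
/* Minimal demo of the constructor mechanism described above (not JDK code):
 * the function below is placed in the init array by the linker and executed
 * before main(), just like init() in libnet's linux_close.c. */
#include <stdio.h>
#include <stdlib.h>

static char *early_buffer;

static void __attribute__((constructor)) early_init(void) {
    early_buffer = calloc(1024, 1024);          /* 1 MB, allocated pre-main */
    fputs("constructor ran before main()\n", stderr);
}

int main(void) {
    printf("main() sees a buffer that already exists at %p\n",
           (void *) early_buffer);
    free(early_buffer);
    return 0;
}
```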
@adinn Thanks for the explanation wrt the solaris code. Yes,
I wonder if we could talk to Alan or someone else in the (OpenJDK) net/io area and come up with a better (perhaps pure-Java) alternative solution to this (very old) code; I know Alan has been majorly reworking blocking I/O in any event. Or at least, perhaps the code could provisionally be changed to use
You might need to square use of mmap with the substrate team. One of the original design goals of substrate was not to futz around with most low-level library calls or system calls. The API footprint on which substrate originally relied was carefully restricted to a tiny set of lib functions so as not to prejudice embedding into an Oracle DB process. @christianwimmer would modifying the OpenJDK code to use malloc be ok? Of course any such change would also need backporting to jdk11u.
Isn't it enough to just backport https://bugs.openjdk.java.net/browse/JDK-8150460 to JDK 8? That seems like a simple thing that can be done in the next few weeks so that the change is in the next release. I agree that the whole JDK code could be improved, but that sounds like a longer-term project. We don't really want to have different C code in the JDK for Native Image vs. the regular OpenJDK. So for a major rewrite, the change would first need to be merged into OpenJDK master, then backported, which for sure will take a long time.
I agree. I would much prefer to go for a quick fix, followed by the longer-term fix in parallel.
I presume that patch (https://bugs.openjdk.java.net/browse/JDK-8150460) is in JDK11? I see this issue with JDK8 and JDK11; the problem is that this line http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/ee0a64ae78db/src/java.base/linux/native/libnet/linux_close.c#l125 calls calloc, which allocates and zeroes an array. If the patch was backported, we would still see RSS growth with a max number of FDs below INT_MAX.
@christianwimmer yes I think a backport of that patch is the best way forward. I will propose this on the jdk8u list. The code at the line you cite is limited to allocating storage for an array of fdEntry_t structures with at most fdTableMaxSize (i.e. 4K) entries - see http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/ee0a64ae78db/src/java.base/linux/native/libnet/linux_close.c#l78. So, at worst that is roughly 50 * 4Kb i.e. 200Kb. With a large or unlimited fd count there will also be a later allocation of an overflow table which is an array of pointers of type fdEntry_t* -- see http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/ee0a64ae78db/src/java.base/linux/native/libnet/linux_close.c#l139. The amount of pointer storage allocated varies according to the chosen fd count. However, the worst case is when the fd count is RLIM_INFINITY. In that case the overflow table contains (INT_MAX - 0x1000) / fdOverflowTableSlabSize entries. Neglecting the subtraction that is roughly 32K * 64K * 8 / 64K bytes = 256K. So, including both allocations the worst RSS growth at startup ought to be less than half a Mb. |
@johnaohara Thanks for confirming. So, the jdk8u patch should be all we need. @christianwimmer Apologies for misphrasing the question I asked earlier. I meant to say "would modifying the OpenJDK code to use mmap be ok?" The reason I am pursuing the point is that, as far as the OpenJDK lib maintainers are concerned, the strategy David proposed of calling mmap with MAP_NORESERVE would have been a perfectly viable alternative to the fix I just pushed. However, if reliance on functions like mmap is an issue for use of these libs in Graal's native images/libs, then this raises a flag, since there is no OpenJDK policy to ensure that native libs limit their use of system APIs beyond the need to ensure they don't interfere with the JVM library. Was that possibility considered in the switch to use the OpenJDK libs?
@adinn Substrate VM uses
One problem with the proposed mmap idea would be that you have to pre-reserve the whole expected range. But its theoretical max size can be very large, or is typically even unknown if RLIM_NO_FILE=0|1. Oversizing it increases the virtual process size unnecessarily, which may be less important than RSS, but you still do not want it to increase by several GB for such a minor issue. I still think that for this case - where in 99% of cases you only have to store a few dozen or a few hundred fds - a sparse array is the tighter solution.
Uncommitted memory is, in essence, a sparse array managed by the OS. But it's only one possible option, of course. A sparse array is probably not as good an idea as it seems at first, though: the operating system will typically hand out the lowest-numbered file descriptor that is free. So a growable array is probably most likely to maximize memory and computational efficiency; in the vast majority of cases, the highest file descriptor that Java is using is going to be very close to the total number of file descriptors that Java is handling, so very little space is likely to be wasted by such a scheme.
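To illustrate the uncommitted-memory idea being weighed here (a sketch only, not a proposed JDK patch; the sizes and struct layout are made up for the example): reserving the table with mmap and MAP_NORESERVE keeps the whole range out of RSS until individual entries are actually touched, at the cost of the large virtual reservation mentioned above.

```c
/* Illustration of the uncommitted-memory idea discussed above (not a
 * proposed JDK patch): reserve a huge virtual range and let the kernel
 * commit pages only when an entry is first written. */
#include <stdio.h>
#include <sys/mman.h>

typedef struct { long lock; void *threads; } fdEntry_t;   /* stand-in layout */

int main(void) {
    size_t max_fds = 1UL << 20;                 /* e.g. ~1M descriptors */
    size_t bytes   = max_fds * sizeof(fdEntry_t);

    /* Virtual reservation only: RSS stays near zero until pages are touched. */
    fdEntry_t *table = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (table == MAP_FAILED) { perror("mmap"); return 1; }

    table[5].lock = 1;      /* touching one entry commits just its page */

    printf("reserved %zu MB of virtual space for the table\n", bytes >> 20);
    munmap(table, bytes);
    return 0;
}
```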
A simple growable array is not a good idea IMHO:
For instance, one could have concurrent native non-VM code opening files like crazy and driving the fd counter up, and even though the VM does not care a bit, it'd be forced to keep space for the associated fdEntrys of those fds.
The best solution would probably be to move the whole logic from C to Java. But I don't see that as necessary; the current JDK 9 code solves the problem.
@christianwimmer thanks for clarifying the more relaxed constraint. I agree that the jdk9 patch solves the problem perfectly well and, as @tstuefe says, avoids the increment in VMEM size that lazy mmap would introduce. The jdk8u backport has been reviewed. I am waiting for confirmation that I may push.
The backport of the JDK issue should be in the latest GraalVM 19.3.1 release. @johnaohara Can you please verify that?
@johnaohara could we please get a table that lists 19.2.1 (JDK 8) vs 19.3.1 (JDK 8) vs 19.3.1 (JDK 11)? I think only then can we see if there is still a problem (I think there might be).
I just updated quarkusio/quarkus#6136. From the tests that I ran, I saw the following:
The 19.3.1 test was run against a build from quarkusio/quarkus#6574.
Thanks John!
yay!
@johnaohara one more question - is that on Linux? Thanks.
@dmlloyd yes, I should have specified. Those results were on RHEL7.7 (3.10.0-1062.1.1.el7.x86_64).
@christianwimmer is there a PR that contains this backport? I have been looking for it, but cannot find it. Thanks
@johnaohara We apply the patch only when building the static libraries, until proper backports appear in all the JDKs that we are based on. That allowed us to fix the issue without any further delays.
@christianwimmer Thanks for the info. Is patching the JDK part of the release process for Graal CE? Is there a public build somewhere?
The patched static libs are included in the binaries at https://github.com/graalvm/openjdk8-jvmci-builder/releases/tag/jvmci-19.3-b07
@dougxc thank you
We have found a significant increase in memory usage of a simple quarkus application [1] built as a native image with ce-19.3.0 compared to ce-19.2.1.
The getting-started application, started with a 2MB heap, uses the following memory to bootstrap:
19.2.1 - 13876k
19.3.0 - 63048k
There appears to be a single memory region that is now using 49MB of memory, compared to 20k previously.
Attached are two pmap outputs showing the memory usage for a build against each version.
The arguments passed to native-image were:
19.2.1:
19.3.0:
Do you know what might be causing this?
19.2.1_2m.pmap.log
19.3.0_2m.pmap.log
1 - https://github.com/quarkusio/quarkus-quickstarts/tree/master/getting-started