Reduce size of output executable #287
Comments
True, I have evaluated the image size and even for an empty main program we get ~5MB of an image. There are a few reasons for that:
On the bright side, if you include much of your code the 5MB overhead will remain the same. So this is an issue only for very small images. |
unused-pkgs-hw.txt These are the packages, classes, and methods that are never invoked. They can be used as an indicator for elements that should not be in the image. Some things like the @pejovica thanks for the data. |
The general use-case is to remove a common argument for people to use Go-lang. One specific use-case that this would severely impact is something like implementing many small command line utilities as in Linux. Does that 5 MiB include the GC? At least for simple things like helloworld you can prove you don't need a GC. |
It does, but by looking at the list of included elements, I would not say that GC is the biggest problem. I would rather invest that time to remove things that should not be there by any means. For example, By removing these I am confident that we can reach the size of Go's "Hello, World!". At one point we removed all methods that were never executed and the image size was 400 KB. This is the lower bound of course, but could be used as a guideline of what we should reach. |
Thanks @vjovanov, that would be amazing! |
You can also use https://upx.github.io/ as a temporary solution to make compressed binaries. Reduces the size by a lot in my experience. |
Any thoughts on this, guys? I'm targeting GraalVM as (probably/hopefully) the solution for the long cold starts in AWS Lambda functions written in Java. Smaller binaries would make our deployments faster. Also, AWS has some limits on deployment size, and I'm afraid that binaries would become too big if we have multiple dependencies in our project, which is usually the case when using the AWS SDK. I think this would be a game-changing feature that would make the JVM more attractive to the community, especially those who have been flirting with Go and Rust as an alternative. |
(not issue relevant) @miere , already discovered https://quad.team/blog/Micronaut-to-AWS-Lamda-guide ? |
Now that we have GraalVM building against JDK 11, it's only a matter of time until the native compiler can work with the new modularity. I doubt file sizes will ever be improved on JDK 8 though since the class library was very.... let's say "monolithic" before the Project Jigsaw refactor. So until those native compiler improvements, I suggest updating JDK 8 projects to JDK 11 and making them modular in preparation for that :D Also, see what CremboC said - UPX is pretty good. ~11MB exe down to ~3MB. |
@thomaswue @vjovanov I used the following to achieve this result
Steps
Back in 2015 I shared this idea with RoboVM guys. Here is the link to the discussion
However, soon after, the company was sold and then came Xamarin. Much later, GluonVM picked it up, and then later Gluon dropped its own VM and started using GraalVM, and only very recently it has started giving tools to create GraalVM powered binaries which even an average developer like me can use to build and run my JavaFX applications on mobiles (Android, iPhone) and desktop, everywhere. So I felt it is time I could raise this matter again. To be honest, I don't know how much optimization has already been implemented and put in place in GraalVM. GraalVM is amazing, no doubt, and the performance difference is clearly felt from an end-user experience point of view. I might be over-expecting, but I feel that if this size issue/feature is cracked, GraalVM can replace every language/platform/runtime in the world as the first default choice. So to give a summary, the idea/suggestion is
Please let me know your thoughts. Thank you BTW to additionally mention, I had packaged youtube-dl a python app, with a full python runtime environment (stripped) not more than 3MB (after compression). |
That's an interesting comment. I never used SpyFS but maybe it can help here. Getting a JavaFX app under 5 MB sounds very challenging. Did that include the native libraries (e.g. libglass, libprism_es2 etc?). |
It is possible we make a 400 kB "Hello, World!" (@pejovica did this). But this code is completely unsafe and insecure and can lead to segfaults. This could be made as an experimental feature with a strong emphasis on experimental (use at your own risk). For making it a feature, we would need a very strong use-case. |
Hey sorry, my apologies, I didn't notice your question. Then SpyFS data is used to make a duplicate of this custom bootstrap class bundle in another folder. All the class files which were read (> 0 bytes) are copied completely, all class files which were opened but not read (total read bytes = 0) are copied as dummy class files of zero size, and all class files which were neither read nor opened are not copied. This basically forms the stripped-down runtime bootstrap class bundle for that particular application. I tried it about 5 years ago and haven't had the opportunity to replicate it since. I am not able to run the old 2015 example anyway, so I am guessing it must have been pulling some native libraries from somewhere. Now to answer the question regarding the native libraries (e.g. libglass, libprism_es2 etc.): yes, it included all of them. Which libraries are actually loaded and used at runtime was analyzed separately, and all those libraries were copied and used. I hope I was able to explain the approach. It was a very raw method, I can say. Because I had made my own kernel filesystem library (binding) in Java, I was able to get this done easily. |
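The three copy rules in the comment above can be sketched as a small classifier. This is only an illustration of the described idea, not the commenter's actual tool: the `Access` record stands in for whatever SpyFS logged per file, and every name here (`RuntimeStripper`, `Action`, `plan`) is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the three-way classification described above; all names are hypothetical. */
final class RuntimeStripper {
    enum Action { COPY_FULL, COPY_EMPTY_STUB, SKIP }

    /** Assumed per-file access record, standing in for what a tool like SpyFS would log. */
    record Access(boolean opened, long bytesRead) {}

    /** Decides what to do with one class file, following the three rules in the comment. */
    static Action classify(Access access) {
        if (access == null || !access.opened()) {
            return Action.SKIP;              // neither read nor opened: not copied
        }
        return access.bytesRead() > 0
                ? Action.COPY_FULL           // read (> 0 bytes): copied completely
                : Action.COPY_EMPTY_STUB;    // opened but not read: zero-size dummy file
    }

    /** Builds a copy plan for every file in the original bundle. */
    static Map<String, Action> plan(Iterable<String> bundleFiles, Map<String, Access> log) {
        Map<String, Action> plan = new LinkedHashMap<>();
        for (String file : bundleFiles) {
            plan.put(file, classify(log.get(file)));   // files missing from the log are skipped
        }
        return plan;
    }

    public static void main(String[] args) {
        // Made-up access log for three files of a bundle.
        Map<String, Access> log = Map.of(
                "java/lang/String.class", new Access(true, 4096),
                "java/util/Locale.class", new Access(true, 0));
        List<String> files = List.of(
                "java/lang/String.class", "java/util/Locale.class", "javax/swing/JTable.class");
        plan(files, log).forEach((file, action) -> System.out.println(action + "  " + file));
    }
}
```

A plan like this only reflects the code paths exercised while the log was recorded, so a file that is needed on a rarely taken path would be missing from the stripped bundle.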
Hey, same problem here. I'm working on a small CLI app; the only dependency I have is JLine3, but the final executable weighs 14 MB. How could I decrease the size? (The same app in Golang takes 3 MB.) I use Java 11
|
I had a look into the size of the generated binary for a hello world main with `objdump -x` output
The full binary has 6866528 bytes. The biggest contributors to that size are the @vjovanov already commented about the size of the unused code that was included. However, since the initial native heap seems to be even quite a bit bigger than that, it would be interesting to understand why that is the case and what's in there.
abridged `-H:+PrintHeapHistogram` output
|
@jrudolph this is an interesting analysis.
|
I refactored one of the Real World apps from Spring Boot to Quarkus/Panache. My Quarkus app has an uber jar of 43Mb and the native Linux binary is 82.5Mb! 5 times thinner! Is it because the native build does not remove all unused classes and methods, and every new jar dependency will just add its own size to the final binary? Even if that's true, I can't understand why the resulting native binary is 2 times bigger than the fat jar which contains all classes. Maybe that is because some testing/debug/diagnostic/non-prod option is turned on by default? Are there any ways or plans to do some analysis and not include the unused code or any other redundant stuff? |
On this point specifically: consider that the native binary includes everything needed from the JDK classes and Substrate, the "JVM" runtime. The "fat jar" only includes your application code and its dependencies, so you would need to add the size of the JDK for a fair comparison. A good way to compare is via the (full) disk size of a docker image: in the case of native-image you can wrap an empty image, while the one with the JDK will need not only the JDK but also the shared libraries on which it depends. That said, it's of course still interesting to try to get closer to what Go is able to do. Just bear in mind that the code is possibly different: since the Java libraries are much more mature and feature rich, they are likely to need more code to be included. |
@vjovanov in Quarkus we make sure many immutable structures that frameworks need are initialized as constants during compilation, so for example many such String and HashMap instances are "ready to go" and guaranteed immutable. I also noticed these take quite some space; I even had the impression Strings are not de-duplicated. I didn't have time to dig further into detail, but if someone wanted to pursue this I suspect there could be some quick and easy wins via:
I would expect this could also give some good performance boosts: much of our code will read those maps extremely often. I did obtain a minor win by de-duplicating some String instances during bootstrap of the Hibernate ORM metadata; that's why I think de-duplication isn't happening in GraalVM's constant pool - but I might be wrong. |
Just a quick note on de-duplication: one would need to be sure that objects subject to de-duplication/converting are never synchronized or have their identity used. |
@dougxc great point, I hadn't thought of that. Regarding - specifically - Strings, I think we can all agree that people should never do this, but I agree it could still be a thing to consider. Perhaps the safe option would be to de-duplicate the underlying byte array? |
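The identity concern raised in the last two comments can be illustrated with a small sketch. It assumes a hand-rolled canonicalizing table (the `Canonicalizer` class is hypothetical and says nothing about how native-image treats its image heap); the point is only that de-duplication turns two distinct objects into one, which is observable through `==` and through `synchronized`.

```java
import java.util.HashMap;
import java.util.Map;

final class DedupDemo {
    /** Maps every equal string to one shared instance (a hypothetical, hand-rolled intern table). */
    static final class Canonicalizer {
        private final Map<String, String> pool = new HashMap<>();

        String canonical(String s) {
            String existing = pool.putIfAbsent(s, s);   // null if s was not pooled yet
            return existing != null ? existing : s;
        }

        int size() {
            return pool.size();
        }
    }

    public static void main(String[] args) {
        // Two equal strings that are deliberately distinct objects.
        String a = new String("hibernate.dialect");
        String b = new String("hibernate.dialect");
        System.out.println(a == b);      // false: two identities, two monitors

        Canonicalizer canonicalizer = new Canonicalizer();
        String ca = canonicalizer.canonical(a);
        String cb = canonicalizer.canonical(b);
        System.out.println(ca == cb);    // true: one identity after de-duplication

        // Code that did synchronized (a) and synchronized (b) used two independent locks
        // before; on ca and cb it would contend on a single lock.
    }
}
```

Sharing only the backing byte array, as suggested above, would avoid this particular hazard because the two `String` objects keep their own identities.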
That is not clear to me. I thought one of the purposes of having a new separate VM like Substrate was actually to be able to avoid bringing ALL JDK classes and unused stuff into the native binary. So basically, with AOT we can do static analysis and remove everything unused, and that is why the native build process takes so long, I thought. BTW After all
Yes, that was exactly what I did.
As you may see the
That actually scares me, which is why I'm asking :-) |
The thing that has not been mentioned yet is that much of the image size is contributed by the OpenJDK static libraries that are now linked into every native image. These cannot be pruned during Java code analysis to remove unwanted code or data because they are not Java code. In earlier versions of GraalVM Native the behaviour provided by the OpenJDK static libs was reimplemented as pure Java code and most of it was subsequently optimized out of the generated binary, giving sizes much closer to that of equivalent Go programs. However, maintaining all that re-implemented functionality across multiple JDK versions was determined to be pointless effort for little gain, so the OpenJDK libs are now used instead. Note carefully that last qualification. The redundant code and data which are linked into these libraries will not be referenced at runtime. So, it will make very little contribution to text or data segment pages in the running image, i.e. the overhead you are so concerned about is essentially going to manifest as little more than some extra storage on disk. I know that's a cost but disk is very, very cheap. If you really care about saving a few tens of megabytes of disk space in your deployed container, well then write your app in Go (including writing a great deal of the standard Java lib functionality you are going to need to implement and test and train your programmers to use). If not, then stop comparing disk image sizes and start measuring the resident memory costs that will actually affect your bottom line. |
@adinn If the problem is Also |
@adinn for Linux we compile the static libs with |
It would be if all the libs were always linked in. I'm not sure if that is the case.
The libs provide code needed for various native methods e.g. io, maths functions etc. So, selective inclusion of libs according to which JDK classes get linked in may account for the disparity.
Jar sizes are a completely specious metric against which to compare executable size. Firstly, the sizes are only very loosely coupled. Most of the content of classes in jar files is Symbols, Strings and numeric Constants (it's usually > 90%). Many of these are repeated across a large number of classes so they end up occupying a much tinier amount of space when they are deduplicated to a single Symbol, String or Constant. How much deduplication arises will depend on how much replication there is. So, there is no fixed divisor to apply. So, if you are seeing 90Mb of executable then that may possibly represent a large amount of Java String data in your heap but that would only be because many different Strings occur in that 43Mb of jar code. Other 43Mb jars might contain only a handful of unique Strings. Secondly, most Symbols and many Strings and Constants can be omitted from the image because the analysis shows they are not needed. Symbols are rarely needed anyway so it is mostly Strings and numeric constants that will add to image size. How much they add, after deduplication, really depends on how many of the classes in the jars are actually referenced by the app. If classes, methods or fields are not used then GraalVM does not include them in the image. Once again that depends entirely on how the code in the jar is written in the first place plus what use client code makes of those classes. A 43Mb jar might end up contributing one class and a few methods or hundreds of classes and methods. So, I am sorry but the numbers you are quoting really don't corroborate your story about GraalVM being inefficient. It's more complicated than that.
Startup time is another red herring. If OpenJDK library code is not invoked then it won't slow you down having it in your disk image (you might possibly see slightly worse paging of the text section but that's going to be a micro effect). Perhaps download time and costs are significant for you relative to development and maintenance costs. I find that unlikely but I cannot rule it out. As I said, do switch to Go if it suits your needs better. I am just pointing out that 1) this is not a one-way street but a trade-off and 2) your assumptions about where the costs and opportunities/need for improvement lie were incomplete and missing important elements. |
upx has been mentioned in other threads as well, and I don't mind the large file size. native-image is pretty nice, works well (at the least so far that I have used) and is Still, small is beautiful, and perhaps the GraalVM team could consider integrating I'll explore upx but hopefully the GraalVM team considers this here, even if the cyraid mentioned kotlin, and that's a fine comment, but I would like to add that |
UPX is not a solution. Not only is it an external compressor that has nothing to do with the JVM, but executable compression always adds measurable time for decompression, which means the Java natives will take longer to start up, greatly diminishing one of the main use cases for native compiled Java applets. UPX is widely known; anybody who knows anything about compression will be familiar with it. It's not necessary to pollute the GraalVM build system with another dependency that users can easily find and plug in themselves. Size is important, but not at the expense of any performance. UPX is a band-aid, not a solution. One of the main attractions for native executables is embedded systems, where space AND performance are at a premium. If you wanted to design a KIOSK system, for example, you always needed to bundle a full JRE with it, which makes deployment more complicated and adds another layer of vulnerability. So it's important that any executable size improvements have zero cost to performance, otherwise what's the point - just use a JRE and get all that advanced JIT and GC goodness tuned up. We all need to remember that this is a pretty crazy project - it can take practically any existing Java code since forever and remove the VM from it, making it run natively. In my opinion, it's pretty amazing that the executables are already this small! Hopefully someone figures out something, but I honestly wouldn't be surprised if this is the best we can get without leaving Java behind. I don't mind the executable size, personally - I've worked around it by using one executable with many entry points rather than compiling many individual executables. EDIT: If you are using Java and don't need polyglot in Graal native exe's, consider Red Hat's Quarkus/Mandrel for smaller exe's (it is a fork of GraalVM): https://quarkus.io/guides/building-native-image - though it is container based so yeah, not as simple. |
Info on UPX: tried UPX on native image GraalVM Hello World app (64-bit Windows) and the UPX compressed EXE does not work (does not print Hello World). |
@cosmicdan The Mandrel distribution of GraalVM does not produce smaller executable files compared to the GraalVM Community Edition. We are currently investigating how we can provide a compression mechanism for executables built into the native image generator that is independent of external tools like UPX. A primary contributor to native image sizes are certain parts of the JDK libraries like for example time zone and localization data. This needs to be taken into account when comparing with "hello world" of for example Kotlin Native as those executables are missing those elements. |
The Avian project used to make a kinda micro-JVM that could (with compression) make GUI binaries that were ~1mb in size. That JVM fully supported Java 8 and had a reasonably sophisticated GC. How:
The interesting part was the lite libraries. There's probably uses for something like that, for smaller apps where you don't need many of the features or can rely on thin wrappers around the OS instead of pure Java reimpls. Kotlin/Native can make smaller binaries because there's virtually no standard library. |
@mikehearn Interesting. It would've been nice to have more awareness around that project. Seems you could also have a standalone executable which ran the micro VM and executed the main method in the same executable. |
@cyraid It could indeed do that (bundle into a single EXE). |
Is there a link? I want to know how he did it (the 400 KB hello world). Thanks! |
+1 on this and @acodervic's comment. Would be curious for a status update on this front. A "Hello, world!" program taking up around ~12MB seems like a significant amount; surely there must be some optimization that is able to be done on the backend that could reduce the output file sizes for situations like that, especially considering we're compiling to native execution. Were this to be a JDK optimizer, I'd say anything else, but I find it interesting the amount of extra data being used for situations like these. Notably, as stated this only really affects smaller projects and programs. With that said, even a program like Notepad (the modern notepad) only uses ~900-1kb as a native executable. Now that's not particularly a fair comparison, considering Notepad utilizes the .NET Framework of the Windows computer running it. Nevertheless, I would consider this to be a comparison since the field we are discussing are native binary images, which at this point a large focus on optimization would be necessary (as, if you're going through the effort to create a native image, then you must need optimization of some form). With all of this said, I've noticed that there are significant performance increases found by compiling to a native image. So, this absolutely is more of a "nice to have" rather than a "need to have." |
Hate to nitpick but this is false, it's not a native app - it's a UWP app. That EXE is just a stub, and the UWP app still depends on the Windows Runtime. Additionally, "classic" Notepad on Windows 10 is a ~200kb exe (plus a ~100kb resource file), and even then I believe it still depends on many other Windows DLL's to actually run (like most Windows EXE's). So this is not a fair comparison, considering native Java executables are completely static and standalone. It's enough of an argument to compare graal EXE's with other language "native EXE's" such as from Kotlin or other Java-AOT-compilers. I believe the actual problem here is already mentioned; the "usage discovery" of the compiler is not aggressive enough in eliminating unused classes. Something like that. I guess it just isn't a huge priority is all.
Only for short-lived or "one shot" type applications. Remember that, compiling to native means you lose all of the benefits that a long-running VM can provide with modern JIT and GC. It could result in less GC pauses (stutter) though, if you've not spent the time tuning GC for your application or doing your own "GC-friendly optimization" on GC-sensitive parts of your codebase. See https://github.com/ByerN/libgdx-graalvm-example for an example with results where a Java game was converted to Native EXE. |
The points-to analysis is pretty smart from what I understand. There might be some more juice there but I doubt it. The big hammers exist but might not make sense given how native-image is used and the cheapness of bandwidth/disk space:
etc |
I don't wish to argue this here, since it's not really the place for it, but I do wish to point out that I explicitly stated that in my reply:
And proceeded to elaborate on why I claim it's a valid comparison:
To elaborate further on why I would say this is a valid claim, you must first consider why people are making native images in the first place. There could be many reasons, but the two primary that I have found in my research are 1) performance and 2) size. Native images come with the benefits of not being platform-independent, meaning that native images have the benefits of utilizing the resources different platforms provide. This does include, in my opinion, things like the Win32 API and other systems' native calls. What would be the purpose of compiling a platform-independent application using native-image? There would be virtually no benefits if the entire JVM needed to be embedded into the compiled executable, considering that very thing is an entirely different field already anyways (obviously, this is an exaggeration, but my point is clear). At that point, you might as well begin working in a language like For a project like GraalVM's |
If indeed this is true, then what else would be the cause of the large output - the only thing I can think of is that the JDK itself is still too tightly-coupled. If so, then I suppose the only way possible to "reduce the exe size" is to go the shared library route and start building individual libraries for each Java Runtime it needs to link against. This would at least reduce the footprint when using many binaries on one system, but introduces a whole new group of problems regarding dependency management... ...I ended up making this problem redundant for my use case; embedded Linux and IoT-ish things (where I only have about 20MB to 100MB of free storage space) - rather than literally compiling a binary for every console tool I desired I just bundled them all into one and used symlinks to achieve the result of "many individual binaries". Latest GraalVM 22.x produces a 5MB or so Hello World executable IIRC; that's not THAT bad. Maybe we're just asking too much from this old language/runtime? 😄
Can completely agree with that. Sounds like the majority of the binary is related to the plumbing behind bytecode-converted-to-native. Maybe there's just no demand from paying Oracle clients to make it any better than it is, and really that's a fair enough reason for their devs to have no time to make this perfect. |
Yes if you have many small CLI tools, having them in one binary is the way to go. It's almost like having shared libs, but simpler. We've done some experimental support for this in Conveyor where it uses a small stub launcher and it works well enough. I guess in your embedded use case, downloading code (pages) on demand isn't possible? |
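The "many tools in one binary" workaround discussed above can be sketched as a name-based dispatcher. This is a minimal illustration under stated assumptions, not how the commenters' setups or Conveyor actually work: the tool names are made up, and because a Java `main` does not receive the name the program was invoked under, the sketch assumes a stub launcher or wrapper script passes that name as the first argument.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

final class MultiTool {
    /** Tool name -> entry point returning an exit code. Both tools are made-up examples. */
    private static final Map<String, ToIntFunction<String[]>> TOOLS = new TreeMap<>();
    static {
        TOOLS.put("hello", args -> { System.out.println("Hello, World!"); return 0; });
        TOOLS.put("count", args -> { System.out.println(args.length); return 0; });
    }

    /** Strips any directory part and a trailing ".exe" from the invoked name. */
    static String toolName(String invokedAs) {
        int slash = Math.max(invokedAs.lastIndexOf('/'), invokedAs.lastIndexOf('\\'));
        String name = invokedAs.substring(slash + 1);
        return name.endsWith(".exe") ? name.substring(0, name.length() - 4) : name;
    }

    /** Runs the tool selected by the invoked name; returns 2 if there is no such tool. */
    static int dispatch(String invokedAs, String[] args) {
        ToIntFunction<String[]> tool = TOOLS.get(toolName(invokedAs));
        if (tool == null) {
            System.err.println("unknown tool, expected one of " + TOOLS.keySet());
            return 2;
        }
        return tool.applyAsInt(args);
    }

    public static void main(String[] args) {
        // Assumption: args[0] carries the invoked name (e.g. the symlink path),
        // supplied by a launcher stub; the remaining arguments belong to the tool.
        String invokedAs = args.length > 0 ? args[0] : "";
        String[] rest = args.length > 0 ? Arrays.copyOfRange(args, 1, args.length) : args;
        System.exit(dispatch(invokedAs, rest));
    }
}
```

With this layout the fixed runtime overhead is paid once per device rather than once per tool, which is the effect described for the embedded Linux case.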
Oh, w.r.t. why so big. I think (not an expert) it's a combination of:
You could chip away at this in lots of ways, but it's a losing battle to use conventional techniques IMO. My own app has >100mb of just bytecode, god knows how big a native image would be. Java is easy and productive with lots of libs. The feature count of modern apps can grow uncontrollably and that's a good thing, but it means code will always grow faster than you can cut it down. Better to investigate big hammers like partial compression, using bytecode+interp to shrink code, on-demand paging from remote servers etc. |
@mikehearn I believe Thomas Wuerthinger identified the main problem here. There is still a lot of code and data linked into the final image because the JDK runtime requires it to be present in order to ensure that the runtime can execute it to prepare for a host of possible things the app might in principle do but in fact does not actually do. In theory, a deeper analysis could remove a lot of this unnecessary code and data. In practice, the analysis has to complete in an acceptable time and this means that stuff that could be removed does not always get removed. The problem is exacerbated by the fact that the JDK runtime is a bundle of many different libraries, which depend on each other in a complex, multi-linked network. These libraries are structured in a fairly coarse hierarchy, with a base set of core libraries then other libraries layered over them. This organization is visible in the module files introduced with the Java platform module system. However, the size of java.base makes it very clear that there is no finely graded inclusion model for the runtime. That's hardly surprising given the scale and scope of the runtime and the number of developers working on it. That does not mean that there is no long term goal to minimize the runtime and reduce dependencies across modules and within them. In particular, reduction and simplification of JDK runtime non-API classes has been a goal of the OpenJDK project for a very long time. The problem has been weaning users off relying on internal implementations (by enforcing module restrictions), which has gone very slowly. This is likely to happen more and more in upcoming releases and it will very likely provide an opportunity to help with the image size problem Graal faces. |
5 mb and 12 mb |
Native image for .net 8 asp.net core web api "hello world rest app" has only 10mb and fast compilation. |
We are working on both, smaller image sizes (try Oracle GraalVM) and faster compilation (see #7626). It seems the .NET core libraries are quite well designed for AOT use cases. We are trying to achieve the same for the JDK, but that takes time.
Nonetheless, Spring/Quarkus/Micronaut are very popular frameworks, and they run on the OpenJDK and not on .NET. Native Image can already improve all metrics you mentioned when compared with the OpenJDK. Even compilation speed can be better, for example if you consider JIT compilation overheads across potential hundreds of deployments of the same app. Native Image compilation just needs to happen once and at build-time.
The GraalVM project aims to improve the Java ecosystem, and I think the Java community is excited about this. |
The reason dotnet compiles faster and produces lighter binaries is that it only compiles the code reachable from the Main function, and it uses multi-threaded compilation for AOT. |
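Keeping only code reachable from the entry point, which earlier comments in this thread also describe for native-image's analysis, can be sketched as a worklist traversal over a call graph. This is a deliberately simplified illustration: the call graph is a made-up map of method names, and a real points-to analysis also has to deal with virtual dispatch, reflection, and class initializers, which is part of why more ends up in the image than this model suggests.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Reachability {
    /** Returns every method reachable from the entry point via the given call edges. */
    static Set<String> reachable(Map<String, List<String>> calls, String entry) {
        Set<String> seen = new LinkedHashSet<>();   // keeps discovery order
        Deque<String> work = new ArrayDeque<>();
        seen.add(entry);
        work.add(entry);
        while (!work.isEmpty()) {
            String method = work.poll();
            for (String callee : calls.getOrDefault(method, List.of())) {
                if (seen.add(callee)) {             // true only the first time we see it
                    work.add(callee);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up call graph: the Locale methods are never called from main.
        Map<String, List<String>> calls = Map.of(
                "Main.main", List.of("PrintStream.println"),
                "PrintStream.println", List.of("PrintStream.write"),
                "Locale.getDefault", List.of("Locale.initDefault"));
        System.out.println(reachable(calls, "Main.main"));
        // prints [Main.main, PrintStream.println, PrintStream.write]
    }
}
```

Everything outside the returned set is a candidate for removal, which matches the "methods that are never invoked" list posted near the top of the thread.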
I've just tried this again after 6 years with 23.0.1-graalce and now hello world is 12.8 MB.
|
In terms of size, there's a few things going on:
The best path to smaller stand-alone hello world binaries remains the same as before, IMHO: develop a system that can download only code that's being actually executed, a bit like how web apps do it, and then size can stop being so important. As adinn has pointed out before, most of the code in the binary is cold, so a demand paging system would yield the same result as lots of size optimization work and is probably easier to build. If I had time, that's the direction I'd go in. But if for some reason that's unthinkable the next best path is to develop a fork of the Java standard library that has fewer features, less internal coupling and which outsources more functionality to the OS at a possible cost of portability (but maybe you don't care so much about that or are willing to work around it at the app level). |
I think @vjovanov 's comment about 400kb said it was achieved by dead code elimination? So nothing unsafe or C like, certainly not removing array bounds checks. |
Hello, the binary has a nice compression ratio with upx and you can cut things you don't need with substitutions. I have the infamous "Hello World"-like CLI app without GC at just under 886K on the latest GraalVM. |
@Karm Can you share how you did that? Does it still suffer the 200ms startup penalty for decompression? |
@ianopolous Lemme clean up the repo and share it. I'll comment here later. |
UPX might be fine to reduce storage footprint but it's not ideal because the binary still needs to be decompressed into memory at runtime.
|
Compiling hello world with substrate vm on ubuntu results in a 6.1 MiB executable. Is it possible to reduce this? The equivalent in golang is 1.6 MiB or < 1 MiB without debug information.