Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce the ability to generate AppCDS #9710

Merged
merged 1 commit into from
Jun 19, 2020
Merged

Conversation

geoand
Copy link
Contributor

@geoand geoand commented Jun 1, 2020

No description provided.

@boring-cyborg boring-cyborg bot added area/core area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins labels Jun 1, 2020
@geoand
Copy link
Contributor Author

geoand commented Jun 1, 2020

@stuartwdouglas mind taking a quick look at this and let me know if you agree with the approach, or should we try the find a why to generate all class names on our own (which is probably a ton safer since we don't need have access to external services - although with the current approach a non-trivial class list can still be obtained)?
If we do want to generate class names on our own, do we want to use every jar we have? How about JDK classes, do we just add keep so standard list around and reuse it?

If so, I'll proceed to add tests for all three 3 jars types (for tests the JVM needs to be launched with -Xshare:on which will ensure the JVM will fail if the AppCDS are not usable) and then add this integration into Jib as well.

@geoand
Copy link
Contributor Author

geoand commented Jun 1, 2020

@loicmathieu, @gunnarmorling mind giving this a spin?

Copy link
Member

@stuartwdouglas stuartwdouglas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the concept, but in practice I think there will be the potential for a lot of problems, including:

  • Big potential for mismatching versions between container and local host. I am not sure what happens in this case? Do you end up with something that won't start?
  • Production databases need to be available when building, as the build runs in prod profile. There is a good chance in most cases this just won't be possible.

I think that we will likely want an approach that just loads all the classes, rather than explicitly starting the application, as even though it generates a larger image it avoids the services problem.

I also think we will want to generate this as part of image creation, so you know that you will always have the correct JVM version. I think this is easy enough with a Dockerfile, not sure how hard it will be for jib.

type.equalsIgnoreCase(PackageConfig.UBER_JAR));
}

public boolean isFastFar() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor typo


int exitCode;
try {
exitCode = new ProcessBuilder(command)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach is going to fail in a lot of cases, as you need all the production services to be present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could certainly fail. I'm not saying it's a good solution - because it isn't, but even like this you still get a list classes :)

@geoand
Copy link
Contributor Author

geoand commented Jun 2, 2020

I like the concept, but in practice I think there will be the potential for a lot of problems, including:

  • Big potential for mismatching versions between container and local host. I am not sure what happens in this case? Do you end up with something that won't start?

You are right that you have to have exact same version of Java for the AppCDS to work. However, the default behavior of the JVM is to just ignore the AppCDS if there is an issue for any reason. For the JVM to fail when there is an issue with AppCDS, you need to have -Xshare:on set.

  • Production databases need to be available when building, as the build runs in prod profile. There is a good chance in most cases this just won't be possible.

I think that we will likely want an approach that just loads all the classes, rather than explicitly starting the application, as even though it generates a larger image it avoids the services problem.

Yeah, that's definitely the safer approach and for the dependencies it's easy. But what do we do about the JDK classes?

I also think we will want to generate this as part of image creation, so you know that you will always have the correct JVM version. I think this is easy enough with a Dockerfile, not sure how hard it will be for Jib.

Here is the thing about this approach: You need to have docker installed to make this work, in order to run the JVM of the built container image to generate the AppCDS and copy it out of the container.
That doesn't change with Jib, because with Jib all we can do is build the container image, we still need docker to actually run it so we can generate the AppCDS.

In conclusion I agree with you that the idea is interesting, but we do need to think about how to provide the best experience.

cc @maxandersen

@stuartwdouglas
Copy link
Member

Jdk classes should be mostly fine, as loading all the classes will run all the static init code.

@geoand
Copy link
Contributor Author

geoand commented Jun 2, 2020

Jdk classes should be mostly fine, as loading all the classes will run all the static init code.

I am certainly missing something, as I don't get you mean here :)

My question was how would we add the JDK classes to classes.lst. Do we just always add a bunch of classes we know will be used?

@stuartwdouglas
Copy link
Member

What I am talking about is:

  • Create a file at build time that lists all the classes in the runtime deps
  • Create a new main() method that loads the file, uses Class.forName() on every class, then exits
  • this Class.forName() will run the generated static init code, which will load most(all?) of the JDK classes we care about

@geoand
Copy link
Contributor Author

geoand commented Jun 2, 2020

Gotcha, thanks for the explanation

@geoand
Copy link
Contributor Author

geoand commented Jun 2, 2020

I have to prepare for a demo today and I'll most likely be off tomorrow, so I'll probably start looking at this again at the end of the week.

@loicmathieu
Copy link
Contributor

Starting with Java 13 this can be done in one path using dynamic CDS archive (no need to create the classlist upfront): http://openjdk.java.net/jeps/350

I do love the fact that it didn't depends on docker on anything else, but as Stuart said it could fail in a lot of way. We will also need a way to integrate it in our generated Dockerfile.

I need to check again why I use the uberjar in my blogpost, I'm sure the normal jar didn't worked for jlink but I'm not sure for AppCDS. Maybe there is an issue with the normal jar and AppCDS not detecting all classes due to missing libs in the classloader used by AppCDS to create the class list ...

@geoand
Copy link
Contributor Author

geoand commented Jun 2, 2020

Starting with Java 13 this can be done in one path using dynamic CDS archive (no need to create the classlist upfront): http://openjdk.java.net/jeps/350

Yeah. I was thinking thas if we get this in, then as an enhancement later on we could use this if Java 13+ is being used.

I do love the fact that it didn't depends on docker on anything else, but as Stuart said it could fail in a lot of way. We will also need a way to integrate it in our generated Dockerfile.

This is exactly why I did it this way - so users don't need the container image to take advantage of this - although there are a lot of caveats.

I need to check again why I use the uberjar in my blogpost, I'm sure the normal jar didn't worked for jlink but I'm not sure for AppCDS. Maybe there is an issue with the normal jar and AppCDS not detecting all classes due to missing libs in the classloader used by AppCDS to create the class list ...

I was able to make it make it work for all types of jars - although for fast-jar the speedup improvement was very small. For uber-jar and legacy-jar the speedup was very good.

@loicmathieu
Copy link
Contributor

although for fast-jar the speedup improvement was very small

This is what I fear, it means the classlist didn't contains all the needed classes. You can enable class loading logging to check this: -Xlog:class+load:file=cds.log.
Classes lodading from AppCDS will be logged like this:

[info][class,load] java.lang.Object source: shared objects file

@geoand
Copy link
Contributor Author

geoand commented Jun 2, 2020

although for fast-jar the speedup improvement was very small

This is what I fear, it means the classlist didn't contains all the needed classes. You can enable class loading logging to check this: -Xlog:class+load:file=cds.log.
Classes lodading from AppCDS will be logged like this:

[info][class,load] java.lang.Object source: shared objects file

Yeah, I was doing that when I was trying to get all types of jars to work. I just didn't have time to explore why fast-jar in particular did not see a proper speedup (even though the JVM was applying AppCDS). My guess is that it has something to do with the custom ClassLoader that fast-jar uses, but I could be completely off.

@geoand
Copy link
Contributor Author

geoand commented Jun 2, 2020

I just got this crazy idea (although it's not definitely not very user friendly):

What if we used the report of the used classes that GraalVM native-image outputs as the classes list? There is no way we are going to get a more accurate list of classes than that. If we did that, all we would have to do is feed that list into java -Xshare:dump ... and out comes the AppCDS file.

I admit that from a usability standpoint it's not great, but I wanted to put it out there to see what y'all think.

P.S. It would be nice if we didn't have to actually run the whole native-image build process, but instead only utilize it to do static analysis and spit out the used classes report.

@loicmathieu
Copy link
Contributor

@geoand native-image is definitly too slow and would include a lot of classes not really used by the application as it follow all possible code flow. Moreover, we are only interested on the class used during the bootstraping, classes used during the run of the application and not discovered by AppCDS will be loaded over time and it's OK as it will balance the AppCDS size and the bootstrap time.

@geoand
Copy link
Contributor Author

geoand commented Jun 3, 2020

@loicmathieu actually if native image analysis wasn't so slow and require extra tooling, I think it would be the perfect, because it discovers all the classes loaded in the static init phase - I don't see how you can get a better set of classes actually used at runtime without starting the application itself.

But I agree that the current slowness of the process is prohibitive... Which is why I wonder if there is a way for us to only run the static init phase up until the used classes report is generated

@geoand
Copy link
Contributor Author

geoand commented Jun 4, 2020

I'll implement @stuartwdouglas's proposal hopefully next week when done with PTO and I've taken care of some other higher priority loose ends.

@geoand
Copy link
Contributor Author

geoand commented Jun 16, 2020

I overhauled the PR. Now we don't need to start the application at all and the classes list is created by generating a classes.lst file from the combination of JDK's lib/classlist and looking at all the classes of the application and its dependencies.

This process doesn't provide any means of adding the generated AppCDS into a generated container as that requires building a container, running it (to make use of the exact Java version for -Xshare:dump -XX:SharedClassListFile=) to create the AppCDS and then generating another container once we have the AppCDS.
It is doable (with a Docker builder image it should be pretty easy), but I would prefer we first have the raw capability in and get some feedback.

WDYT @stuartwdouglas? If you agree, I'll add some tests and documentation and the PR out of draft.

@gunnarmorling
Copy link
Contributor

I overhauled the PR. Now we don't need to start the application at all and the classes list is created by generating a classes.lst file from the combination of JDK's lib/classlist and looking at all the classes of the application and its dependencies.

Sounds like a good approach to start with. We'd loose the ability to have different "profiles" for different kinds of workloads, but that seems acceptable. Also you'd probably miss things like Lambda proxy classes (as CDS-able as of JDK 15-ea-b27), but that seems acceptable, too.

@geoand
Copy link
Contributor Author

geoand commented Jun 16, 2020

I overhauled the PR. Now we don't need to start the application at all and the classes list is created by generating a classes.lst file from the combination of JDK's lib/classlist and looking at all the classes of the application and its dependencies.

Sounds like a good approach to start with. We'd loose the ability to have different "profiles" for different kinds of workloads, but that seems acceptable. Also you'd probably miss things like Lambda proxy classes (as CDS-able as of JDK 15-ea-b27), but that seems acceptable, too.

Yeah, it can and should definitely be improved in follow up iterations

@loicmathieu
Copy link
Contributor

With this approach you will create a real big classlist as you include all the classes from all the libraries ! You also inlude all the JDK classes.
I'm very concerned about this as it will take time to create the CDS archive and could have a huge impact on the CDS size, the RSS memory needed to load it and the startup time needed to load it.

Did you try it ? Do you know it's impact on disk size, RSS and startup time ?

@geoand
Copy link
Contributor Author

geoand commented Jun 16, 2020

With this approach you will create a real big classlist as you include all the classes from all the libraries ! You also inlude all the JDK classes.

Yup, we definitely want the JDK classes and the dependencies. Short of running the application (which we don't want to do), there is no other way of getting the list of classes.

I'm very concerned about this as it will take time to create the CDS archive and could have a huge impact on the CDS size, the RSS memory needed to load it and the startup time needed to load it.

Did you try it ? Do you know it's impact on disk size, RSS and startup time ?

I did try and and it worked well for what I tested. I would invite you to test this on the application you have in mind and see for yourself :)

@loicmathieu
Copy link
Contributor

I did try and and it worked well for what I tested. I would invite you to test this on the application you have in mind and see for yourself :)

The question is not if it works but the overhead of disk size and memory consumption reagrding an implementation that build the classlist based on the application run.

I'll try to test it Friday, I should have some spare time ...

@loicmathieu
Copy link
Contributor

You can give it less heap space, it will then just GC early on.

Yes, but RSS is not ony heap size, I don't know how the CDS archive is read but it can uses native memory to read it so it will impact RSS size and not a lot of heap size ...

@geoand geoand force-pushed the app-cds branch 2 times, most recently from 4fadfb9 to 1ba53d1 Compare June 18, 2020 07:45
@geoand
Copy link
Contributor Author

geoand commented Jun 18, 2020

@stuartwdouglas since we are getting close to the cut for 1.6, I would be really grateful if you could weigh in on whether the latest implementation can be added (as experimental).

Thanks

Copy link
Member

@stuartwdouglas stuartwdouglas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@geoand geoand marked this pull request as ready for review June 19, 2020 05:48
@geoand
Copy link
Contributor Author

geoand commented Jun 19, 2020

PR updated per review comments. I also added a test.

I still need to write some docs, but I'll do that early next week.

@geoand geoand added triage/waiting-for-ci Ready to merge when CI successfully finishes release/noteworthy-feature labels Jun 19, 2020
@geoand geoand force-pushed the app-cds branch 3 times, most recently from 8484395 to c17581a Compare June 19, 2020 11:23
@boring-cyborg boring-cyborg bot added area/platform Issues related to definition and interaction with Quarkus Platform area/rest-client labels Jun 19, 2020
gsmet
gsmet previously requested changes Jun 19, 2020
Copy link
Member

@gsmet gsmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some proofreading.

return;
}

log.debug("'" + CLASSES_LIST_FILE_NAME + "' successfully created.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debugf maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

@geoand geoand dismissed gsmet’s stale review June 19, 2020 11:48

comments addressed

@gsmet
Copy link
Member

gsmet commented Jun 19, 2020

Could you write a paragraph for the release notes about what it brings to the plate and how to use it? Just paste it in a comment. Thanks!

@geoand
Copy link
Contributor Author

geoand commented Jun 19, 2020

Quarkus 1.6 brings experimental support for [Application Class-Data Sharing (AppCDS)](http://openjdk.java.net/jeps/310). 
AppCDS essentially allow an application to start faster by reducing the work the JVM needs to perform at startup time and Quarkus can now automatically create the AppCDS file as part of the build by simply adding `quarkus.package.create-appcds=true`.
There are various limitations the JVM imposes on the use of AppCDS, so make sure to read our guide before using this feature.

cc @gsmet

@geoand geoand merged commit 070ad08 into quarkusio:master Jun 19, 2020
@geoand geoand deleted the app-cds branch June 19, 2020 14:25
@geoand geoand removed area/rest-client area/maven area/platform Issues related to definition and interaction with Quarkus Platform area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins labels Jun 19, 2020
@gsmet gsmet added this to the 1.6.0 - master milestone Jun 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core release/noteworthy-feature triage/waiting-for-ci Ready to merge when CI successfully finishes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants