
[Feature Request]: Supply classpaths to java using a pathing jar, to avoid too long command lines. #25582

Closed
1 of 15 tasks
lostluck opened this issue Feb 21, 2023 · 1 comment · Fixed by #26087

Comments

@lostluck
Contributor

What would you like to happen?

Occasionally a Java SDK worker harness is unable to boot up because its command line is extremely long, caused by the list of files to include on the classpath.

https://github.com/apache/beam/blob/master/sdks/java/container/boot.go#L236

This can be difficult for end users to work around, since they are generally unaware of which dependent jars (and similar artifacts) end up on the classpath; that depends on how the pipeline is constructed.

A solution exists in the form of another jar, known as a pathing jar, whose manifest simply refers to the desired contents of the classpath. Unfortunately, this approach is required instead of the newer solution of passing an @argfile parameter when starting the JVM: the Beam container needs to support Java 8, and @argfile wasn't introduced until Java 9.

This would need to be authored in Go, so that the boot program can build the pathing jar from the contents of the provisioned manifest. Unfortunately, there doesn't seem to be existing support for this in a Go package: https://pkg.go.dev/search?q=jar+java&m= turns up tools that are largely about reading jars rather than creating them.

Fortunately, the JAR specification is relatively simple: a jar is essentially a zip file with a META-INF directory.

Authoring zip files in Go is robustly supported in the Go standard library: https://pkg.go.dev/archive/zip.

The proposal is to build such a pathing jar in memory from the existing local artifacts, write it out, and then use that as the single class path parameter when invoking java.


For reference:

Gradle itself has support for building pathing jars.

https://github.com/gradle/gradle/pull/10544/files#diff-bda9c25c55281a1f596c7e7892ce79631e74ac6eef32fe06fef664da23759c62R349

https://stackoverflow.com/questions/5434482/how-can-i-create-a-pathing-jar-in-gradle

Java itself can create jars natively:
https://docs.oracle.com/javase/7/docs/api/java/util/jar/JarOutputStream.html#JarOutputStream(java.io.OutputStream)

Apparently a "pathing jar" requires the listed files to be relative to the location of the jar. That's not too bad; it's a matter of creating the appropriate manifest in the Go boot script (if not on the service side).
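Since Class-Path entries are relative URLs resolved against the jar's own location, each staged artifact's path has to be made relative to the directory holding the pathing jar, converted to forward slashes, and percent-encoded (a space becomes `%20`). A sketch, assuming absolute Linux paths as inside the container; `classPathEntries` is a hypothetical helper, and directory entries (which would need a trailing `/`) are not handled.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"strings"
)

// classPathEntries converts absolute file paths into Class-Path entries,
// i.e. URLs relative to jarDir, the directory holding the pathing jar.
func classPathEntries(jarDir string, files []string) ([]string, error) {
	entries := make([]string, 0, len(files))
	for _, f := range files {
		rel, err := filepath.Rel(jarDir, f)
		if err != nil {
			return nil, err
		}
		// Class-Path entries are URLs: use forward slashes and let
		// url.URL percent-encode characters such as spaces.
		u := url.URL{Path: filepath.ToSlash(rel)}
		entries = append(entries, u.String())
	}
	return entries, nil
}

func main() {
	entries, err := classPathEntries("/tmp/boot", []string{"/tmp/staged/a.jar", "/tmp/staged/my dep.jar"})
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(entries, " "))
	// Prints: ../staged/a.jar ../staged/my%20dep.jar
}
```

The joined result is what would go into the Class-Path attribute of the manifest.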

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@lostluck
Contributor Author

I'm on vacation until mid-March, so if this blocks you, the current workaround is to build uber jars and use those as your dependencies.

e.g. from Stack Overflow:
https://stackoverflow.com/questions/52208667/create-an-uber-jar-for-dataflow-and-apache-beam

I have no idea about uberjars generally. Don't ask me.
I'm the Go guy and since the containers are in Go, I get the fun task of synthesizing a pathing jar in the harness.

2 participants