Anonymous classes in inline code are loaded too late for serialization #104
Comments
Note that this problem isn't specific to inline-java: it's a problem for any use of JNI's |
Indeed, any preference? |
Since as noted above this is an upstream issue with Spark itself, I have a preference for keeping any workaround in sparkle. We could remove the workaround once the ticket you mention above is resolved. Inline code is just stubs and these stubs are best kept in the executable itself. No one other than the executable should see these stubs, nor should they be able to call them. And that way we don't need to parameterize the The JIRA ticket you mention includes comments from several folks who successfully hooked into |
I didn't try it. Apparently we need to
I don't see how this can be parameterized by the serializer that spark currently uses. We might have to define a different wrapper for each serializer we ever want to use. |
Just like in straight Java, it's perfectly legal in an inline-java quasiquote to create an object of an anonymous class. The problem is that such an object can't be deserialized by any process that hasn't yet loaded the wrappers for all quasiquotes, since it is the wrappers that "define" the anonymous class. Spark executors can be given a task by the Spark driver that includes such anonymous objects. Without the `InlineJavaRegistrator` provided here, it is not possible to guarantee that the inline-java wrappers have been loaded *prior* to the task being deserialized. The solution here consists in choosing the Kryo serializer. It's much faster than the default `JavaSerializer` that Spark uses anyway. `KryoSerializer` provides a crucial facility that `JavaSerializer` does not: class registration. Spark furthermore defines "registrator" classes that, when invoked, perform class registration, or indeed any arbitrary action. We provide an `InlineJavaRegistrator` to inline-java users, which abuses class registration to first load all wrappers. This happens on all executors prior to any work being performed. Fixes #104.
This is not quite as usable as sparkle users would need. When the classes that |
Could we please not unearth old issues from the dead and instead create a new one? |
`[java| $rdd.map(new Function<Object,Object>() { public Object call(Object x){ return x;} }) |]`
doesn't work on multi-node setups.
The problem is that an executor receives the serialized object `new Function<Object,Object>() { public Object call(Object x){ return x;} }`, and in order to deserialize it, the executor needs to load the anonymous class to which the object belongs. The executor then finds that no jar and no class in the classpath contains the class definition, and therefore it fails.
Where is the class then? Currently inline-java embeds the bytecode in the Haskell executable. The embedded bytecode is sent to the JVM at runtime by `Language.Java.Inline.loadJavaWrappers`. But this function is never called on executors.
The ideal fix would be for Spark to provide some startup hooks, so that `Language.Java.Inline.loadJavaWrappers` can be called when the executor starts. But this feature is not implemented.
Calling `loadJavaWrappers` when sparkle loads is no good, because upon receiving the serialized object, the executor has no clue that it needs to load sparkle in order to have the class defined.
The only workaround I've found so far is to dump the `.class` files that inline-java produces into a folder and add them to the sparkle application jar. tweag/inline-java#62
Any preferences on how to better deal with this?
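For readers who want to see the failure mode in isolation, here is a minimal sketch in plain Java, with no Spark or inline-java involved. It assumes standard Java serialization, and every name in it (`AnonymousClassDemo`, `SerFunction`, `deserializeWithout`) is hypothetical and made up for illustration. The serialized form of an anonymous object carries only the class name, not the bytecode, so a receiver that cannot resolve that name fails with `ClassNotFoundException`. The "missing class" is simulated by overriding `resolveClass`, which stands in for an executor that never loaded the wrappers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class AnonymousClassDemo {
    /** Hypothetical stand-in for Spark's serializable Function interface. */
    interface SerFunction<A, B> extends Serializable {
        B call(A x);
    }

    /** Creates an object of an anonymous class, as the quasiquote in the issue does. */
    static SerFunction<Object, Object> identity() {
        return new SerFunction<Object, Object>() {
            @Override
            public Object call(Object x) { return x; }
        };
    }

    /** Java serialization writes the class name and field data, never the bytecode. */
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /**
     * Deserializes data while pretending that missingClass cannot be found,
     * which simulates a JVM that never loaded that class.
     * Returns the object, or null if class resolution failed.
     */
    static Object deserializeWithout(byte[] data, String missingClass) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data)) {
            @Override
            protected Class<?> resolveClass(ObjectStreamClass desc)
                    throws IOException, ClassNotFoundException {
                if (desc.getName().equals(missingClass)) {
                    throw new ClassNotFoundException(desc.getName());
                }
                return super.resolveClass(desc);
            }
        }) {
            return in.readObject();
        } catch (ClassNotFoundException e) {
            return null; // the receiver has no definition for the class
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SerFunction<Object, Object> f = identity();
        String name = f.getClass().getName();
        byte[] data = serialize(f);
        System.out.println("anonymous class name: " + name);
        System.out.println("deserialized when class is known: "
                + (deserializeWithout(data, "") != null));
        System.out.println("deserialized when class is missing: "
                + (deserializeWithout(data, name) != null));
    }
}
```

The sketch only shows why the receiving side must already have the class defined before deserialization starts. It says nothing about how Spark or Kryo resolve classes internally.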