
Anonymous classes in inline code are loaded too late for serialization #104

Closed · facundominguez opened this issue Apr 11, 2017 · 6 comments

facundominguez commented Apr 11, 2017

`[java| $rdd.map(new Function<Object, Object>() { public Object call(Object x) { return x; } }) |]`

doesn't work on multi-node setups.

The problem is that an executor receives the serialized object `new Function<Object, Object>() { public Object call(Object x) { return x; } }`, and in order to deserialize it, it needs to load the anonymous class to which the object belongs.

The executor then finds that no jar or class on its classpath contains the class definition, so deserialization fails. Where is the class, then? Currently inline-java embeds the bytecode in the Haskell executable. The embedded bytecode is sent to the JVM at runtime by Language.Java.Inline.loadJavaWrappers, but this function is never called on executors.
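
For illustration only (this is not sparkle or inline-java code), the failing step on the executor boils down to plain Java deserialization, which can only succeed if the receiving JVM is able to load the object's class:

```java
import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;

// Minimal sketch of why the executor fails: readObject needs the anonymous
// class to be loadable in this JVM, and if the embedded bytecode has never
// been defined here it throws a ClassNotFoundException.
public class DeserializeTask {
    public static Object deserialize(byte[] taskBytes) throws Exception {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(taskBytes))) {
            return in.readObject();
        }
    }
}
```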

The ideal fix would be for Spark to provide startup hooks, so that Language.Java.Inline.loadJavaWrappers could be called when the executor starts. But this feature is not implemented.

Calling loadJavaWrappers when sparkle loads is no good, because upon receiving the serialized object, the executor has no clue that it needs to load sparkle in order to have the class defined.

The only workaround I've found so far is to dump the .class files that inline-java produces into a folder and add them to the sparkle application jar (tweag/inline-java#62).

Any preferences on how to better deal with this?

mboes changed the title from "inline-java is difficult to use with sparkle." to "Anonymous classes in inline code are loaded too late for serialization" on Apr 11, 2017

mboes commented Apr 11, 2017

Note that this problem isn't specific to inline-java: it's a problem for any use of JNI's defineClass, which Spark currently provides no way of performing preemptively at initialization time. It sounds to me like this is an upstream issue, which we could work around in various ways in inline-java for that particular special case.

@facundominguez

> Which we could work around in various ways in inline-java for that particular special case.

Indeed, any preference?


mboes commented Apr 12, 2017

Since, as noted above, this is an upstream issue with Spark itself, I have a preference for keeping any workaround in sparkle. We could remove the workaround once the ticket you mention above is resolved. Inline code is just stubs, and these stubs are best kept in the executable itself: no one other than the executable should see them, nor should anyone else be able to call them. And that way we don't need to parameterize the java QQ with a gazillion (aka 1-3) options whose combinations are hard to test exhaustively.

The JIRA ticket you mention includes comments from several folks who successfully hooked into JavaSerializer. We could call loadJavaWrappers once (or every time) from the serializer. Did the "epic struggle" you mention in inline-java#62 include that already?


facundominguez commented Apr 12, 2017

I didn't try it. Apparently we need to:

  1. Extend org.apache.spark.serializer.Serializer with a wrapper that loads sparkle in a static block and forwards calls to the appropriate serializer (see the sketch below this comment).
  2. Set our instance with sparkConf.set("spark.serializer", "our.serializer.class.name").

I don't see how this can be parameterized by the serializer that Spark currently uses; we might have to define a different wrapper for each serializer we ever want to use.
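
A minimal sketch of the wrapper idea under those assumptions. The class name and the `loadJavaWrappers` native hook are hypothetical stand-ins (sparkle would need to export something that triggers Language.Java.Inline.loadJavaWrappers), and the delegate is hard-wired to JavaSerializer, which is exactly the parameterization problem mentioned above:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.serializer.JavaSerializer;
import org.apache.spark.serializer.Serializer;
import org.apache.spark.serializer.SerializerInstance;

// Hypothetical wrapper: load the inline-java wrappers once, in a static
// initializer, then forward all serialization work to Spark's JavaSerializer.
public class SparkleJavaSerializer extends Serializer {
    static {
        // Hypothetical hook: trigger Language.Java.Inline.loadJavaWrappers
        // on the Haskell side before anything is deserialized here.
        loadJavaWrappers();
    }

    // Placeholder for the native entry point sparkle would have to provide.
    private static native void loadJavaWrappers();

    private final JavaSerializer delegate;

    public SparkleJavaSerializer(SparkConf conf) {
        this.delegate = new JavaSerializer(conf);
    }

    @Override
    public SerializerInstance newInstance() {
        return delegate.newInstance();
    }
}
```

The driver would then select it with `sparkConf.set("spark.serializer", "SparkleJavaSerializer")` (or its fully qualified name), and a different wrapper would indeed be needed for KryoSerializer or any other serializer.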

mboes added a commit that referenced this issue Apr 16, 2017
Just like in straight Java, it's perfectly legal in an inline-java quasiquote to create an object of an anonymous class. The problem is, such an object can't be deserialized by any process that hasn't yet loaded the wrappers for all quasiquotes, since it is the wrappers that "define" the anonymous class.

Spark executors can be given a task by the Spark driver that includes such anonymous objects. Without the InlineJavaRegistrator provided here, it is not possible to guarantee that the inline-java wrappers have been loaded *prior* to the task being deserialized.

The solution here consists of choosing the Kryo serializer. It's much faster than the default `JavaSerializer` that Spark uses anyway. `KryoSerializer` provides a crucial facility that `JavaSerializer` does not: class registration. Spark furthermore defines "registrator" classes that, when invoked, perform class registration, or indeed any arbitrary action. We provide an `InlineJavaRegistrator` to inline-java users, which abuses class registration to first load all wrappers. This happens on all executors prior to any work being performed.

Fixes #104.
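
For reference, a hedged usage sketch of the approach described in this commit message; the registrator's fully qualified name below is an assumed placeholder rather than the actual class path in sparkle:

```java
import org.apache.spark.SparkConf;

// Sketch: opt in to Kryo and point Spark at the registrator, so that the
// inline-java wrappers are loaded on every executor before any task is
// deserialized.
public class KryoSetup {
    public static SparkConf configure(SparkConf conf) {
        return conf
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            // Assumed placeholder name; check sparkle for the real registrator class.
            .set("spark.kryo.registrator", "io.tweag.sparkle.kryo.InlineJavaRegistrator");
    }
}
```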

facundominguez commented Apr 26, 2017

This is not quite as usable as sparkle users need: when the classes that loadJavaWrappers loads depend on classes in sparkle.jar, the class loader can't find them at the point where loadJavaWrappers is invoked from InlineJavaRegistrator.java.


mboes commented Apr 26, 2017

Could we please not unearth old issues from the dead and instead create a new one?
