Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore code generation for language adaptors #204

Closed
benjchristensen opened this issue Mar 21, 2013 · 11 comments
Closed

Explore code generation for language adaptors #204

benjchristensen opened this issue Mar 21, 2013 · 11 comments
Assignees
Milestone

Comments

@benjchristensen
Copy link
Member

RxJava was written from the beginning to target the JVM, not any specific language.

As a side-effect of Java not having lambdas/clojures yet (and other considerations), Netflix used dynamic languages with it predominantly for the year of its existence prior to open sourcing.

To bridge the rxjava-core written in Java with the various languages a FunctionalLanguageAdaptor is registered at runtime for each language of interest.

To enable this language adaptors method overloads with Object exist in the API since Object is the only super-type that works across all languages for their various implementations of lambdas and closures.

This downside of this is that it breaks static typing for Java, Scala and other static-typed languages. More can be read on this issue and discussion of the subject here: https://groups.google.com/forum/#!topic/rxjava/bVZoKSsb1-o

I would like to pursue the following:

  • remove all Object overload methods from rxjava-core so it remains statically typed
  • modify FunctionalLanguageAdaptors to not register with a lookup map but instead to trigger a code generation process at registration time
  • the code generation would create the overload methods on rx.Observable for the types needed by that languages (such as groovy.lang.Closure or clojure.lang.IFn).

The benefits of this are:

  1. Everything is statically typed so compile-time checks for Java, Scala, etc work correctly
  2. Method dispatch is now done via native Java bytecode using types rather than going via Object which then has to do a lookup in a map. Memoization helped with the performance but each method invocation still required looking in a map for the correct adaptor. With this approach the appropriate methods will be compiled into the rx.Observable class to correctly invoke the right adaptor without lookups.

The byte code generation step would happen once the first time a LanguageAdaptor is registered which is normally only at the first point of access to RxJava.

@ghost ghost assigned benjchristensen Mar 21, 2013
@benjchristensen
Copy link
Member Author

Some background on why the current design was chosen rather than having each language with a separate version of Observable:


The approach of having language specific packages/classes was pursued but did not work well because Rx is a composable library. It means that every time an Observable is used it needs to be re-wrapped or un-wrapped by whichever language is using it.

For example ...

From Java a library is exposed that has a method like this:

rx.Observable getData()

From Groovy a library is exposed with a method like:

rx.groovy.GroovyObservable getOtherData()

Then from Scala you need to wrap them again:

rx.scala.ScalaObservable.from(getOtherData())

This means we have an rx.Observable wrapped as rx.groovy.GroovyObservable wrapped as rx.scala.ScalaObservable.

To compose the two we would have:

rx.scala.ScalaObservable.zip(rx.scala.ScalaObservable.from(
      getOtherData()), 
      rx.scala.ScalaObservable.from(getData()),
       ... scala closure here ...);

Now what does ScalaObservable return from its operators? ScalaObservable or Observable?

Should the above zip operator return rx.scala.ScalaObservable or rx.Observable? What happens if this library is consumed from another language?

If Observable each step along the way it must be wrapped yet again. If ScalaObservable it has now changed all of the return types of rx.Observable to a subtype.

In short, for interop between languages it very quickly becomes a mess and our primary polyglot goal was that rx.Observable was usable across all libraries as the single type and because the whole point of Rx is chained composition it's not as simple as just a single decoration at the beginning. It affects every single method in an API and step of the chaining.

For this reason we chose the current language-adaptor model where it is made to understand the closure/lambda types of each of the languages so rx.Observable can remain the sole interface across languages.

The reason we did not use an interface such as FuntionalLanguageAdaptor instead of Object is that not all languages automatically reify to "functional interfaces". Java8 does - but it's already type-safe and works fine with the Func/Action interfaces. Clojure, Groovy, JRuby and Scala all require awkward non-idiomatic java-interop or interface implementation syntax instead of their native lambda/closure syntax - thus Object is unfortunately the first super-type they all have in common - especially to handle the different arities (Func1, Func2, Func3 etc) that can be passed in.

@benjchristensen
Copy link
Member Author

Playing with this the 3 approaches I can see are:

  1. Java agent (can be loaded dynamically without manually injecting -javaagent at command line) to use the Java Attach API to intercept rx.Observable being loaded and do bytecode generation
  2. Custom classloader to intercept rx.Observable being loaded and do bytecode generation
  3. Compile time step to perform bytecode generation after compiling rx.Observable and replace Observable.class with the new typed methods for each language before packaging as a Jar

Options 1 and 2 are nice in that they only add the methods needed by the desired languages at runtime, but they are bad in that they both need an initialization hook - something that is a lot to ask for a plain library, not a full framework controlling entire lifecycle so I don't think they are appropriate here.

Option 3 would include at compile time all of the methods needed for each language. This means that all runtimes would have that bytecode but if they don't have a particular language runtime loaded those methods could never be invoked.

The benefit is that all languages would use static types for method dispatch. The downside is that all the overloads would always exist even if someone is using just one of the languages.

A possible solution would be to generate different versions of the binary for each language - which would work fine if an environment is using only one language, but if more than one is being used it would be very confusing because someone couldn't just load the two separate binaries, they'd collide in the same namespace. A single binary with support for each language would need to be used.

I think the drawbacks of having all overloads for all languages in a single binary are small enough that it's worth pursuing to see how it works.

The only awkward drawback I know of that can't easily be overcome is that while working on the RxJava source itself in an IDE and running Groovy/Clojure/JRuby etc it would be hitting directly compiled classes, not the binaries put through bytecode generation so they would fail. It would mean that tests can only be run from the full build process, or by supporting option 1 or 2 for the development cycle while in an IDE. This impacts a limited number of people though so is not a big enough reason to avoid pursuing the bigger benefits of maintain static typing and avoiding use of instanceof for dynamically determining what language adaptor to use to execute a function.

This is very exploratory and something I'm researching as I go (I generally do not like code generation or bytecode manipulation so have not used this approach before) so feedback is much appreciated.

@jmhofer
Copy link
Contributor

jmhofer commented Mar 24, 2013

If you're going with option 3, why not directly add these methods in the source, instead of adding them after compilation? - Probably I'm missing something here.

Anyway, instead of overloading the method with a single one where closures are mapped to java.lang.Object, you could add a method for each supported JVM language closure type. Let's take subscribe as an example.

Instead of

public Subscription subscribe(final Object o)

for all dynamic languages, you could introduce

public Subscription subscribe(final groovy.lang.Closure<?> cls)

and

public Subscription subscribe(final org.jruby.RubyProc proc)

and

public Subscription subscribe(final clojure.lang.IFn cls)

for each dynamic language.

It wouldn't even be absolutely necessary to support Scala directly, as Scala can always hook itself in via implicits.

Now, undoubtedly that's a lot of work, but I don't think you can get around that anyway. - But probably I'm overlooking something which makes this impossible? - The only big drawback I can think of atm is that it's not possible in that way to add another JVM-based language without touching the core. This doesn't worry me much, but maybe that's just me.

Also, there would be another "easy way out". You probably already thought of that. It's not pretty, though: Keep the methods taking objects, but instead of overloading the typesafe methods, name those differently. For example, pair

public Subscription subscribe(final Object o)

with

public Subscription typedSubscribe(final Action1<T> onNext)

Though it's certainly not ideal, I could live with that...

@benjchristensen
Copy link
Member Author

We could add all of the overloads directly to the source - but we're talking about 48 methods already and eventually probably double that and then multiplying that by the number of languages.

It makes the javadoc, source code and maintenance rather ridiculous and it becomes very error prone that every time a method is added or touched the overload for every language must also be done. It would severely impede people getting involved and touching the Observable.java class.

For these reasons I feel it's not a great approach - though it is a valid one.

I also strongly want to avoid different names - as that breaks the API across languages and doesn't really solve the real problem as people now need to know the different naming conventions in order to stay "type safe".

The amount of code to generate the methods is actually quite small, here's a prototype of me playing with it:

public class LanguageAdaptorCodeGenerator {

    static ClassPool pool = ClassPool.getDefault();

    public static void addSupportFor(Class<?>[] types) {
        CtClass cc;
        try {
            cc = pool.get(rx.Observable.class.getName());
        } catch (NotFoundException e) {
            throw new RuntimeException("Failed to add language adaptor methods as could not find rx.Observable.", e);
        }
        try {
            for (CtMethod method : cc.getMethods()) {
                CtClass[] argTypes = method.getParameterTypes();
                boolean containsFunctionType = false;
                for (CtClass argType : argTypes) {
                    if (isRxFunctionType(argType)) {
                        containsFunctionType = true;
                    }
                }
                if (containsFunctionType) {
                    try {
                        addMethod(cc, method, types);
                    } catch (Exception e) {
                        e.printStackTrace();
                        throw new RuntimeException("Failed to add language adaptor method: " + method.getName(), e);
                    }
                }
            }

            // will need to output to filesystem here
            //cc.toClass();
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException("Failed to add language adaptor methods.", e);
        }
    }

    private static boolean isRxFunctionType(CtClass type) throws Exception {
        // look for the Function argument
        if (type.getName().equals(Function.class.getName())) {
            return true;
        }
        // look for arguments implementing Function
        for (CtClass implementedInterface : type.getInterfaces()) {
            if (implementedInterface.getName().equals(Function.class.getName())) {
                return true;
            }
        }
        return false;
    }

    private static void addMethod(CtClass ctClass, CtMethod method, Class<?>[] types) throws Exception {
        for (Class<?> cls : types) {
            ArrayList<CtClass> parameters = new ArrayList<CtClass>();
            CtClass[] argTypes = method.getParameterTypes();

            System.out.print(">>>>>> method: " + method.getName() + " args: ");
            for (CtClass argType : argTypes) {
                System.out.print(argType.getName() + " ");
                if (isRxFunctionType(argType)) {
                    // needs conversion
                    parameters.add(pool.get(cls.getName()));
                } else {
                    // no conversion, copy through
                    parameters.add(argType);
                }
            }
            System.out.println("");

            CtClass[] newArgs = parameters.toArray(new CtClass[parameters.size()]);
            String params = "";
            for (CtClass a : newArgs) {
                params += a.getName() + " ";
            }
            System.out.println("Add method => " + method.getName() + " => " + params + "  --> with arg type converted to: " + cls.getName());
            CtMethod newMethod = new CtMethod(method.getReturnType(), method.getName(), newArgs, ctClass);
            newMethod.setModifiers(method.getModifiers());
            newMethod.setBody("{ System.out.println(\"hello world!\");return null; }");
            ctClass.addMethod(newMethod);
        }
    }
}

Here are the methods that get overloads (for just Groovy in this output):

Add method => create => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => defer => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => filter => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => filter => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => forEach => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => groupBy => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => groupBy => groovy.lang.Closure groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => groupBy => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => groupBy => rx.Observable groovy.lang.Closure groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => last => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => last => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => lastOrDefault => java.lang.Object groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => lastOrDefault => rx.Observable java.lang.Object groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => map => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => map => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => mapMany => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => mapMany => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => onErrorResumeNext => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => onErrorResumeNext => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => onErrorReturn => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => onErrorReturn => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => reduce => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => reduce => java.lang.Object groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => reduce => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => reduce => rx.Observable java.lang.Object groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => scan => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => scan => java.lang.Object groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => scan => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => scan => rx.Observable java.lang.Object groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => sequenceEqual => rx.Observable rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => single => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => single => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => singleOrDefault => java.lang.Object groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => singleOrDefault => rx.Observable java.lang.Object groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => subscribe => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => subscribe => groovy.lang.Closure groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => subscribe => groovy.lang.Closure groovy.lang.Closure groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => takeWhile => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => takeWhile => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => takeWhileWithIndex => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => takeWhileWithIndex => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => toSortedList => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => toSortedList => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => where => groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => where => rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => zip => rx.Observable rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => zip => rx.Observable rx.Observable rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure
Add method => zip => rx.Observable rx.Observable rx.Observable rx.Observable groovy.lang.Closure   --> with arg type converted to: groovy.lang.Closure

I haven't yet made the method body do the right thing (it's printing hello world right now), but it's basically taking what is already in the language adaptors and just inlining it.

The part I haven't figured out yet is hooking into the Gradle build process - but it's just busy work I need to spend time on.

@jmhofer
Copy link
Contributor

jmhofer commented Mar 25, 2013

Awesome. - I expected a lot more work to be necessary to generate those additional methods. But this looks quite manageable.

You're right, the code generation approach looks better in this case than manually adding all those methods.

@benjchristensen
Copy link
Member Author

Another option a colleague suggested is to generate the source code before compilation instead of doing bytecode manipulation after. Generating source code isn't as "clean" as using Javassist like that code above, but the benefit would be that source code is then shipped that matches the bytecode so that stacktraces and clicking through to source goes to the right place when debugging.

That may be a strong enough value to do source code generation rather than bytecode generation.

mattrjacobs added a commit to mattrjacobs/RxJava that referenced this issue Mar 28, 2013
 - This allows for type safety in statically-typed languages
 - This prevents dynamically-typed languages from hooking into rxjava-core.
 -- See ReactiveX#204 for details on code generation for dynamic languages

 * Added Scala implicits into rx.lang.scala.RxImplicits
 * Added tests of most methods on Observable using Scala functions
 * Fixed Scala Gradle/ScalaTest build
@mattrjacobs
Copy link
Contributor

I've got some work-in-progress up on my fork at: mattrjacobs@f5f84a8. The commit log describes what's been done and what's left to do before I'll feel comfortable submitting it.

@samhendley
Copy link

Would it be possible to use the @deprecated annotation for the Object versions? This would cause java/scala to atleast warn when we are accidentally using the Object overload but might not translate out to the dynamic languages.

@benjchristensen
Copy link
Member Author

We are getting close to eliminating these methods (week or two more at most
I hope). I was planning on providing an update on it tonight or tomorrow to
give people a heads up.

On Tue, Jul 2, 2013 at 9:55 PM, samhendley [email protected] wrote:

Would it be possible to use the @Deprecatedhttps://github.com/Deprecatedannotation for the Object versions? This would cause java/scala to atleast
warn when we are accidentally using the Object overload but might not
translate out to the dynamic languages.


Reply to this email directly or view it on GitHubhttps://github.com//issues/204#issuecomment-20395690
.

Ben Christensen - API Team
+1-310-781-5511 @benjchristensen

@benjchristensen
Copy link
Member Author

After implementing and throwing away a few different approaches we have landed on a solution we feel will balance the various competing priorities.

Our goals are:

  • support static typing for Java/Scala/Kotlin etc by removing the Object overloads
  • support any JVM language, static or dynamically typed
  • allow all languages to use the same rx.Observable class so that we don't divide libraries with helpers such as GroovyObservable, ClojureObservable etc that then need to be converted back and forth when doing interop
  • do not require special classloaders or agents to enable runtime bytecode generation
  • do not remove static operators to enable proxying
  • small jars and limited or no dependencies

The solution we have arrived at will work as follows:

  • The rxjava-core source code will delete all Object overload methods and be pure static java.
    • Any language that supports functional interfaces directly (such as Java 8 and XTend) can use the Java core version directly.
  • Languages needing specific lambda/clojure type mapping to the Func_/Action_ types will have language specific Jars created via build-time bytecode generation.
    • Any method with a Func_/Action_ argument will be overloaded with a version supporting the language requirements.

For example:

The default Java version:

public static <T> Observable<T> filter(Observable<T> that, Func1<T, Boolean> predicate)

A Groovy version:

public static <T> Observable<T> filter(Observable<T> that, groovy.lang.Closure predicate)
  • A jar per language will be created as follows:
    • rxjava-x-y-z.jar
    • rxjava-groovy-x-y-z.jar
    • rxjava-clojure-x-y-z.jar
    • rxjava-scala-x-y-z.jar
    • rxjava-jruby-x-y-z.jar
    • rxjava-kotlin-x-y-z.jar

A project will include just the jar that meets their language needs, there will no longer be a "core" jar plus the language adaptor.

The drawback of this is that mixing two of these in a classpath will result in non-deterministic loading (whichever is loaded last wins) and that is the version that will be used. This means if a library depends on rxjava.jar but is using Groovy and needs rxjava-groovy.jar it is up to the developer of that project to make sure they have only the rxjava-groovy.jar version. This is not ideal but is a one-time pain setting up a build and is better than the constant pain of missing static typing or converting to/from different Observable implementations for different languages.

  • At this time we are optimizing for projects using a single language or Java + another language. If there are use cases where people are trying to mix multiple languages in a very polyglot manner we have two options:
    • include an rxjava-dynamic.jar version that re-adds the Object overloads
    • include build configs for common combinations of languages such as rxjava-groovy-clojure.jar
  • Language adaptations (such as clojure which has preferred idioms that necessitate wrapping) will still be possible through the language-adaptor projects and be included in the appropriate language jars.

This should not break any code but will require a slight change to the build dependencies in your project when we release this.

We hope that this enables the RxJava project to better achieve its goal of being polyglot and targeting the JVM in general and not any specific languages without sacrificing interop or idiomatic usage in each of the languages.

@benjchristensen
Copy link
Member Author

Byte code generation ended up not being used. We instead went with the solution in #323

rickbw pushed a commit to rickbw/RxJava that referenced this issue Jan 9, 2014
benjchristensen added a commit to ReactiveX/RxGroovy that referenced this issue Aug 19, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants