[MXNET-984] Add Java NDArray and introduce Java Operator Builder class #12772
Conversation
Force-pushed from 0d83d6b to 6e061d4
Awesome contribution! Thanks a lot :) Do you think it's possible to introduce test coverage early on so we can track improvements?
Hi @marcoabreu, there are unit tests on the Java side to make sure the handmade operators and generated operators are tested. Unfortunately, we cannot have full unit test coverage for the generated operators, since there are 200+ of them and they are already tested on the Python end. I would be grateful if you could request more tests if you find some part of the code is not covered.
Sure, I fully understand and agree. I was thinking about targeted coverage measurement for the Java API: the backend calls would be mocked and we would validate that the passthrough is working properly. The actual logic is still tested under Scala. The idea is to detect if some change on the Scala side broke the compatibility or behaviour of the Java API. What do you think?
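For illustration, here is a minimal sketch of the passthrough-style test being proposed, assuming JUnit 4 and Mockito are on the classpath; `ScalaNDArray` and `JavaNDArray` below are hypothetical stand-ins, not the real MXNet classes:

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.Test;

public class PassthroughTest {

  // Stand-in for the Scala backend class (hypothetical, for illustration only).
  interface ScalaNDArray {
    ScalaNDArray add(ScalaNDArray other);
  }

  // Stand-in for the Java wrapper: it holds the Scala object and only delegates.
  static class JavaNDArray {
    final ScalaNDArray nd;
    JavaNDArray(ScalaNDArray nd) { this.nd = nd; }
    JavaNDArray add(JavaNDArray other) { return new JavaNDArray(nd.add(other.nd)); }
  }

  @Test
  public void addDelegatesToScalaSide() {
    ScalaNDArray backend = mock(ScalaNDArray.class);
    ScalaNDArray other = mock(ScalaNDArray.class);
    when(backend.add(other)).thenReturn(backend);

    new JavaNDArray(backend).add(new JavaNDArray(other));

    // Only the delegation is asserted; numerical behaviour stays tested in Scala.
    verify(backend).add(other);
  }
}
```

The point of such a test is that it would catch a Scala-side signature or behaviour change breaking the Java wrapper, without duplicating the operator-level tests.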
Minor changes, but other than that LGTM.
@@ -24,4 +24,6 @@ object DType extends Enumeration {
   val UInt8 = org.apache.mxnet.DType.UInt8
   val Int32 = org.apache.mxnet.DType.Int32
   val Unknown = org.apache.mxnet.DType.Unknown
+
+  private[mxnet] def numOfBytes(dType: DType): Int = org.apache.mxnet.DType.numOfBytes(dType)
This looks like an unused private method. I think we can get rid of it.
scala-package/scalastyle-config.xml
@@ -64,7 +64,7 @@ You can also disable only one rule, by specifying its rule id, as specified in:
 <check level="error" class="org.scalastyle.file.FileLineLengthChecker" enabled="true">
   <parameters>
-    <parameter name="maxLineLength"><![CDATA[100]]></parameter>
+    <parameter name="maxLineLength"><![CDATA[132]]></parameter>
Undo this. I know you picked it up from my change but Naveen wanted us to have this discussion on the dev list before changing it.
I like the idea of getting a test coverage measurement working early on the Java API. I think this is something we can add to our plan. Our current testing strategy is that the Java unit tests will be aimed at testing the compatibility of the Java/Scala interaction, and all behavior testing will occur on the Scala side. With that said, the current Scala testing is lacking in some areas and needs to be improved.
Force-pushed from 82df9fd to 554b5e5
We should aim to have unit tests for all of the methods in the non-generated NDArray code. Given that it's just a wrapper, in early stages, and there are other things waiting on this, I won't block the PR for it, but we should do a follow-up with better test coverage. Other than that, LGTM.
I haven't looked into the details, but a high-level comment:
I feel it is better to follow Spark's approach: name it JavaNDArray or something and provide toNDArray(), so that we don't need to duplicate every class in Java. Each time an API needs an NDArray, Java users can use JavaNDArray to operate and then convert to Scala's NDArray. (A rough sketch of this wrapper pattern follows the list below.)
And I think Scala's Context, DType, Shape, and IO can be reused; we do not benefit much from inventing new classes. So far we duplicate too much:
- Context: the only thing I see is that devtype2str and devstr2type now become Java Maps, but who will actually use them? Even if they were used in some unusual case, users could easily invoke .asJava themselves.
- DType: nothing changes.
- Shape: Java users can already construct a Shape easily: Shape.create(1,2,3,4). If you worry about toVector returning a Scala Vector, we can add one more function, toList.
- IO: nothing new.
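For what it's worth, a rough sketch of the suggested wrapper pattern might look as follows. This is hypothetical, not code from this PR, and it assumes the Scala NDArray's symbolic `+` is reachable from Java under Scala's `$plus` encoding:

```java
// Hypothetical JavaNDArray following Spark's JavaRDD pattern: a thin shell
// around the Scala NDArray with an escape hatch back into the Scala API.
public class JavaNDArray {
  private final org.apache.mxnet.NDArray nd;

  public JavaNDArray(org.apache.mxnet.NDArray nd) { this.nd = nd; }

  // Java-friendly operations delegate to the Scala NDArray internally.
  // Scala compiles the symbolic method `+` to `$plus`, callable from Java.
  public JavaNDArray add(JavaNDArray other) {
    return new JavaNDArray(nd.$plus(other.nd));
  }

  // Whenever an API needs a Scala NDArray, Java users unwrap and hand it over.
  public org.apache.mxnet.NDArray toNDArray() { return nd; }
}
```

The design question in this thread is essentially whether such a shell is needed for every class, or only for the ones, like NDArray, whose Scala API is not Java-friendly.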
@@ -606,7 +606,7 @@ scalaclean:
 scalapkg:
 	(cd $(ROOTDIR)/scala-package; \
-		mvn package -P$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) -Dcxx="$(CXX)" \
+		mvn package -P$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE),integrationtest -Dcxx="$(CXX)" \
Why run tests here?
No tests are actually triggered here. This is just to avoid the JUnit tests running during make scalapkg: the integrationtest profile flag disables the JUnit unit tests.
A few comments: what this is doing is essentially Spark's approach (see their JavaDoubleRDD). They have a JavaDoubleRDD class that they use to just call methods of their Scala RDD, handling the conversions internally. You can also see this in their Java examples; throughout all of the examples that I've looked at, the Java examples never use any part of the core library outside of the api.java package. If we don't provide a Java-friendly wrapper for all of the classes we want to expose to Java developers, it'll lead to a rather confusing experience. They'll be expected to learn what is and is not meant for them to use from Scala's package, while also having to know when to use the javaapi package instead. Take the IO wrapper for example: all the Java wrapper is exposing is a Java-friendly version of DataDesc. If we don't wrap that and instead expect them to use the Scala one, then they'll also get exposed to all the other classes in IO that aren't intended for them to use (aren't Java friendly). Any methods that we add there to make it Java-friendly will pollute the Scala API with stuff that Scala developers don't need.
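As a concrete illustration of the wrapper idea (not the code in this PR), a Java-friendly DataDesc shell might look roughly like this, assuming the Scala org.apache.mxnet.DataDesc constructor takes a name, shape, dtype, and layout:

```java
import scala.Enumeration;

// Hypothetical javaapi-style wrapper: Java users see only this class, while
// the rest of the Scala IO package stays out of their way.
public class DataDesc {
  private final org.apache.mxnet.DataDesc desc;

  public DataDesc(String name, org.apache.mxnet.Shape shape,
                  Enumeration.Value dtype, String layout) {
    // Delegate straight to the Scala class; no logic lives in the wrapper.
    this.desc = new org.apache.mxnet.DataDesc(name, shape, dtype, layout);
  }

  public String name() { return desc.name(); }

  // Escape hatch into the Scala API when a Scala method needs a DataDesc.
  public org.apache.mxnet.DataDesc toDataDesc() { return desc; }
}
```

Since the wrapper only forwards, maintenance amounts to keeping its signatures in sync with the Scala class.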
Thanks for providing your points of view @andrewfayres. IMO it is not always "bad" to expose Scala classes. Scala classes are not necessarily a "new language" that requires users to learn; e.g., for scala.Tuple, many Java third-party libraries (Guava, Apache Commons, javatuples) provide exactly the same thing. Spark only re-implements a very small set of classes, mostly RDDs, and not every RDD has a Java version. At least these four classes are already "Java-friendly"; duplicating them will only increase maintenance effort, and honestly I don't see why. It makes sense to me to have a Java version of NDArray (and maybe also Module, etc.), because they have too many functions that are not Java-friendly. But not every class is in such a situation.
Also, I can easily find a Java example using a Scala object in the link you mentioned: https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/JavaSparkPi.java#L34-L37. And in their documentation, they at least recommend Java users to use … And here, https://spark.apache.org/examples.html, if you click the "Java" tab, there's a bunch of usage of Spark-Scala objects: LogisticRegression, Vector, etc. And let's check their Java test case: https://github.com/apache/spark/blob/master/mllib/src/test/java/org/apache/spark/ml/classification/JavaLogisticRegressionSuite.java. I don't think whether they come from …
I'm not taking the position that Scala objects should never be exposed to Java users. I have no objections to doing so when it's appropriate. My objection is about exposing half a package to them because it is already Java-friendly, and requiring them to know which parts they are expected to cherry-pick from.

I think everything you linked to is from outside the core module, which is what I was talking about. The reason I was making that distinction is that it seems to be where they have separate Java and Scala APIs. At a cursory glance, it looks like the entire mllib module is Java-friendly and doesn't have a separate API (like it had been originally designed to be that way). If we'd planned on Java support from the very beginning, this is probably how we'd have done things.

I have no objections to the idea that we could have them use scala.Tuple2. It's useful, straightforward, is a concept everyone knows, and has no analog in Java SE. I agree that DataDesc is Java-friendly (or at least good enough), but there are plenty of other classes (even in the same IO file as DataDesc) that aren't. I think it's an unreasonable ask to expect the Java developer to know that they're expected to only select certain things from that package while other things are off limits.

I'm not particularly concerned about the maintenance costs because they're just wrappers. The only time I foresee us doing maintenance is if we make a breaking change to the corresponding method of the Scala API, which should be very rare (and when it happens there's already so much to update that an extra wrapper should make little difference).
Then I do not agree with the opinion that, even though DataDesc is Java-friendly, we still duplicate it because other things are duplicated. User types …

I think you agree that, if one class is already designed to be Java-friendly, then we can just expose it; this is how you took spark-mllib as an example. But why tangle over the namespace? The main objective of providing a "Java API" is not to have a Java namespace; as a user, why do I care about which namespace I'm using? As a remedy for not making the APIs Java-friendly at the very beginning, Spark invented JavaRDD; we are just facing the same problem here.

Also, be careful about saying maintenance is easy. Count how many lines you wrote for only four tiny, simple classes. When an API changes, grows, or is removed, we'll suffer from maintaining two things. And remember that IO/Optimizer/etc. are very likely to be extended.
Force-pushed from 87d95a2 to cd93cc3
In offline discussions we've agreed that we'll collectively craft an email with both positions, send it out for feedback, and go with the consensus that emerges. In the meantime, development of the Java API will proceed on a new branch.
Here is the alternative one; closing this now! #12816
Description
This is a PR based on @andrewfayres' PR #12757. Please merge the previous one first and then this one.
In this PR, the Java NDArray is introduced, as well as a brand-new operator Builder class. Deprecated operators were removed from the Java API macros. Please see the unit tests for more information.
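As a hypothetical illustration of the Builder idea (the operator and parameter names below are invented, not the generated API): required arguments go in the constructor, optional ones become chainable setters, so Java callers avoid Scala's default and named parameters.

```java
// Hypothetical operator builder; "softmax", "axis", and "temperature" are
// made up here purely to show the pattern.
public final class SoftmaxBuilder {
  private final Object data;     // required input (stand-in type)
  private Integer axis;          // optional, null until set
  private Float temperature;     // optional, null until set

  public SoftmaxBuilder(Object data) { this.data = data; }

  public SoftmaxBuilder setAxis(int axis) {
    this.axis = axis;
    return this;                 // chainable, fluent-style
  }

  public SoftmaxBuilder setTemperature(float temperature) {
    this.temperature = temperature;
    return this;
  }

  public String invoke() {
    // A real builder would dispatch to the MXNet backend; this one just
    // reports what it collected.
    return "softmax(axis=" + axis + ", temperature=" + temperature + ")";
  }
}
```

A call site would then read `new SoftmaxBuilder(data).setAxis(-1).invoke()`, which stays readable even for operators with dozens of optional parameters.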
@andrewfayres @nswamy @yzhliu @gigasquid
Special thanks to @zachgk and @piyushghai for guidance and support on the new Java API!
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.