Scala 2.13 support. #9099
Conversation
Thank you for the work on updating the Scala version. I will defer the review to @wbo4958 as I'm not familiar with the ecosystem.
I left a review since I also looked into adding support for Scala 2.13. But since I am not a maintainer for this project, I guess you have to take my comments for what they are.
Also, if I run a compile on this branch:

$ mvn compile -Pdefault,scala-2.13
...
[ERROR] Failed to execute goal on project xgboost4j-example_2.13: Could not resolve dependencies for project ml.dmlc:xgboost4j-example_2.13:jar:2.0.0-SNAPSHOT: Could not find artifact ml.dmlc:xgboost4j_2.12:jar:2.0.0-SNAPSHOT in central_maven (https://repo1.maven.org/maven2) -> [Help 1]

it fails looking for the 2.12 version of the xgboost4j jar. Not sure if this is just for me or if there is something more that was missed.
jvm-packages/README.md (outdated diff):

</dependency>
<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j-spark_2.12</artifactId>
Suggested change:

- <artifactId>xgboost4j-spark_2.12</artifactId>
+ <artifactId>xgboost4j-spark_2.13</artifactId>

?
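For context, a hedged sketch of what the user-facing dependency block could look like once the Scala 2.13 artifacts are published; the artifact names follow the `_2.13` suffix convention discussed here, and `latest_version_num` is a placeholder rather than a real version:

```xml
<!-- Illustrative only: depending on the Scala 2.13 artifacts.
     Replace latest_version_num with an actual release version. -->
<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j_2.13</artifactId>
  <version>latest_version_num</version>
</dependency>
<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j-spark_2.13</artifactId>
  <version>latest_version_num</version>
</dependency>
```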
@eejbyfeldt unfortunately, you have to do `mvn install -Pdefault,scala-2.13` instead of simply `mvn compile` :(
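A minimal sketch of that workflow, assuming the build is run from the `jvm-packages` directory; the `-DskipTests` flag is an assumption added here for brevity, not part of the original instruction:

```sh
# Install all JVM modules into the local Maven repository so that the
# 2.13 snapshot artifacts can be resolved by downstream modules such as
# xgboost4j-example_2.13.
cd jvm-packages
mvn install -Pdefault,scala-2.13 -DskipTests
```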
Thanks a lot for your comments. I totally missed a few important things :(
jvm-packages/README.md (outdated diff):

@@ -47,7 +60,7 @@ libraryDependencies ++= Seq(

For the latest release version number, please check [here](https://github.com/dmlc/xgboost/releases).

- To enable the GPU algorithm (`tree_method='gpu_hist'`), use artifacts `xgboost4j-gpu_2.12` and `xgboost4j-spark-gpu_2.12` instead.
+ To enable the GPU algorithm (`tree_method='gpu_hist'`), use artifacts `xgboost4j-gpu_2.12` (`xgboost4j-gpu_2.13`) and `xgboost4j-spark-gpu_2.12` (`xgboost4j-spark-gpu_2.13`) instead.
Since `xgboost4j-spark-gpu_2.13` depends on `rapids-4-spark`, which only has a 2.12 version (https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark), I do not think it is possible to build the `xgboost4j-spark-gpu` package for 2.13 yet.
Indeed. NVIDIA/spark-rapids#1525
I will fix the docs, thanks a lot!
jvm-packages/xgboost4j-spark/pom.xml:

  <version>2.0.0-SNAPSHOT</version>
  </parent>
- <artifactId>xgboost4j-spark_2.12</artifactId>
+ <artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>
  <name>xgboost4j-spark</name>
Using a property here leads to Maven producing this warning:

[WARNING] Some problems were encountered while building the effective model for ml.dmlc:xgboost4j-spark_2.13:jar:2.0.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ ml.dmlc:xgboost4j-spark_${scala.binary.version}:2.0.0-SNAPSHOT, /home/eejbyfeldt/dev/dmlc/xgboost/jvm-packages/xgboost4j-spark/pom.xml, line 12, column 17

Is that really safe to ignore?

Spark, which also uses Maven as its build system, has a dedicated script (https://github.com/apache/spark/blob/master/dev/change-scala-version.sh) for changing the Scala version. I am not very familiar with Maven, but maybe there are good reasons they chose that approach?
It is safe for now. Sadly, a script like Spark's adds more complexity than just using profiles. I've reused the approach discussed in davidB/scala-maven-plugin#177. Spark's approach originates in 2015, and Maven has changed since. I can make a script in place of the Maven profiles, if necessary.
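For illustration, a minimal sketch of what the profile-based setup can look like in the parent pom; only `scala.binary.version` is taken from this PR, while the property layout and the exact Scala patch versions are assumptions:

```xml
<!-- Hypothetical parent-pom fragment: defaults to Scala 2.12, and the
     scala-2.13 profile (activated with -Pscala-2.13) overrides both
     properties so artifactIds like xgboost4j-spark_${scala.binary.version}
     resolve to the 2.13 variants. -->
<properties>
  <scala.version>2.12.18</scala.version>
  <scala.binary.version>2.12</scala.binary.version>
</properties>

<profiles>
  <profile>
    <id>scala-2.13</id>
    <properties>
      <scala.version>2.13.11</scala.version>
      <scala.binary.version>2.13</scala.binary.version>
    </properties>
  </profile>
</profiles>
```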
jvm-packages/xgboost4j/src/main/scala/ml/dmlc/xgboost4j/scala/XGBoost.scala (outdated; review thread resolved)
You are more than welcome to contribute through any means. Also, we are looking for new maintainers for the JVM packages and R.
Force-pushed from 8004451 to 3cbd533.
@trivialfis I have a few things I'd like to do in the …
Thank you for sharing your interest! Let's work through those items first.
A few questions; I'm still trying to make time to learn about the Spark ecosystem. ;-)
Otherwise looks fine to me. cc @wbo4958.
@@ -96,7 +121,9 @@ libraryDependencies ++= Seq(

For the latest release version number, please check [the repository listing](https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html).

### GPU algorithm
@wbo4958 Could you please help confirm this is still the case?
Sadly, NVIDIA/spark-rapids#1525 is a blocker for the GPU-enabled modules. I do not think this has to be a showstopper for 2.13, though.
Seems so, yes.
@@ -5,7 +5,7 @@

  <modelVersion>4.0.0</modelVersion>

  <groupId>ml.dmlc</groupId>
- <artifactId>xgboost-jvm_2.12</artifactId>
+ <artifactId>xgboost-jvm</artifactId>
Apologies for the newbie question. Is this changing the jar name? If so, is it considered a breaking change?
No, it is not changing the jar name, as the relevant project is not published as a jar: it uses `<packaging>pom</packaging>`. POM packaging is a way to specify the container of submodules; it does not produce a jar. `jvm-packages` only produces jars for `xgboost4j-spark`, `xgboost4j`, `xgboost4j-flink`, `xgboost4j-example`, and the GPU-enabled modules.
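As an illustration of the point above, a schematic sketch of an aggregator pom: the module names come from the comment, while the layout and file location are assumptions:

```xml
<!-- Schematic aggregator pom (assumed to be jvm-packages/pom.xml).
     <packaging>pom</packaging> means no jar is published for
     xgboost-jvm itself; only the listed modules produce jars. -->
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm</artifactId>
<version>2.0.0-SNAPSHOT</version>
<packaging>pom</packaging>
<modules>
  <module>xgboost4j</module>
  <module>xgboost4j-flink</module>
  <module>xgboost4j-spark</module>
  <module>xgboost4j-example</module>
  <!-- plus the GPU-enabled modules -->
</modules>
```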
Thank you for the information!
jvm-packages/xgboost4j/src/main/scala/ml/dmlc/xgboost4j/scala/EvalTrait.scala (review thread resolved)
Great job. I will review it. Sorry for that.
Really great job. LGTM assuming the CI can pass.
Thanks a lot @wbo4958. I've updated the …
I am not sure whether I can do anything for the docs and Python tests.
Resolved conflicts.
Will merge once all tests are green. Thank you for the work on supporting the newer Scala!
Force-pushed from 1fd2b3e to 4fa4417.
@trivialfis I had to modify the Docker image used for the integration tests. Should be green now.
Force-pushed from a595a85 to 29f2380.
Force-pushed from d8e1088 to 6b3c971.
Updated the logic used to build images for the integration tests, and rebased.
Thank you for the update! Are you able to reproduce this error: https://buildkite.com/xgboost/xgboost-ci-multi-gpu/builds/2184#01881fd5-fa29-4540-92f4-eb8861dab13b ? Feel free to let me know if there's anything I can help with.
Oops, spark-rapids 23.04.0 doesn't support Spark 3.4. I will fix it.
@wbo4958 thanks a lot! Reproducing some things locally requires patching, so it takes more time than just writing code.
@trivialfis, @wbo4958 I have rebased the branch again, so all changes should be included. Do you think I should do anything else before this MR can be merged?
Restarted the CI due to a network error. Maven seems quite unstable on the CI recently.
Force-pushed from 47d809a to 821b39d.
Overall looks good to me, one minor issue in the comments.
Thank you for the nice work on not only updating the Scala version but also handling the tests!
...boost4j-example/src/test/scala/ml/dmlc/xgboost4j/scala/example/spark/SparkExamplesTest.scala (outdated; review thread resolved)
1. Updated the test logic.
2. Added smoke tests for Spark examples.
3. Added integration tests for Spark with Scala 2.13.
@trivialfis Thanks a lot! Are there any release plans?
Related: xgboost4j support of scala 2.13 (#8650) and #6596.