-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0 #32415
Conversation
Factor how the various implementations are run.
/cc @srowen |
pom.xml
Outdated
@@ -172,7 +172,7 @@ | |||
<fasterxml.jackson.version>2.12.2</fasterxml.jackson.version> | |||
<snappy.version>1.1.8.4</snappy.version> | |||
<netlib.java.version>1.1.2</netlib.java.version> | |||
<netlib.ludovic.dev.version>1.3.2</netlib.ludovic.dev.version> | |||
<netlib.ludovic.dev.version>2-SNAPSHOT</netlib.ludovic.dev.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we don't depend on a snapshot version - fine if this is in progress and 2.0 will be the final version referenced here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As soon as this PR is good to merge, I’ll release 2.0.0 of dev.ludovic.netlib, change it here and remove the references to the Snapshot repositories.
Jenkins test this please |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138138 has finished for PR 32415 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self to make sure we don't merge something we don't want.
/cc @srowen I have release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good pending one more round of tests
@@ -456,7 +456,6 @@ org.antlr:ST4 | |||
org.antlr:stringtemplate | |||
org.antlr:antlr4-runtime | |||
antlr:antlr | |||
com.github.fommil.netlib:core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, this wasn't in the dev/deps files to begin with. Not totally sure why, but, yes then it only needs to come out here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You bring up a good point: com.github.fommil.netlib
is still a transient dependency of Spark through org.scalanlp:breeze
. I already submitted and merged a PR there to replace the dependency (scalanlp/breeze#811), but it hasn't been released yet.
For the sake of not regressing performance, let me re-add the dependency on com.github.fommil.netlib:all
in pom.xml
when enabling-Pnetlib-lgpl
and leave a comment to explain when we can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, breeze uses it. OK yes let's keep this aspect then.
Jenkins retest this please |
@fommil @srowen I got
We can observe there is no performance difference between |
nice, great work @luhenry ! |
Just catching up on the state here - so we need to put back the netlib-lgpl profile? anything else pending? |
@srowen the profile is already back at https://github.com/apache/spark/pull/32415/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R3499. I think it's all done from my end, and tests are passing. Would you like to see something else before it gets merged? |
Oh I see it now, you've replaced two instances of the profile with one in the parent. That's probably fine; OK to leave it as it was too as the profile won't matter outside those two modules. Avoiding the redundancy seems fine; I'm trying to think through implications of all modules having this dependency then if it's turned on though. So the We can run tests again here to see if that fails or not. |
Jenkins test this please |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138339 has finished for PR 32415 at commit
|
|
OK, in any event |
It does show at
|
Merged to master. |
@srowen makes perfect sense that such a change gets thoroughly tested! Next step is to add support for Sparse matrices and vectors. In your experience, how widely used are SparseMatrix and SparseVector used in Spark? |
It comes up a lot. Sparse is important at scale. Anywhere that plugs into native code it has to be made dense, so can't be applied in some cases. Anything that can operate on sparse reps could open up more possibilities to apply native acceleration. |
https://github.com/fommil/matrix-toolkits-java has a bunch of sparse stuff in it. Check out the references to the "Sparse Storage" in the README. LinkedSparseMatrix has served me well over the years for unstructured sparse matrices but there are no silver bullets. I don't know much about Spark's Sparse representations, but in my experience the kind of sparsity is what really matters. |
IIUC, Spark uses the CSC representation. @fommil is that format represented in MTJ as well? |
one of the many data structures. |
+1 Sparse cases are really general in ML, for many algorithms like (Logistic Regression/etc), the input datasets are expected to be sparse. |
Hi @luhenry, it seems we have dev.ludovic.netlib.InstanceBuilder duplicated in arpack, blas and lapack? Is this going to be a problem? |
What changes were proposed in this pull request?
Bump to
dev.ludovic.netlib:2.0
which provides JNI-based wrappers for BLAS, ARPACK, and LAPACK. Theseare not taking dependencies on GPL or LGPL libraries, allowing to provide out-of-the-box support for hardware acceleration when a native library is present (this is still up to the end-user to install such library on their system, like OpenBLAS, Intel MKL, and libarpack2).Why are the changes needed?
Great performance improvement for ML-related workload on vanilla-distributions of Spark.
Does this PR introduce any user-facing change?
Users now take advantage of hardware acceleration as long as a native library is installed (like OpenBLAS, Intel MKL and libarpack2).
How was this patch tested?
Spark test-suite + dev.ludovic.netlib testsuite.
JDK8:
JDK11:
JDK16:
TODO:
docs/
anddocs/ml-linalg-guide.md
referingcom.github.fommil.netlib
pom.xml
andproject/SparkBuild.scala
.