-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable Avro support when spark-avro classes not loadable by Shim classloader [databricks] #5716
Conversation
Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Raza Jafri <[email protected]>
build |
@@ -121,6 +129,10 @@ object ExternalSource { | |||
} | |||
|
|||
def getScans: Map[Class[_ <: Scan], ScanRule[_ <: Scan]] = { | |||
if (showErrorMessage) { | |||
println("WARNING: Avro library isn't loaded by the RAPIDS plugin. Please use --jars to " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be using System.err.println()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be using the logging framework logWarning/Error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this was a quick test that I forgot to replace with logError. Thanks
@@ -121,6 +129,10 @@ object ExternalSource { | |||
} | |||
|
|||
def getScans: Map[Class[_ <: Scan], ScanRule[_ <: Scan]] = { | |||
if (showErrorMessage) { | |||
println("WARNING: Avro library isn't loaded by the RAPIDS plugin. Please use --jars to " + | |||
"load the plugin if you haven't done so") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is optional, I think the warning could be worded better.
Maybe something like: "WARNING: Avro library not found by the RAPIDS plugin. If needed, please use --jars to add it"
Should we include the full classpath in the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change that
/** spark-avro is an optional package for Spark, so the RAPIDS Accelerator | ||
* must run successfully without it. */ | ||
Try(loader.loadClass("org.apache.spark.sql.v2.avro.AvroScan")) match { | ||
Try(ShimLoader.loadClass("org.apache.spark.sql.v2.avro.AvroScan")) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is not to change this logic until we have a grasp of the root cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us check if #5723 changes anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the same thing. I can quickly test that
Looks like #5723 is a better solution for the `NoClassDefFoundError". I will close this in it's favor |
Signed-off-by: Raza Jafri <[email protected]>
@gerashegalov @jbrennan333 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me other than the nit. I defer to @gerashegalov for approval.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/ExternalSource.scala
Outdated
Show resolved
Hide resolved
@@ -121,6 +130,10 @@ object ExternalSource { | |||
} | |||
|
|||
def getScans: Map[Class[_ <: Scan], ScanRule[_ <: Scan]] = { | |||
if (showErrorMessage) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should consolidate all this logic in hasSparkAvroJar. I'll make a PR to your PR with a suggestion razajafri#1
@@ -121,6 +130,10 @@ object ExternalSource { | |||
} | |||
|
|||
def getScans: Map[Class[_ <: Scan], ScanRule[_ <: Scan]] = { | |||
if (showErrorMessage) { | |||
logWarning("WARNING: Avro library not found by the RAPIDS plugin. If needed, please use " + | |||
"--jars to add it") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to document that --jars or --packages is the preferred way of deploying the plugin jar, specifically for Avro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us drop WARNING because it's already part of the logging category. The message should talk about the plugin jars, since this is the actual problem, not Avro:
"Avro library not found by the RAPIDS plugin. The Plugin jars are likely deployed using a static classpath spark.driver/executor.extraClassPath. Consider using --jars or --packages instead."
Signed-off-by: Gera Shegalov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a note in docs and adjust the error message
Review suggestions
Signed-off-by: Raza Jafri <[email protected]>
build |
Signed-off-by: Raza Jafri <[email protected]>
build |
1 similar comment
build |
Utils.classIsLoadable(avroScanClassName) && { | ||
Try(ShimLoader.loadClass(avroScanClassName)).map(_ => true) | ||
.getOrElse { | ||
logWarning("WARNING: Avro library not found by the RAPIDS plugin. If needed, please use " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address PR checks
scalastyle check flagged on the PR
error file=/home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/scala/org/apache/spark/sql/rapids/ExternalSource.scala message=File line length exceeds 100 characters line=42
Previous comment not addressed yet
#5716 (comment)
docs/FAQ.md
Outdated
@@ -57,6 +57,10 @@ More information about cards that support forward compatibility can be found | |||
|
|||
### How can I check if the RAPIDS Accelerator is installed and which version is running? | |||
|
|||
Using the `--jars` or `--packages` option followed by the file path or maven path to RAPIDS jar is the preferred way to run RAPIDS accelerator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we have lots of places in the doc advertising extraClassPath https://github.com/NVIDIA/spark-rapids/search?q=%22extraClassPath%22&type=code
Not sure if this is the right place to document it. At any rate it deserves a dedicated FAQ question for the warning being introduced in this PR.
What to do when I see the warning ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sameerz @viadea
Please advise on the documentation comment by @gerashegalov
Signed-off-by: Raza Jafri <[email protected]>
docs/FAQ.md
Outdated
@@ -57,6 +57,10 @@ More information about cards that support forward compatibility can be found | |||
|
|||
### How can I check if the RAPIDS Accelerator is installed and which version is running? | |||
|
|||
Using the `--jars` or `--packages` option followed by the file path or maven path to RAPIDS jar is the preferred way to run RAPIDS accelerator. | |||
If RAPIDS jar is copied directly to the `$SPARK_HOME/jars` folder, it might result in the ClassLoader not setting up classes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having rapids jars as part of extraClassPath triggers the same Exception as copying to $SPARK_HOME/jars
.
I would not even mention latter because it seems like a user error.
Not sure what we want to call extraClassPath officially. I used the to use the term static classpath because they are prepended independently of job classes (--jars, --packages). @tgravescs might want to chime in
Signed-off-by: Raza Jafri <[email protected]>
build |
build |
Signed-off-by: Raza Jafri <[email protected]>
@tgravescs @gerashegalov PTAL |
build |
This PR uses the ShimLoader to load the AvroScan class to make sure it's on the classpath. It also uses the
SparkClassLoader
to warn the user if he wants to use the avro lib but RAPIDS plugin won't see the avro lib.