Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable Avro support when spark-avro classes not loadable by Shim classloader [databricks] #5716

Merged
merged 11 commits into from
Jun 7, 2022

Conversation

razajafri
Copy link
Collaborator

This PR uses the ShimLoader to load the AvroScan class to make sure it's on the classpath. It also uses the SparkClassLoader to warn the user if he wants to use the avro lib but RAPIDS plugin won't see the avro lib.

@razajafri
Copy link
Collaborator Author

build

@@ -121,6 +129,10 @@ object ExternalSource {
}

def getScans: Map[Class[_ <: Scan], ScanRule[_ <: Scan]] = {
if (showErrorMessage) {
println("WARNING: Avro library isn't loaded by the RAPIDS plugin. Please use --jars to " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be using System.err.println()?

Copy link
Collaborator

@gerashegalov gerashegalov Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be using the logging framework logWarning/Error

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this was a quick test that I forgot to replace with logError. Thanks

@@ -121,6 +129,10 @@ object ExternalSource {
}

def getScans: Map[Class[_ <: Scan], ScanRule[_ <: Scan]] = {
if (showErrorMessage) {
println("WARNING: Avro library isn't loaded by the RAPIDS plugin. Please use --jars to " +
"load the plugin if you haven't done so")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is optional, I think the warning could be worded better.
Maybe something like: "WARNING: Avro library not found by the RAPIDS plugin. If needed, please use --jars to add it"
Should we include the full classpath in the error message?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change that

@tgravescs tgravescs requested review from gerashegalov and removed request for gerashegalov June 2, 2022 17:27
/** spark-avro is an optional package for Spark, so the RAPIDS Accelerator
* must run successfully without it. */
Try(loader.loadClass("org.apache.spark.sql.v2.avro.AvroScan")) match {
Try(ShimLoader.loadClass("org.apache.spark.sql.v2.avro.AvroScan")) match {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is not to change this logic until we have a grasp of the root cause.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us check if #5723 changes anything

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same thing. I can quickly test that

@razajafri
Copy link
Collaborator Author

Looks like #5723 is a better solution for the `NoClassDefFoundError".

I will close this in it's favor

@razajafri razajafri closed this Jun 2, 2022
@razajafri razajafri reopened this Jun 3, 2022
@razajafri
Copy link
Collaborator Author

Looks like #5723 is a better solution for the `NoClassDefFoundError".

I will close this in it's favor

Reopening as #5723 doesn't fix the bug as I originally thought

Signed-off-by: Raza Jafri <[email protected]>
@razajafri
Copy link
Collaborator Author

@gerashegalov @jbrennan333
Can we take another look at this?

Copy link
Contributor

@jbrennan333 jbrennan333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me other than the nit. I defer to @gerashegalov for approval.

@@ -121,6 +130,10 @@ object ExternalSource {
}

def getScans: Map[Class[_ <: Scan], ScanRule[_ <: Scan]] = {
if (showErrorMessage) {
Copy link
Collaborator

@gerashegalov gerashegalov Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consolidate all this logic in hasSparkAvroJar. I'll make a PR to your PR with a suggestion razajafri#1

@@ -121,6 +130,10 @@ object ExternalSource {
}

def getScans: Map[Class[_ <: Scan], ScanRule[_ <: Scan]] = {
if (showErrorMessage) {
logWarning("WARNING: Avro library not found by the RAPIDS plugin. If needed, please use " +
"--jars to add it")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to document that --jars or --packages is the preferred way of deploying the plugin jar, specifically for Avro.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us drop WARNING because it's already part of the logging category. The message should talk about the plugin jars, since this is the actual problem, not Avro:

"Avro library not found by the RAPIDS plugin. The Plugin jars are likely deployed using a static classpath spark.driver/executor.extraClassPath. Consider using --jars or --packages instead."

Signed-off-by: Gera Shegalov <[email protected]>
Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a note in docs and adjust the error message

razajafri and others added 2 commits June 3, 2022 14:27
@razajafri
Copy link
Collaborator Author

build

Signed-off-by: Raza Jafri <[email protected]>
@razajafri
Copy link
Collaborator Author

build

1 similar comment
@sameerz
Copy link
Collaborator

sameerz commented Jun 6, 2022

build

Utils.classIsLoadable(avroScanClassName) && {
Try(ShimLoader.loadClass(avroScanClassName)).map(_ => true)
.getOrElse {
logWarning("WARNING: Avro library not found by the RAPIDS plugin. If needed, please use " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Address PR checks

scalastyle check flagged on the PR

error file=/home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/scala/org/apache/spark/sql/rapids/ExternalSource.scala message=File line length exceeds 100 characters line=42

Previous comment not addressed yet
#5716 (comment)

docs/FAQ.md Outdated
@@ -57,6 +57,10 @@ More information about cards that support forward compatibility can be found

### How can I check if the RAPIDS Accelerator is installed and which version is running?

Using the `--jars` or `--packages` option followed by the file path or maven path to RAPIDS jar is the preferred way to run RAPIDS accelerator.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we have lots of places in the doc advertising extraClassPath https://github.com/NVIDIA/spark-rapids/search?q=%22extraClassPath%22&type=code

Not sure if this is the right place to document it. At any rate it deserves a dedicated FAQ question for the warning being introduced in this PR.

What to do when I see the warning ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sameerz @viadea
Please advise on the documentation comment by @gerashegalov

Signed-off-by: Raza Jafri <[email protected]>
docs/FAQ.md Outdated
@@ -57,6 +57,10 @@ More information about cards that support forward compatibility can be found

### How can I check if the RAPIDS Accelerator is installed and which version is running?

Using the `--jars` or `--packages` option followed by the file path or maven path to RAPIDS jar is the preferred way to run RAPIDS accelerator.
If RAPIDS jar is copied directly to the `$SPARK_HOME/jars` folder, it might result in the ClassLoader not setting up classes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having rapids jars as part of extraClassPath triggers the same Exception as copying to $SPARK_HOME/jars.
I would not even mention latter because it seems like a user error.

Not sure what we want to call extraClassPath officially. I used the to use the term static classpath because they are prepended independently of job classes (--jars, --packages). @tgravescs might want to chime in

@sameerz sameerz modified the milestones: May 23 - Jun 3, Jun 6 - Jun 17 Jun 6, 2022
Signed-off-by: Raza Jafri <[email protected]>
@razajafri
Copy link
Collaborator Author

build

@razajafri
Copy link
Collaborator Author

build

docs/FAQ.md Show resolved Hide resolved
@razajafri
Copy link
Collaborator Author

@tgravescs @gerashegalov PTAL

@gerashegalov gerashegalov changed the title Use ShimLoader to load the Avro lib [databricks] Disable Avro support when spark-avro classes not loadable by Shim classloader [databricks] Jun 6, 2022
@gerashegalov
Copy link
Collaborator

build

@sameerz sameerz merged commit 8137990 into NVIDIA:branch-22.06 Jun 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants