
Disable Avro support when spark-avro classes not loadable by Shim classloader [databricks] #5716

Merged: 11 commits, Jun 7, 2022
4 changes: 4 additions & 0 deletions docs/FAQ.md
@@ -57,6 +57,10 @@ More information about cards that support forward compatibility can be found

### How can I check if the RAPIDS Accelerator is installed and which version is running?

Using the `--jars` or `--packages` option followed by the file path or Maven path to the RAPIDS jar is the preferred way to run the RAPIDS Accelerator.
Collaborator
Unfortunately we have lots of places in the doc advertising extraClassPath https://github.com/NVIDIA/spark-rapids/search?q=%22extraClassPath%22&type=code

Not sure if this is the right place to document it. At any rate it deserves a dedicated FAQ question for the warning being introduced in this PR.

What to do when I see the warning ...

Collaborator Author

@sameerz @viadea
Please advise on the documentation comment by @gerashegalov

If the RAPIDS jar is copied directly into the `$SPARK_HOME/jars` folder, the ClassLoader may not set up classes
Collaborator

Having the rapids jars as part of extraClassPath triggers the same exception as copying them to $SPARK_HOME/jars.
I would not even mention the latter because it seems like a user error.

Not sure what we want to call extraClassPath officially. I used to use the term static classpath because those jars are prepended independently of job classes (--jars, --packages). @tgravescs might want to chime in

properly to gain full advantage of the GPU.
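The classloading caveat above comes down to standard JVM parent-first delegation. Below is a hypothetical standalone sketch (not plugin code; `DelegationDemo` and `definingLoader` are illustrative names): when a class is already visible on the application classpath, a child classloader delegates upward and never gets to define its own copy.

```scala
import java.net.{URL, URLClassLoader}

// Hypothetical demo of parent-first delegation: a child URLClassLoader
// with no URLs of its own always defers to its parent, so classes that
// are already on the static classpath are never defined by the child.
object DelegationDemo {
  def definingLoader(name: String): ClassLoader = {
    val parent = ClassLoader.getSystemClassLoader
    val child  = new URLClassLoader(Array.empty[URL], parent)
    // loadClass consults the parent first (standard JVM behavior),
    // so the returned class is defined by whichever ancestor found it
    child.loadClass(name).getClassLoader
  }
}
```

For a bootstrap class such as `java.lang.String`, `definingLoader` returns `null` (the bootstrap loader) rather than the child; the plugin's shim classloader faces the same delegation when its classes are pre-empted by jars on `extraClassPath` or in `$SPARK_HOME/jars`.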

On startup the RAPIDS Accelerator will log a warning message on the Spark driver showing the
version; the message looks something like this:
```
Expand Down
@@ -16,11 +16,12 @@

package org.apache.spark.sql.rapids

-import scala.util.{Failure, Success, Try}
+import scala.util.Try

import com.nvidia.spark.rapids._

import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.internal.Logging
import org.apache.spark.sql.avro.{AvroFileFormat, AvroOptions}
import org.apache.spark.sql.connector.read.{PartitionReaderFactory, Scan}
import org.apache.spark.sql.execution.FileSourceScanExec
@@ -29,16 +30,20 @@ import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.v2.avro.AvroScan
import org.apache.spark.util.{SerializableConfiguration, Utils}

-object ExternalSource {
+object ExternalSource extends Logging {
+  val avroScanClassName = "org.apache.spark.sql.v2.avro.AvroScan"

  lazy val hasSparkAvroJar = {
-    val loader = Utils.getContextOrSparkClassLoader
-
    /** spark-avro is an optional package for Spark, so the RAPIDS Accelerator
     * must run successfully without it. */
-    Try(loader.loadClass("org.apache.spark.sql.v2.avro.AvroScan")) match {
-      case Failure(_) => false
-      case Success(_) => true
+    Utils.classIsLoadable(avroScanClassName) && {
+      Try(ShimLoader.loadClass(avroScanClassName)).map(_ => true)
+        .getOrElse {
+          logWarning("Avro library not found by the RAPIDS plugin. The Plugin jars are " +
+            "likely deployed using a static classpath spark.driver/executor.extraClassPath. " +
+            "Consider using --jars or --packages instead.")
+          false
+        }
    }
  }
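The two-step check in the diff can be sketched independently of Spark. In this hypothetical standalone version, `AvroProbe` and `probeClass` are illustrative names, and plain `Class.forName`/`loadClass` stand in for Spark's `Utils.classIsLoadable` and the plugin's `ShimLoader`:

```scala
import scala.util.Try

// Hypothetical sketch of the diff's logic using only JDK APIs.
object AvroProbe {
  // Step 1: is the class visible to the given (application) loader at all?
  def classIsLoadable(name: String, loader: ClassLoader): Boolean =
    Try(Class.forName(name, false, loader)).isSuccess

  // Step 2: can the (shim) loader also load it? If step 1 passes but
  // step 2 fails, the jars were likely deployed on a static classpath.
  def probeClass(name: String, appLoader: ClassLoader,
      shimLoader: ClassLoader): Boolean =
    classIsLoadable(name, appLoader) && {
      Try(shimLoader.loadClass(name)).map(_ => true).getOrElse {
        System.err.println(s"$name not loadable by the shim loader; " +
          "consider --jars or --packages instead of extraClassPath")
        false
      }
    }
}
```

Short-circuiting on step 1 mirrors the diff: if the class is absent everywhere, Avro support is silently disabled; the warning fires only in the inconsistent case where the application classpath sees the class but the shim loader cannot load it.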

Expand Down