Disable Avro support when spark-avro classes not loadable by Shim classloader [databricks] #5716

Merged: 11 commits, Jun 7, 2022
docs/FAQ.md: 6 additions, 0 deletions
@@ -513,6 +513,12 @@ Below are some troubleshooting tips on GPU query performance issue:
`spark.sql.files.maxPartitionBytes` and `spark.rapids.sql.concurrentGpuTasks` as these configurations can affect performance of queries significantly.
Please refer to [Tuning Guide](./tuning-guide.md) for more details.

### Why is the Avro library not found by RAPIDS?

If you see the warning `Avro library not found by the RAPIDS plugin.` or the error
`java.lang.NoClassDefFoundError: org/apache/spark/sql/v2/avro/AvroScan`, the RAPIDS plugin jar was
likely deployed on a static classpath (`spark.driver.extraClassPath` or `spark.executor.extraClassPath`),
so the plugin's shim classloader cannot load the spark-avro classes. Run the Spark job with the `--jars`
or `--packages` option instead, followed by the file path or Maven coordinates of the RAPIDS jar; this is
the preferred way to run the RAPIDS Accelerator.
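To check whether spark-avro is on the classpath at all, a small helper like the one below can be pasted into `spark-shell` and called with the thread's context classloader. This is a hypothetical sketch built only on JDK reflection (`ClassVisibility` and `isVisible` are illustrative names, not part of the RAPIDS plugin), and it cannot tell you whether the plugin's shim classloader sees the class.

```scala
// Hypothetical diagnostic helper; not part of the RAPIDS Accelerator.
object ClassVisibility {
  val avroScanClassName = "org.apache.spark.sql.v2.avro.AvroScan"

  /** True if `className` resolves through `loader`; static initializers are not run. */
  def isVisible(className: String, loader: ClassLoader): Boolean =
    try {
      Class.forName(className, false, loader)
      true
    } catch {
      // Missing class, or a class whose dependencies cannot be linked.
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    // Fall back to this object's loader if no context classloader is set.
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    println(s"$avroScanClassName visible: ${isVisible(avroScanClassName, loader)}")
  }
}
```

Catching `LinkageError` explicitly matters here: `NoClassDefFoundError` is a `LinkageError`, which `scala.util.Try` treats as fatal and would rethrow.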

### What is the default RMM pool allocator?

```diff
@@ -16,11 +16,12 @@
 
 package org.apache.spark.sql.rapids
 
-import scala.util.{Failure, Success, Try}
+import scala.util.Try
 
 import com.nvidia.spark.rapids._
 
 import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.internal.Logging
 import org.apache.spark.sql.avro.{AvroFileFormat, AvroOptions}
 import org.apache.spark.sql.connector.read.{PartitionReaderFactory, Scan}
 import org.apache.spark.sql.execution.FileSourceScanExec
@@ -29,16 +30,20 @@ import org.apache.spark.sql.sources.Filter
 import org.apache.spark.sql.v2.avro.AvroScan
 import org.apache.spark.util.{SerializableConfiguration, Utils}
 
-object ExternalSource {
+object ExternalSource extends Logging {
+  val avroScanClassName = "org.apache.spark.sql.v2.avro.AvroScan"
 
   lazy val hasSparkAvroJar = {
-    val loader = Utils.getContextOrSparkClassLoader
-
     /** spark-avro is an optional package for Spark, so the RAPIDS Accelerator
      * must run successfully without it. */
-    Try(loader.loadClass("org.apache.spark.sql.v2.avro.AvroScan")) match {
-      case Failure(_) => false
-      case Success(_) => true
+    Utils.classIsLoadable(avroScanClassName) && {
+      Try(ShimLoader.loadClass(avroScanClassName)).map(_ => true)
+        .getOrElse {
+          logWarning("Avro library not found by the RAPIDS plugin. The Plugin jars are " +
+              "likely deployed using a static classpath spark.driver/executor.extraClassPath. " +
+              "Consider using --jars or --packages instead.")
+          false
+        }
     }
   }
 
```
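The new `hasSparkAvroJar` treats Avro as available only if the class resolves through Spark's classloader and also through the plugin's shim classloader, and it warns when the first check succeeds but the second fails. The sketch below reproduces that pattern with plain JDK classloaders so it runs without Spark. `AvroGateSketch`, `outer`, `shim`, and `warn` are hypothetical stand-ins for `Utils.classIsLoadable`, `ShimLoader.loadClass`, and `logWarning`. It is not the plugin's implementation.

```scala
// Hypothetical sketch of a two-classloader availability gate modeled on the
// hasSparkAvroJar change in this PR; not the RAPIDS Accelerator implementation.
object AvroGateSketch {
  private def loadable(name: String, loader: ClassLoader): Boolean =
    try {
      Class.forName(name, false, loader)
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }

  /** True only if both `outer` (stand-in for Spark's classloader) and `shim`
   *  (stand-in for the plugin's shim classloader) can load `name`. */
  def isAvailable(
      name: String,
      outer: ClassLoader,
      shim: ClassLoader,
      warn: String => Unit): Boolean =
    loadable(name, outer) && {
      // Reached only when the outer loader sees the class, so a missing
      // optional jar stays silent and only a classloader mismatch warns.
      val ok = loadable(name, shim)
      if (!ok) warn(s"$name is visible to Spark but not to the shim classloader")
      ok
    }
}
```

The `&&` keeps the common case quiet: when spark-avro is simply not installed, the shim check and the warning never run. For example, passing a `URLClassLoader` with no URLs and a `null` parent as `shim` makes any application class fail the second check and trigger `warn`.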