Disable Avro support when spark-avro classes not loadable by Shim classloader [databricks] (#5716)

* use the shim to load the avro class

Signed-off-by: Raza Jafri <[email protected]>

* Use ShimLoader to load avro class and to check

Signed-off-by: Raza Jafri <[email protected]>

* addressed review comments

Signed-off-by: Raza Jafri <[email protected]>

* Review suggestions

Signed-off-by: Gera Shegalov <[email protected]>

* added documentation

Signed-off-by: Raza Jafri <[email protected]>

* modified documentation

Signed-off-by: Raza Jafri <[email protected]>

* addressed build failure

Signed-off-by: Raza Jafri <[email protected]>

* addressed comments

Signed-off-by: Raza Jafri <[email protected]>

* added the exact error and warning

Signed-off-by: Raza Jafri <[email protected]>

Co-authored-by: Raza Jafri <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
3 people authored Jun 7, 2022
1 parent b7649ec commit 8137990
Showing 2 changed files with 18 additions and 7 deletions.
6 changes: 6 additions & 0 deletions docs/FAQ.md
@@ -513,6 +513,12 @@ Below are some troubleshooting tips on GPU query performance issue:
 `spark.sql.files.maxPartitionBytes` and `spark.rapids.sql.concurrentGpuTasks` as these configurations can affect performance of queries significantly.
 Please refer to [Tuning Guide](./tuning-guide.md) for more details.

+### Why is the Avro library not found by RAPIDS?
+
+If you see the warning `Avro library not found by the RAPIDS plugin.` or the error
+`java.lang.NoClassDefFoundError: org/apache/spark/sql/v2/avro/AvroScan`, make sure the Spark job passes
+the RAPIDS jar with `--jars` (file path) or `--packages` (Maven coordinates) rather than a static
+classpath such as `spark.driver.extraClassPath`; that is the preferred way to run the RAPIDS Accelerator.

 ### What is the default RMM pool allocator?

@@ -16,11 +16,12 @@

 package org.apache.spark.sql.rapids

-import scala.util.{Failure, Success, Try}
+import scala.util.Try

 import com.nvidia.spark.rapids._

 import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.internal.Logging
 import org.apache.spark.sql.avro.{AvroFileFormat, AvroOptions}
 import org.apache.spark.sql.connector.read.{PartitionReaderFactory, Scan}
 import org.apache.spark.sql.execution.FileSourceScanExec
@@ -29,16 +30,20 @@ import org.apache.spark.sql.sources.Filter
 import org.apache.spark.sql.v2.avro.AvroScan
 import org.apache.spark.util.{SerializableConfiguration, Utils}

-object ExternalSource {
+object ExternalSource extends Logging {
+  val avroScanClassName = "org.apache.spark.sql.v2.avro.AvroScan"

   lazy val hasSparkAvroJar = {
-    val loader = Utils.getContextOrSparkClassLoader
-
     /** spark-avro is an optional package for Spark, so the RAPIDS Accelerator
      * must run successfully without it. */
-    Try(loader.loadClass("org.apache.spark.sql.v2.avro.AvroScan")) match {
-      case Failure(_) => false
-      case Success(_) => true
+    Utils.classIsLoadable(avroScanClassName) && {
+      Try(ShimLoader.loadClass(avroScanClassName)).map(_ => true)
+        .getOrElse {
+          logWarning("Avro library not found by the RAPIDS plugin. The Plugin jars are " +
+              "likely deployed using a static classpath spark.driver/executor.extraClassPath. " +
+              "Consider using --jars or --packages instead.")
+          false
+        }
     }
   }

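The core of this change is a two-stage visibility check: the Avro scan class must be loadable through `Utils.classIsLoadable` and also through `ShimLoader.loadClass`, and a mismatch between the two produces a warning and disables Avro support instead of failing at runtime. The following Scala sketch illustrates that pattern with plain JVM classloaders. It is not the plugin's code: `OptionalClassCheck`, `isLoadableBy`, `hasOptionalClass`, and the `warn` callback are hypothetical names, and the `primary`/`secondary` loaders only stand in for whatever loaders those two real calls consult.

```scala
import scala.util.Try

// Hypothetical helper illustrating the pattern used by ExternalSource.hasSparkAvroJar.
// `primary` stands in for the loader behind Spark's Utils.classIsLoadable and `secondary`
// for the one used by the plugin's ShimLoader.loadClass; both mappings are assumptions.
object OptionalClassCheck {

  /** True if `loader` can find `className`; initialize = false skips static initializers. */
  def isLoadableBy(className: String, loader: ClassLoader): Boolean =
    Try(Class.forName(className, false, loader)).isSuccess

  /**
   * True only if both loaders can see `className`. When the class is visible to `primary`
   * but not to `secondary`, the mismatch is reported through `warn` and false is returned,
   * so the optional feature can be switched off. If `primary` cannot see the class at all,
   * `&&` short-circuits and false is returned without a warning.
   */
  def hasOptionalClass(
      className: String,
      primary: ClassLoader,
      secondary: ClassLoader,
      warn: String => Unit): Boolean =
    isLoadableBy(className, primary) && {
      Try(Class.forName(className, false, secondary)).map(_ => true).getOrElse {
        warn(s"$className is visible to the primary classloader but not the secondary one")
        false
      }
    }
}
```

Checking `primary` first keeps the common case (spark-avro simply not installed) silent, so the warning only fires when the package is present but hidden from the second loader, which is the static-classpath deployment that the warning message describes.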