[QST] SparkException ERROR ContextCleaner: Error cleaning broadcast #5328
-
Hi, Spark reports many errors when running with the RAPIDS Accelerator on data generated by TPC-DS.
2. Start Spark standalone (1 master and 3 workers on the same machine)
import com.databricks.spark.sql.perf.tpcds.TPCDS
// Note: Declare "sqlContext" for Spark 2.x version
val tpcds = new TPCDS(sqlContext = sqlContext)
22/04/22 00:35:27 WARN BlockManagerMaster: Failed to remove broadcast 1881 with removeFromMaster = true - org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped.
Replies: 5 comments
-
What is the Spark version? Can you please try spark-3.2.1-bin-hadoop3.3?
-
The Spark version is 3.2.1. Only Spark is using the GPUs. Each GPU has 16 GB of memory, and we have 2 GPUs on the machine.

The log for some executors:
22/04/22 19:01:10 INFO RapidsExecutorPlugin: RAPIDS Accelerator build: {version=22.02.0, user=, url=https://github.com/NVIDIA/spark-rapids.git, date=2022-02-14T10:37:11Z, revision=a32ec69e67ff0f8adf85ab6b2665f6a1c751dac0, cudf_version=22.02.0, branch=HEAD}

For other executors:
22/04/22 19:10:11 INFO RapidsExecutorPlugin: RAPIDS Accelerator build: {version=22.02.0, user=, url=https://github.com/NVIDIA/spark-rapids.git, date=2022-02-14T10:37:11Z, revision=a32ec69e67ff0f8adf85ab6b2665f6a1c751dac0, cudf_version=22.02.0, branch=HEAD}

3. nvidia-smi
-
It sounds like your environment is not set up properly. It looks like you are not properly using Spark GPU scheduling, so executors are trying to use the same GPU. Alternatively, if you don't want to use GPU scheduling, you could put the GPUs in process-exclusive mode. But since you are using standalone, it's probably easiest to just configure GPU scheduling.
Is there a reason you are using 3 workers on the same machine? You should just use a single worker since you only have 1 GPU. Please see the instructions here: https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#spark-standalone-cluster Specifically, the worker setup section:
If you set up the workers to see the GPUs, you can then request them on your spark-shell command line by specifying the parameters:
With the above setup, you should be able to see the GPU resources in the Spark Master UI.
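For readers following along, here is a rough sketch of what GPU scheduling on a standalone cluster can look like. It is not copied from this thread: the GPU count, the discovery script location, the master host, the jar paths, and the core and task fractions are all assumptions to adjust for your machine. The property names are standard Spark resource-scheduling settings, and `getGpusResources.sh` is the sample discovery script shipped in the Spark source tree under `examples/src/main/scripts`.

```shell
# --- conf/spark-env.sh on the worker machine (values are assumptions) ---
# GPU_COUNT: number of GPUs this single worker should advertise.
# DISCOVERY_SCRIPT: assumed location of a copy of Spark's getGpusResources.sh.
GPU_COUNT=1
DISCOVERY_SCRIPT=/opt/sparkRapidsPlugin/getGpusResources.sh
SPARK_WORKER_OPTS="-Dspark.worker.resource.gpu.amount=${GPU_COUNT} -Dspark.worker.resource.gpu.discoveryScript=${DISCOVERY_SCRIPT}"

# --- start one master and a single worker (MASTER_HOST is a placeholder) ---
"$SPARK_HOME/sbin/start-master.sh"
"$SPARK_HOME/sbin/start-worker.sh" "spark://${MASTER_HOST}:7077"

# --- request GPUs from spark-shell ---
# RAPIDS_JAR and CUDF_JAR are placeholders for the plugin and cuDF jar paths.
# One GPU per executor; 0.25 GPU per task lets 4 tasks share that GPU.
"$SPARK_HOME/bin/spark-shell" \
  --master "spark://${MASTER_HOST}:7077" \
  --jars "${RAPIDS_JAR},${CUDF_JAR}" \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.executor.cores=4 \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.25
```

With one GPU advertised per worker and `spark.executor.resource.gpu.amount=1`, the scheduler hands each executor its own GPU, which avoids several executors landing on the same device.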
-
It works well with a single worker (standalone mode).
-
Thanks for the confirmation. I'm going to close this then. If you have more questions, you can open another issue or use our discussions board: https://github.com/NVIDIA/spark-rapids/discussions