Spark Lineage zstd-jni conflict #5979

Closed
danielli-ziprecruiter opened this issue Sep 19, 2022 · 2 comments · Fixed by #6787
Labels
bug Bug report ingestion PR or Issue related to the ingestion of metadata

Comments

@danielli-ziprecruiter
Contributor

danielli-ziprecruiter commented Sep 19, 2022

Describe the bug
This occurred with Spark 3.1.2 and datahub-spark-lineage 0.8.43.
Spark lineage is relocating zstd-jni classes. According to the zstd-jni documentation, this should not be done because it can cause native library issues. The result is intermittent errors, depending on whether Java picks up the .so files from datahub-spark-lineage-0.8.43.jar or from zstd-jni-1.4.8-1.jar (included with Spark 3.1).

java.lang.UnsatisfiedLinkError: com.github.luben.zstd.Zstd.setCompressionLevel(JI)I
	at com.github.luben.zstd.Zstd.setCompressionLevel(Native Method)
	at com.github.luben.zstd.ZstdOutputStream.<init>(ZstdOutputStream.java:67)
	at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
	at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:910)
	at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
	at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:230)
	at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:466)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
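
For anyone trying to confirm the same conflict, here is a rough diagnostic sketch (not part of DataHub or zstd-jni). It lists every classpath location that provides a given resource, so a duplicate copy shows up as more than one URL. The class name `ZstdClasspathCheck` and the helper `findProviders` are made up for illustration. The default resource path is taken from the class name in the stack trace above; the path of the native `.so` inside each jar is not assumed here and would need to be looked up (for example with `jar tf`) and passed as an argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of any library.
class ZstdClasspathCheck {

    /** Returns every location visible to the context class loader that provides the resource. */
    static List<URL> findProviders(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            // getResources returns all matches, not just the first one on the classpath.
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default: the class seen in the stack trace. Pass other resource paths
        // (such as the native library path inside the jar) as arguments.
        String[] resources = args.length > 0
                ? args
                : new String[] {"com/github/luben/zstd/Zstd.class"};

        for (String resource : resources) {
            List<URL> providers = findProviders(resource);
            System.out.println(resource + " -> " + providers.size() + " provider(s)");
            for (URL url : providers) {
                System.out.println("  " + url);
            }
            if (providers.size() > 1) {
                System.out.println("  WARNING: more than one jar provides this resource");
            }
        }
    }
}
```

Run it with the same classpath as the Spark driver (both the Spark jars and the datahub-spark-lineage jar). More than one provider for the same resource would be consistent with the intermittent behaviour described above, though it does not by itself prove which copy Java ends up loading.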
@danielli-ziprecruiter danielli-ziprecruiter added the bug Bug report label Sep 19, 2022
@github-actions

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

@github-actions github-actions bot added the stale label Oct 23, 2022
@laulpogan
Contributor

Ditto'd in the community channel: https://datahubspace.slack.com/archives/CUMUWQU66/p1669017837146189
