[SPARK-4056] Upgrade snappy-java to 1.1.1.5 #2911

Closed
wants to merge 2 commits into from

Conversation

JoshRosen
Contributor

This upgrades snappy-java to 1.1.1.5, which improves error messages when attempting to deserialize empty inputs using SnappyInputStream (see xerial/snappy-java#89).

@SparkQA

SparkQA commented Oct 23, 2014

QA tests have started for PR 2911 at commit cc953d6.

  • This patch merges cleanly.

@rxin
Contributor

rxin commented Oct 23, 2014

LGTM.

@SparkQA

SparkQA commented Oct 23, 2014

Tests timed out for PR 2911 at commit cc953d6 after a configured wait of 120m.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22077/
Test FAILed.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Oct 23, 2014

QA tests have started for PR 2911 at commit cc953d6.

  • This patch merges cleanly.

@JoshRosen
Contributor Author

It looks like these test failures might be due to missing classes in the snappy-java 1.1.1.4 JAR: xerial/snappy-java#90

@SparkQA

SparkQA commented Oct 23, 2014

Tests timed out for PR 2911 at commit cc953d6 after a configured wait of 120m.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22084/
Test FAILed.

@xerial

xerial commented Oct 24, 2014

Please use snappy-java-1.1.1.5, which fixes the broken build.

JoshRosen changed the title from "[SPARK-4056] Upgrade snappy-java to 1.1.1.4" to "[SPARK-4056] Upgrade snappy-java to 1.1.1.5" on Oct 24, 2014
@SparkQA

SparkQA commented Oct 24, 2014

QA tests have started for PR 2911 at commit adec96c.

  • This patch merges cleanly.

@JoshRosen
Contributor Author

@xerial Thanks for fixing that so quickly!

@SparkQA

SparkQA commented Oct 24, 2014

Tests timed out for PR 2911 at commit adec96c after a configured wait of 120m.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22100/
Test FAILed.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Oct 24, 2014

QA tests have started for PR 2911 at commit adec96c.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 24, 2014

QA tests have finished for PR 2911 at commit adec96c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22106/
Test FAILed.

@xerial

xerial commented Oct 24, 2014

@JoshRosen If you have the stack trace of this error, please let me know. I would like to check it.

@SparkQA

SparkQA commented Oct 24, 2014

QA tests have started for PR 2911 at commit adec96c.

  • This patch merges cleanly.

@JoshRosen
Contributor Author

@xerial Here's a link to the exception from that most recent test failure:

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22106/testReport/junit/org.apache.spark.util.collection/ExternalAppendOnlyMapSuite/spilling_with_compression/

In case that link breaks, here's the driver stacktrace:

sbt.ForkMain$ForkError: Test failed with compression using codec org.apache.spark.io.SnappyCompressionCodec:

Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 14, localhost): java.io.IOException: unexpected exception type
        java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
        java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
        org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:163)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1191)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1180)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1179)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1179)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:694)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:694)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:694)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1397)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1352)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

This isn't the full stacktrace of the actual error, which took place in an executor. Until we merge #2845, I don't think that I'll have an easy way to grab the full executor logs from Jenkins.

I wasn't able to reproduce this failure locally. I'm going to try our new experimental "Deflake build" Jenkins button, which reruns only the failing tests, in order to see if I can reproduce this. If so, I'll SSH in and grab the full logs.

@JoshRosen
Contributor Author

Actually, I don't think that "deflake build" plugin will necessarily work as expected given all of the customization in our build; I guess it was added for another project that shares the Jenkins server with us.

@SparkQA

SparkQA commented Oct 24, 2014

QA tests have finished for PR 2911 at commit adec96c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Hmm, looks like that might have been a transient failure. Just to be sure, though, I'm going to run this one more time to confirm that it still passes, then merge it (since I don't think that any issues we observe will be caused by this small snappy-java version bump).

Jenkins, retest this please.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Oct 24, 2014

Test build #22143 has started for PR 2911 at commit adec96c.

  • This patch merges cleanly.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22143/
Test FAILed.

@pwendell
Contributor

Jenkins, retest this please (sorry I had to abort this to clean the workspace).

@SparkQA

SparkQA commented Oct 24, 2014

Test build #22144 has started for PR 2911 at commit adec96c.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 24, 2014

Test build #22144 has finished for PR 2911 at commit adec96c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22144/
Test FAILed.

@JoshRosen
Contributor Author

The "IOException: unexpected exception type" error here is actually masking the real error: it occurs because we throw something other than IOException from read/writeExternal. I've opened https://issues.apache.org/jira/browse/SPARK-4080 to fix this (working on a patch now).

I'm pretty sure that this isn't caused by Snappy, but instead is an instance of some longer-standing non-deterministic serialization issue in our code.
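
For readers unfamiliar with the masking described above, here is a minimal, self-contained sketch of how Java serialization can hide a real failure behind "unexpected exception type". It is not Spark code: `MaskedExceptionDemo`, `Payload`, `WrappedPayload`, and the "the real error" message are hypothetical names made up for illustration. It uses a private `readObject` hook, which matches the `ObjectStreamClass.invokeReadObject` frame in the stack trace earlier in this thread. The `sneakyThrow` helper only stands in for the fact that Scala code can throw checked exceptions without declaring them. The second class shows one plausible remedy (rethrowing as an `IOException` that carries the cause); whether #2932 does exactly this is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class MaskedExceptionDemo {

    /** Hypothetical payload whose readObject fails with a checked, non-IO exception. */
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // Scala has no checked exceptions; in Java we need a "sneaky throw"
            // to let a plain Exception escape from readObject undeclared.
            MaskedExceptionDemo.<RuntimeException>sneakyThrow(new Exception("the real error"));
        }
    }

    /** Same failure, but rethrown as an IOException so the real message stays visible. */
    static class WrappedPayload implements Serializable {
        private static final long serialVersionUID = 1L;

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            try {
                in.defaultReadObject();
                MaskedExceptionDemo.<RuntimeException>sneakyThrow(new Exception("the real error"));
            } catch (IOException | ClassNotFoundException e) {
                throw e; // already a type the serialization machinery passes through
            } catch (Exception e) {
                throw new IOException(e); // message becomes e.toString()
            }
        }
    }

    /** Throws any Throwable without declaring it (generic type-erasure trick). */
    @SuppressWarnings("unchecked")
    static <E extends Throwable> void sneakyThrow(Throwable t) throws E {
        throw (E) t;
    }

    /** Serializes and deserializes obj; returns the IOException raised, or null if none. */
    static IOException roundTrip(Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return null; // deserialization unexpectedly succeeded
        } catch (IOException e) {
            return e;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        IOException masked = roundTrip(new Payload());
        System.out.println("masked:  " + masked.getMessage());
        System.out.println("cause:   " + masked.getCause().getMessage());
        IOException wrapped = roundTrip(new WrappedPayload());
        System.out.println("wrapped: " + wrapped.getMessage());
    }
}
```

In the first case the top-level message says nothing about what went wrong, and the original exception survives only as the cause, which is easy to lose when just the message is reported back to the driver. In the second case the original message is part of the `IOException` itself.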

@JoshRosen
Contributor Author

Now that I've fixed the IOException issue via #2932, let's retest this to see if I get a more informative error message. I'm almost positive that this isn't a Snappy issue; I'm just curious to see what happens.

Jenkins, retest this please.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Oct 24, 2014

Test build #22178 has started for PR 2911 at commit adec96c.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 25, 2014

Test build #22178 has finished for PR 2911 at commit adec96c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22178/
Test FAILed.

@JoshRosen
Contributor Author

Those failures are related to known issues from the Hive 13 PR, not to this change, so I'm going to merge this across all of our backport branches in order to improve our error reporting.

@asfgit asfgit closed this in 898b22a Oct 25, 2014
asfgit pushed a commit that referenced this pull request Oct 25, 2014
This upgrades snappy-java to 1.1.1.5, which improves error messages when attempting to deserialize empty inputs using SnappyInputStream (see xerial/snappy-java#89).

Author: Josh Rosen <[email protected]>
Author: Josh Rosen <[email protected]>

Closes #2911 from JoshRosen/upgrade-snappy-java and squashes the following commits:

adec96c [Josh Rosen] Use snappy-java 1.1.1.5
cc953d6 [Josh Rosen] [SPARK-4056] Upgrade snappy-java to 1.1.1.4

(cherry picked from commit 898b22a)
Signed-off-by: Josh Rosen <[email protected]>

Conflicts:
	pom.xml
@JoshRosen
Contributor Author

Since merging this PR, I've started noticing some OOM failures in a test that uses SnappyOutputStream:

- partial aggregation without spill
*** RUN ABORTED ***
  java.lang.OutOfMemoryError: Java heap space
  at org.xerial.snappy.buffer.CachedBufferAllocator.allocate(CachedBufferAllocator.java:48)
  at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:96)
  at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:83)
  at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
  at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1084)
  at org.apache.spark.storage.BlockManager$$anonfun$7.apply(BlockManager.scala:579)
  at org.apache.spark.storage.BlockManager$$anonfun$7.apply(BlockManager.scala:579)
  at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:126)
  at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
  at org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:300)

@xerial do you think there could be a memory leak in the new CachedBufferAllocator? xerial/snappy-java@1.1.1.3...develop

@JoshRosen
Contributor Author

I've reverted this in commit 898b22a until we figure out what's going on.

JoshRosen added a commit to JoshRosen/spark that referenced this pull request Nov 15, 2014
We previously tried to upgrade to 1.1.1.5 in apache#2911 but reverted that
patch after discovering a memory leak in snappy-java.  This leak
should have been fixed in 1.1.1.6, though.
asfgit pushed a commit that referenced this pull request Nov 16, 2014
This upgrades snappy-java to 1.1.1.6, which includes a patch that improves error messages when attempting to deserialize empty inputs using SnappyInputStream (see xerial/snappy-java#89).

We previously tried to upgrade to 1.1.1.5 in #2911 but reverted that patch after discovering a memory leak in snappy-java.  This leak should have been fixed in 1.1.1.6, though (see xerial/snappy-java#92).

Author: Josh Rosen <[email protected]>

Closes #3287 from JoshRosen/SPARK-4419 and squashes the following commits:

5d6f4cc [Josh Rosen] [SPARK-4419] Upgrade snappy-java to 1.1.1.6.
asfgit pushed a commit that referenced this pull request Nov 16, 2014
This upgrades snappy-java to 1.1.1.6, which includes a patch that improves error messages when attempting to deserialize empty inputs using SnappyInputStream (see xerial/snappy-java#89).

We previously tried to upgrade to 1.1.1.5 in #2911 but reverted that patch after discovering a memory leak in snappy-java.  This leak should have been fixed in 1.1.1.6, though (see xerial/snappy-java#92).

Author: Josh Rosen <[email protected]>

Closes #3287 from JoshRosen/SPARK-4419 and squashes the following commits:

5d6f4cc [Josh Rosen] [SPARK-4419] Upgrade snappy-java to 1.1.1.6.

(cherry picked from commit 7d8e152)
Signed-off-by: Reynold Xin <[email protected]>