
[SPARK-7862] [SQL] Disable the error message redirect to stderr #6882

Closed
wants to merge 6 commits

Conversation

chenghao-intel
Contributor

This is a follow-up of #6404. ScriptTransformation prints the error message directly to stderr, which can be a disaster for the application log.

@SparkQA

SparkQA commented Jun 18, 2015

Test build #35129 has finished for PR 6882 at commit 402f746.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct(VectorTransformer):
    • case class CreateStruct(children: Seq[Expression]) extends Expression
    • case class Logarithm(left: Expression, right: Expression)
    • case class SetCommand(kv: Option[(String, Option[String])]) extends RunnableCommand with Logging

@@ -175,6 +173,19 @@ case class ScriptTransformation(
}
}).start()

// Consume the error stream from the pipeline, otherwise it will be blocked if
// the pipeline is full.
new Thread(new Runnable() {
Contributor

can we add names to both of these threads?
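Naming the stream-draining threads, as suggested above, might look like the following minimal Java sketch. The thread name and the stream contents here are illustrative assumptions, not Spark's actual code; the point is that a named thread is identifiable in thread dumps, which makes hangs like a full stderr pipe much easier to diagnose:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class NamedDrainThread {
    public static void main(String[] args) throws Exception {
        // Stand-in for the child process's error stream (hypothetical content).
        InputStream errorStream = new ByteArrayInputStream("some stderr output".getBytes());
        StringBuilder sink = new StringBuilder();
        // Give the thread a descriptive name so it shows up clearly in thread dumps.
        Thread drainer = new Thread(() -> {
            byte[] buf = new byte[1024];
            int n;
            try {
                while ((n = errorStream.read(buf)) != -1) {
                    sink.append(new String(buf, 0, n));
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }, "Thread-ScriptTransformation-STDERR-Consumer");
        drainer.setDaemon(true); // do not block JVM exit on this helper thread
        drainer.start();
        drainer.join();
        System.out.println(drainer.getName() + ": " + sink);
    }
}
```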

@marmbrus
Contributor

Thanks for fixing this! Minor suggestion, otherwise LGTM. It would also be good if follow-ups like these got a new JIRA; it's hard to track progress on tickets that are already closed.

@marmbrus
Contributor

For this case I can just reopen the JIRA though.

@chenghao-intel
Contributor Author

Thank you @marmbrus, that's a good idea. Updated!

@SparkQA

SparkQA commented Jun 19, 2015

Test build #35199 has finished for PR 6882 at commit bf3d592.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


// Consume the error stream from the pipeline, otherwise it will be blocked if
// the pipeline is full.
new Thread(new Runnable() {
Member

Instead of writing another thread class, can you use RedirectThread, which is used for the same purpose elsewhere, to dump the data?
Also, it's slow to read a byte at a time here.
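The copy loop suggested above, using a fixed-size buffer instead of one read()/write() pair per byte, can be sketched like this. This is a minimal Java illustration of the idea, not Spark's actual RedirectThread (whose implementation lives in Utils):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedRedirect {
    // Copy everything from in to out using a fixed-size buffer,
    // avoiding a system call per byte.
    static void redirect(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stderr content for demonstration.
        InputStream in = new ByteArrayInputStream("hello from stderr".getBytes());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        redirect(in, out);
        System.out.println(out.toString());
    }
}
```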

@chenghao-intel
Contributor Author

Thank you @srowen, that's a good suggestion. I've also moved CircularBuffer into Utils.

@SparkQA

SparkQA commented Jun 19, 2015

Test build #35274 has finished for PR 6882 at commit 09dd5b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -2333,3 +2333,34 @@ private[spark] class RedirectThread(
}
}
}

/**
* Circular buffer, which consumes all of the data written to it.
Contributor

An [[OutputStream]] that will store the last 10 kilobytes written to it in a circular buffer. The current contents of the buffer can be accessed using the toString method.

liancheng added a commit to liancheng/spark that referenced this pull request Jun 21, 2015
asfgit pushed a commit that referenced this pull request Jun 21, 2015
… output until #6882 is merged

Currently [the test case for SPARK-7862] [1] writes 100,000 lines of integer triples to stderr, which makes the Jenkins build output unnecessarily large and makes it hard to debug other build errors. A proper fix is on the way in #6882. This PR ignores the test case temporarily until #6882 is merged.

[1]: https://github.com/apache/spark/pull/6404/files#diff-1ea02a6fab84e938582f7f87cc4d9ea1R641

Author: Cheng Lian <[email protected]>

Closes #6925 from liancheng/spark-8508 and squashes the following commits:

41e5b47 [Cheng Lian] Ignores the test case until #6882 is merged
@yhuai
Contributor

yhuai commented Jun 22, 2015

BTW, let's enable "test script transform for stderr" in this PR.

@SparkQA

SparkQA commented Jun 23, 2015

Test build #35515 has finished for PR 6882 at commit 5ccbfaa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
private[spark] class CircularBuffer(sizeInByte: Int = 10240) extends java.io.OutputStream {
var pos: Int = 0
var buffer = new Array[Int](sizeInByte / 4)
Contributor

Why are you dividing by 4? That means you are not actually storing the promised number of bytes from the output stream.

Contributor Author

The buffer is actually an Array of Int, not an array of Byte; that's why I did this.

Contributor

Okay, that is correct but the class comments do not line up with what the implementation does anymore. The contract of an OutputStream is that each time you call write() it takes in a byte value (a number between 0-255). The 24 high-order bits of the value are ignored. The JVM represents this as an Int only to mirror the InputStream, which needs to distinguish -1 from 255.

Those details aside, if you tell me that you are going to store the last 10 kilobytes, I would expect that you are going to store the input I gave you as a result of the last 10240 invocations of the write function call. I don't care how you are actually representing the data internally. As it stands, you are lying to the user and only storing 25% of the bytes that you promised to.
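A byte-backed version that actually keeps the promised number of bytes could look like the following. This is a Java sketch of the idea under discussion, not the final Spark implementation; the buffer size and demo input are arbitrary:

```java
import java.io.OutputStream;

public class CircularBuffer extends OutputStream {
    private final byte[] buffer;
    private int pos = 0;             // next write position
    private boolean wrapped = false; // true once old data has been overwritten

    public CircularBuffer(int sizeInBytes) {
        buffer = new byte[sizeInBytes]; // one slot per byte; no division by 4
    }

    @Override
    public void write(int b) {
        buffer[pos] = (byte) b; // per the OutputStream contract, only the low 8 bits matter
        pos = (pos + 1) % buffer.length;
        if (pos == 0) wrapped = true;
    }

    // Return the most recent sizeInBytes bytes written, oldest first.
    @Override
    public String toString() {
        if (!wrapped) {
            return new String(buffer, 0, pos);
        }
        byte[] out = new byte[buffer.length];
        System.arraycopy(buffer, pos, out, 0, buffer.length - pos);
        System.arraycopy(buffer, 0, out, buffer.length - pos, pos);
        return new String(out);
    }

    public static void main(String[] args) throws Exception {
        CircularBuffer cb = new CircularBuffer(8);
        cb.write("hello world".getBytes()); // 11 bytes into an 8-byte buffer
        System.out.println(cb.toString());  // keeps only the last 8 bytes
        cb.close();
    }
}
```

Note that the inherited `write(byte[])` simply calls `write(int)` per byte, so the demo exercises the overridden method.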

@SparkQA

SparkQA commented Jun 24, 2015

Test build #35640 has finished for PR 6882 at commit ed8f875.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor Author

Seems unrelated to my change.
retest this please

@chenghao-intel
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 24, 2015

Test build #35659 has finished for PR 6882 at commit ed8f875.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35848 has finished for PR 6882 at commit 4316d07.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 28, 2015

Test build #35912 has finished for PR 6882 at commit 4316d07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 29, 2015

Test build #35955 has finished for PR 6882 at commit bfedd77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

Thanks, merged to master.

@asfgit asfgit closed this in c6ba2ea Jun 29, 2015
@chenghao-intel chenghao-intel deleted the verbose branch July 2, 2015 08:29
6 participants