
[SPARK-7862] [SQL] Disable the error message redirect to stderr #6882

Closed
wants to merge 6 commits

Conversation

chenghao-intel
Contributor

This is a follow-up of #6404. ScriptTransformation prints the error message directly to stderr, which can be a disaster for the application log.

@SparkQA

SparkQA commented Jun 18, 2015

Test build #35129 has finished for PR 6882 at commit 402f746.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct(VectorTransformer):
    • case class CreateStruct(children: Seq[Expression]) extends Expression
    • case class Logarithm(left: Expression, right: Expression)
    • case class SetCommand(kv: Option[(String, Option[String])]) extends RunnableCommand with Logging

@@ -175,6 +173,19 @@ case class ScriptTransformation(
}
}).start()

// Consume the error stream from the pipeline, otherwise it will be blocked if
// the pipeline is full.
new Thread(new Runnable() {
Contributor

can we add names to both of these threads?
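Naming the stream-draining threads, as suggested above, might look like the following minimal Java sketch. The thread name and the stream contents here are illustrative assumptions, not Spark's actual code; the point is that a named thread is identifiable in thread dumps, which makes hangs like a full stderr pipe much easier to diagnose:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class NamedDrainThread {
    public static void main(String[] args) throws Exception {
        // Stand-in for the child process's error stream (hypothetical content).
        InputStream errorStream = new ByteArrayInputStream("some stderr output".getBytes());
        StringBuilder sink = new StringBuilder();
        // Give the thread a descriptive name so it shows up clearly in thread dumps.
        Thread drainer = new Thread(() -> {
            byte[] buf = new byte[1024];
            int n;
            try {
                while ((n = errorStream.read(buf)) != -1) {
                    sink.append(new String(buf, 0, n));
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }, "Thread-ScriptTransformation-STDERR-Consumer");
        drainer.setDaemon(true); // do not block JVM exit on this helper thread
        drainer.start();
        drainer.join();
        System.out.println(drainer.getName() + ": " + sink);
    }
}
```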

@marmbrus
Contributor

Thanks for fixing this! Minor suggestion, otherwise LGTM. It would also be good if follow-ups like these got a new JIRA; it's hard to track progress on tickets that are already closed.

@marmbrus
Contributor

For this case I can just reopen the JIRA though.

@chenghao-intel
Contributor Author

Thank you @marmbrus, that's a good idea. Updated!

@SparkQA

SparkQA commented Jun 19, 2015

Test build #35199 has finished for PR 6882 at commit bf3d592.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


// Consume the error stream from the pipeline, otherwise it will be blocked if
// the pipeline is full.
new Thread(new Runnable() {
Member

Instead of writing another thread class, can you use RedirectThread, which is used for the same purpose elsewhere, to dump the data?
Also, it's slow to read a byte at a time here.
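The copy loop suggested above, using a fixed-size buffer instead of one read()/write() pair per byte, can be sketched like this. This is a minimal Java illustration of the idea, not Spark's actual RedirectThread (whose implementation lives in Utils):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedRedirect {
    // Copy everything from in to out using a fixed-size buffer,
    // avoiding a system call per byte.
    static void redirect(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stderr content for demonstration.
        InputStream in = new ByteArrayInputStream("hello from stderr".getBytes());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        redirect(in, out);
        System.out.println(out.toString());
    }
}
```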

@chenghao-intel
Contributor Author

Thank you @srowen, that's a good suggestion. I've also moved CircularBuffer into Utils.

@SparkQA

SparkQA commented Jun 19, 2015

Test build #35274 has finished for PR 6882 at commit 09dd5b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -2333,3 +2333,34 @@ private[spark] class RedirectThread(
}
}
}

/**
* Circular buffer, which consumes all of the data written to it.
Contributor

An [[OutputStream]] that will store the last 10 kilobytes written to it in a circular buffer. The current contents of the buffer can be accessed using the toString method.

liancheng added a commit to liancheng/spark that referenced this pull request Jun 21, 2015
asfgit pushed a commit that referenced this pull request Jun 21, 2015
… output until #6882 is merged

Currently [the test case for SPARK-7862] [1] writes 100,000 lines of integer triples to stderr, which makes the Jenkins build output unnecessarily large and makes it hard to debug other build errors. A proper fix is on the way in #6882. This PR ignores the test case temporarily until #6882 is merged.

[1]: https://github.com/apache/spark/pull/6404/files#diff-1ea02a6fab84e938582f7f87cc4d9ea1R641

Author: Cheng Lian <[email protected]>

Closes #6925 from liancheng/spark-8508 and squashes the following commits:

41e5b47 [Cheng Lian] Ignores the test case until #6882 is merged
@yhuai
Contributor

yhuai commented Jun 22, 2015

BTW, let's enable "test script transform for stderr" in this PR.

@SparkQA

SparkQA commented Jun 23, 2015

Test build #35515 has finished for PR 6882 at commit 5ccbfaa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
private[spark] class CircularBuffer(sizeInByte: Int = 10240) extends java.io.OutputStream {
var pos: Int = 0
var buffer = new Array[Int](sizeInByte / 4)
Contributor

Why are you dividing by 4? That means you are not actually storing the promised number of bytes from the output stream.

Contributor Author

The buffer is actually an Array of Int, not an array of Byte; that's why I did this.

Contributor

Okay, that is correct but the class comments do not line up with what the implementation does anymore. The contract of an OutputStream is that each time you call write() it takes in a byte value (a number between 0-255). The 24 high-order bits of the value are ignored. The JVM represents this as an Int only to mirror the InputStream, which needs to distinguish -1 from 255.

Those details aside, if you tell me that you are going to store the last 10 kilobytes, I would expect that you are going to store the input I gave you as a result of the last 10240 invocations of the write function call. I don't care how you are actually representing the data internally. As it stands, you are lying to the user and only storing 25% of the bytes that you promised to.
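A byte-backed version that actually keeps the promised number of bytes could look like the following. This is a Java sketch of the idea under discussion, not the final Spark implementation; the buffer size and demo input are arbitrary:

```java
import java.io.OutputStream;

public class CircularBuffer extends OutputStream {
    private final byte[] buffer;
    private int pos = 0;             // next write position
    private boolean wrapped = false; // true once old data has been overwritten

    public CircularBuffer(int sizeInBytes) {
        buffer = new byte[sizeInBytes]; // one slot per byte; no division by 4
    }

    @Override
    public void write(int b) {
        buffer[pos] = (byte) b; // per the OutputStream contract, only the low 8 bits matter
        pos = (pos + 1) % buffer.length;
        if (pos == 0) wrapped = true;
    }

    // Return the most recent sizeInBytes bytes written, oldest first.
    @Override
    public String toString() {
        if (!wrapped) {
            return new String(buffer, 0, pos);
        }
        byte[] out = new byte[buffer.length];
        System.arraycopy(buffer, pos, out, 0, buffer.length - pos);
        System.arraycopy(buffer, 0, out, buffer.length - pos, pos);
        return new String(out);
    }

    public static void main(String[] args) throws Exception {
        CircularBuffer cb = new CircularBuffer(8);
        cb.write("hello world".getBytes()); // 11 bytes into an 8-byte buffer
        System.out.println(cb.toString());  // keeps only the last 8 bytes
        cb.close();
    }
}
```

Note that the inherited `write(byte[])` simply calls `write(int)` per byte, so the demo exercises the overridden method.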

@SparkQA

SparkQA commented Jun 24, 2015

Test build #35640 has finished for PR 6882 at commit ed8f875.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor Author

Seems unrelated to my change.
retest this please

@chenghao-intel
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 24, 2015

Test build #35659 has finished for PR 6882 at commit ed8f875.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35848 has finished for PR 6882 at commit 4316d07.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 28, 2015

Test build #35912 has finished for PR 6882 at commit 4316d07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 29, 2015

Test build #35955 has finished for PR 6882 at commit bfedd77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

Thanks, merged to master.

@asfgit asfgit closed this in c6ba2ea Jun 29, 2015
@chenghao-intel chenghao-intel deleted the verbose branch July 2, 2015 08:29
6 participants