-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-7862] [SQL] Disable the error message redirect to stderr #6882
Conversation
Test build #35129 has finished for PR 6882 at commit
|
@@ -175,6 +173,19 @@ case class ScriptTransformation( | |||
} | |||
}).start() | |||
|
|||
// Consume the error stream from the pipeline, otherwise it will be blocked if | |||
// the pipeline is full. | |||
new Thread(new Runnable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add names to both of these threads?
Thanks for fixing this! Minor suggestion otherwise LGTM. It would also be good if follow-ups like these got a new JIRA. Its hard to track progress on tickets that are already closed. |
For this case I can just reopen the JIRA though. |
Thank you @marmbrus that's a good idea, updated!. |
Test build #35199 has finished for PR 6882 at commit
|
|
||
// Consume the error stream from the pipeline, otherwise it will be blocked if | ||
// the pipeline is full. | ||
new Thread(new Runnable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of writing another thread class, can you use RedirectThread
, which is used for the same purpose elsewhere, to dump the data?
Also, it's slow to read a byte at a time here.
Thank you @srowen that's a good suggestion, I've also move the |
Test build #35274 has finished for PR 6882 at commit
|
@@ -2333,3 +2333,34 @@ private[spark] class RedirectThread( | |||
} | |||
} | |||
} | |||
|
|||
/** | |||
* Circular buffer, which consume all of the data write to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An [[OutputStream]] that will store the last 10 kilobytes written to it in a circular buffer. The current contents of the buffer can be accessed using the toString
method.
… output until #6882 is merged Currently [the test case for SPARK-7862] [1] writes 100,000 lines of integer triples to stderr and makes Jenkins build output unnecessarily large and it's hard to debug other build errors. A proper fix is on the way in #6882. This PR ignores this test case temporarily until #6882 is merged. [1]: https://github.com/apache/spark/pull/6404/files#diff-1ea02a6fab84e938582f7f87cc4d9ea1R641 Author: Cheng Lian <[email protected]> Closes #6925 from liancheng/spark-8508 and squashes the following commits: 41e5b47 [Cheng Lian] Ignores the test case until #6882 is merged
btw, let's enable "test script transform for stderr" in this pr. |
Test build #35515 has finished for PR 6882 at commit
|
*/ | ||
private[spark] class CircularBuffer(sizeInByte: Int = 10240) extends java.io.OutputStream { | ||
var pos: Int = 0 | ||
var buffer = new Array[Int](sizeInByte / 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you dividing by 4? That means you are not actually storing the promised number of bytes from the output stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The buffer
actually the Array of Int, not array of Byte, that's why I did this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that is correct but the class comments do not line up with what the implementation does anymore. The contract of an OutputStream
is that each time you call write()
it takes in a byte value (a number between 0-255). The 24 high-order bits of the value are ignored. The JVM represents this as an Int
only to mirror the InputStream, which needs to distinguish -1 from 255.
Those details aside, if you tell me that you are going to store the last 10 kilobytes
, I would expect that you are going to store the input I gave you as a result of the last 10240 invocations of the write function call. I don't care how you are actually representing the data internally. As such, it seems that you are lying to the user and only storing a 25% of bytes that you promised to.
Test build #35640 has finished for PR 6882 at commit
|
Seems not related to my change. |
retest this please |
Test build #35659 has finished for PR 6882 at commit
|
Test build #35848 has finished for PR 6882 at commit
|
retest this please |
Test build #35912 has finished for PR 6882 at commit
|
Test build #35955 has finished for PR 6882 at commit
|
Thanks, merged to master. |
This is a follow up of #6404, the ScriptTransformation prints the error msg into stderr directly, probably be a disaster for application log.