-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-7180][SPARK-8090][SPARK-8091] Fix a number of SerializationDebugger bugs and limitations #6625
Conversation
@andrewor14 Please take a look. |
@rxin Take a look. Adding comments, so slightly WIP. |
} else { | ||
val fields: Array[ObjectStreamField] = slotDesc.getFields | ||
val objFieldValues: Array[Object] = new Array[Object](slotDesc.getNumObjFields) | ||
val numPrims = fields.length - objFieldValues.length | ||
desc.getObjFieldValues(finalObj, objFieldValues) | ||
slotDesc.getObjFieldValues(finalObj, objFieldValues) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bug fix for SPARK-7180
|
||
private def visitSerializableWithWriteObjectMethod( | ||
o: Object, slotDesc: ObjectStreamClass, stack: List[String]): List[String] = { | ||
println(">>> processing serializable with writeObject" + o) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stray println
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already removed, waiting for more feedback to accumulate before pushing.
As discussed offline, this looks great. I am looking forward to your land mine of comments... |
Test build #34125 has finished for PR 6625 at commit
|
Test build #34139 has finished for PR 6625 at commit
|
@andrewor14 How about them comments? Also added more tests. 93% line coverage. |
Test build #34145 has finished for PR 6625 at commit
|
@@ -62,7 +62,7 @@ private[spark] object SerializationDebugger extends Logging { | |||
* | |||
* It does not yet handle writeObject override, but that shouldn't be too hard to do either. | |||
*/ | |||
def find(obj: Any): List[String] = { | |||
private[serializer] def find(obj: Any): List[String] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so that everyone uses improveException
which is safe to call.
Test build #34138 has finished for PR 6625 at commit
|
* Visit an externalizable object. | ||
* Since writeExternal() can choose add arbitrary objects at the time of serialization, | ||
* the only way to capture all the objects it will serialize is by using a | ||
* dummy ObjectOutput object that captures all the inner objects, and then visit all the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really long sentence and fairly low level. How about:
Since writeExternal() can choose add arbitrary objects at the time of serialization,
the only way to capture all the objects it will serialize is to manually write the
object into our own dummy ObjectOutput that collects the relevant fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I can shorten it for this, as well as for the new visit function.
Test build #34146 has finished for PR 6625 at commit
|
Test build #34154 has finished for PR 6625 at commit
|
} | ||
|
||
// Every class is associated with one or more "slots", each slot is related to the parent | ||
// classes of this class. These slots are used by the ObjectOutputStream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grammar nit: each slot refers to the parent class of this class
I left a bunch of english nits, but this is mergeable as is. LGTM. |
Bump @tdas @andrewor14, do we want to merge this now or should we wait for the comment changes? We may want to retest with Jenkins first, just in case there's any new style rules that this might fail. |
retest this please |
Test build #34851 has finished for PR 6625 at commit
|
Gentle bump again: do we want to merge this as-is and fix the comments later? |
@tdas is working on this right now I believe |
Test build #35156 has finished for PR 6625 at commit
|
@andrewor14 This is good to go. |
Ok, merging into master. |
I also merged this into 1.4.1. |
…ebugger bugs and limitations This PR solves three SerializationDebugger issues. * SPARK-7180 - SerializationDebugger fails with ArrayOutOfBoundsException * SPARK-8090 - SerializationDebugger does not handle classes with writeReplace correctly * SPARK-8091 - SerializationDebugger does not handle classes with writeObject method The solutions for each are explained as follows * SPARK-7180 - The wrong slot desc was used for getting the value of the fields in the object being tested. * SPARK-8090 - Test the type of the replaced object. * SPARK-8091 - Use a dummy ObjectOutputStream to collect all the objects written by the writeObject() method, and then test those objects as usual. I also added more tests in the testsuite to increase code coverage. For example, added tests for cases where there are not serializability issues. Author: Tathagata Das <[email protected]> Closes #6625 from tdas/SPARK-7180 and squashes the following commits: c7cb046 [Tathagata Das] Addressed comments on docs ae212c8 [Tathagata Das] Improved docs 304c97b [Tathagata Das] Fixed build error 26b5179 [Tathagata Das] more tests.....92% line coverage 7e2fdcf [Tathagata Das] Added more tests d1967fb [Tathagata Das] Added comments. da75d34 [Tathagata Das] Removed unnecessary lines. 50a608d [Tathagata Das] Fixed bugs and added support for writeObject
…ebugger bugs and limitations This PR solves three SerializationDebugger issues. * SPARK-7180 - SerializationDebugger fails with ArrayOutOfBoundsException * SPARK-8090 - SerializationDebugger does not handle classes with writeReplace correctly * SPARK-8091 - SerializationDebugger does not handle classes with writeObject method The solutions for each are explained as follows * SPARK-7180 - The wrong slot desc was used for getting the value of the fields in the object being tested. * SPARK-8090 - Test the type of the replaced object. * SPARK-8091 - Use a dummy ObjectOutputStream to collect all the objects written by the writeObject() method, and then test those objects as usual. I also added more tests in the testsuite to increase code coverage. For example, added tests for cases where there are not serializability issues. Author: Tathagata Das <[email protected]> Closes apache#6625 from tdas/SPARK-7180 and squashes the following commits: c7cb046 [Tathagata Das] Addressed comments on docs ae212c8 [Tathagata Das] Improved docs 304c97b [Tathagata Das] Fixed build error 26b5179 [Tathagata Das] more tests.....92% line coverage 7e2fdcf [Tathagata Das] Added more tests d1967fb [Tathagata Das] Added comments. da75d34 [Tathagata Das] Removed unnecessary lines. 50a608d [Tathagata Das] Fixed bugs and added support for writeObject
This PR solves three SerializationDebugger issues.
The solutions for each are explained as follows
I also added more tests in the testsuite to increase code coverage. For example, added tests for cases where there are not serializability issues.