
[SPARK-7180][SPARK-8090][SPARK-8091] Fix a number of SerializationDebugger bugs and limitations #6625

Closed
wants to merge 8 commits

Conversation

tdas
Contributor

@tdas tdas commented Jun 3, 2015

This PR solves three SerializationDebugger issues.

  • SPARK-7180 - SerializationDebugger fails with ArrayOutOfBoundsException
  • SPARK-8090 - SerializationDebugger does not handle classes with writeReplace correctly
  • SPARK-8091 - SerializationDebugger does not handle classes with writeObject method

The solutions for each are explained as follows:

  • SPARK-7180 - The wrong slot descriptor was used for getting the values of the fields in the object being tested.
  • SPARK-8090 - Test the type of the object returned by writeReplace().
  • SPARK-8091 - Use a dummy ObjectOutputStream to collect all the objects written by the writeObject() method, and then test those objects as usual (see the sketch below).

I also added more tests in the test suite to increase code coverage. For example, I added tests for cases where there are no serializability issues.
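For the SPARK-8091 fix, here is a minimal sketch of the dummy-stream idea (class and member names are illustrative assumptions, not necessarily the exact code in this patch). Writing the suspect object into this stream lets normal Java serialization drive its writeObject() method, while the replaceObject() hook records every object the stream encounters; those objects can then be checked individually.

import java.io.{ObjectOutputStream, OutputStream}
import scala.collection.mutable.ArrayBuffer

// An OutputStream that discards everything written to it (a /dev/null stand-in).
class NullOutputStream extends OutputStream {
  override def write(b: Int): Unit = ()
}

// A dummy ObjectOutputStream that keeps no bytes but records every object
// reference it is asked to serialize: with enableReplaceObject(true), the
// replaceObject() hook is called for each object the stream encounters,
// including the ones a user-defined writeObject() method writes.
class ListObjectOutputStream extends ObjectOutputStream(new NullOutputStream) {
  private val collected = new ArrayBuffer[Any]
  enableReplaceObject(true)

  def outputArray: Array[Any] = collected.toArray

  override def replaceObject(obj: AnyRef): AnyRef = {
    collected += obj
    obj // record the object, do not actually replace it
  }
}

The debugger can then write the object under test into a ListObjectOutputStream (catching any IOException) and run the usual serializability checks on each element of outputArray.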

@tdas tdas changed the title [SPARK-7180][SPARK-8090][SPARK-8191] [SPARK-7180][SPARK-8090][SPARK-8191] Fix a number of SerializableDebugger bugs and limitations Jun 3, 2015
@tdas tdas changed the title [SPARK-7180][SPARK-8090][SPARK-8191] Fix a number of SerializableDebugger bugs and limitations [SPARK-7180][SPARK-8090][SPARK-8191] Fix a number of SerializationDebugger bugs and limitations Jun 3, 2015
@tdas
Contributor Author

tdas commented Jun 3, 2015

@andrewor14 Please take a look.
I am still adding more comments, so this is sort-of WIP.

@tdas
Contributor Author

tdas commented Jun 3, 2015

@rxin Take a look. Adding comments, so slightly WIP.

    } else {
      val fields: Array[ObjectStreamField] = slotDesc.getFields
      val objFieldValues: Array[Object] = new Array[Object](slotDesc.getNumObjFields)
      val numPrims = fields.length - objFieldValues.length
-     desc.getObjFieldValues(finalObj, objFieldValues)
+     slotDesc.getObjFieldValues(finalObj, objFieldValues)
Contributor Author

This is the bug fix for SPARK-7180
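As a hypothetical illustration of why this matters (not code from the patch; the class names simply mirror the test classes quoted later in this thread): every class in a hierarchy has its own ObjectStreamClass descriptor, or "slot", listing only the fields it declares, so field values must be read through the matching slot descriptor rather than the descriptor of the most-derived class.

import java.io.ObjectStreamClass

class ParentClass(val parentField: Int) extends Serializable
class ChildClass(val childField: Int) extends ParentClass(1)

object SlotLayoutDemo {
  def main(args: Array[String]): Unit = {
    // Each descriptor reports only the fields declared by its own class,
    // so the field layouts of the two slots differ.
    val childDesc  = ObjectStreamClass.lookup(classOf[ChildClass])
    val parentDesc = ObjectStreamClass.lookup(classOf[ParentClass])
    println(childDesc.getFields.map(_.getName).mkString(", "))  // childField
    println(parentDesc.getFields.map(_.getName).mkString(", ")) // parentField
  }
}

Reading a parent slot's fields through the child's descriptor (as the old desc.getObjFieldValues call did) mixes up field layouts, which is how the ArrayOutOfBoundsException in SPARK-7180 could arise.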


  private def visitSerializableWithWriteObjectMethod(
      o: Object, slotDesc: ObjectStreamClass, stack: List[String]): List[String] = {
    println(">>> processing serializable with writeObject" + o)
Contributor

stray println

Contributor Author

Already removed, waiting for more feedback to accumulate before pushing.

@andrewor14
Contributor

As discussed offline, this looks great. I am looking forward to your land mine of comments...

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34125 has finished for PR 6625 at commit da75d34.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34139 has finished for PR 6625 at commit 7e2fdcf.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • // class ParentClass(parentField: Int)
    • // class ChildClass(childField: Int) extends Parent(1)
    • // If the class type corresponding to current slot has writeObject() defined,
    • // then its not obvious which fields of the class will be serialized as the writeObject()

@tdas
Contributor Author

tdas commented Jun 4, 2015

@andrewor14 How about them comments? Also added more tests. 93% line coverage.

[Screenshot: 2015-06-03 at 6:21 PM]

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34145 has finished for PR 6625 at commit 26b5179.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • // class ParentClass(parentField: Int)
    • // class ChildClass(childField: Int) extends Parent(1)
    • // If the class type corresponding to current slot has writeObject() defined,
    • // then its not obvious which fields of the class will be serialized as the writeObject()

@@ -62,7 +62,7 @@ private[spark] object SerializationDebugger extends Logging {
   *
   * It does not yet handle writeObject override, but that shouldn't be too hard to do either.
   */
-  def find(obj: Any): List[String] = {
+  private[serializer] def find(obj: Any): List[String] = {
Contributor Author

So that everyone uses improveException, which is safe to call.
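One way to read "safe to call", sketched with hypothetical stand-ins rather than Spark's actual code: route all callers through a guarded wrapper so that a failure inside the debugger can never replace the original NotSerializableException.

import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import scala.util.control.NonFatal

object SafeDebuggerEntryPoint {
  // Stand-in for the now package-private find(): the real one walks the object graph.
  private def findSerializationPath(obj: Any): List[String] =
    List(s"object (class ${obj.getClass.getName}, $obj)")

  // The one public door: if the debugger itself fails, fall back to the original exception.
  def improveException(obj: Any, e: NotSerializableException): NotSerializableException = {
    try {
      val stack = findSerializationPath(obj)
      new NotSerializableException(
        e.getMessage + "\nSerialization stack:\n" + stack.map("\t- " + _).mkString("\n"))
    } catch {
      case NonFatal(_) => e
    }
  }

  private class NotSer // deliberately not Serializable

  def main(args: Array[String]): Unit = {
    val suspect = new NotSer
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(suspect)
    } catch {
      case e: NotSerializableException => println(improveException(suspect, e).getMessage)
    }
  }
}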

@tdas tdas changed the title [SPARK-7180][SPARK-8090][SPARK-8191] Fix a number of SerializationDebugger bugs and limitations [SPARK-7180][SPARK-8090][SPARK-8091] Fix a number of SerializationDebugger bugs and limitations Jun 4, 2015
@SparkQA

SparkQA commented Jun 4, 2015

Test build #34138 has finished for PR 6625 at commit d1967fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • // class ParentClass(parentField: Int)
    • // class ChildClass(childField: Int) extends Parent(1)
    • // If the class type corresponding to current slot has writeObject() defined,
    • // then its not obvious which fields of the class will be serialized as the writeObject()

* Visit an externalizable object.
* Since writeExternal() can choose add arbitrary objects at the time of serialization,
* the only way to capture all the objects it will serialize is by using a
* dummy ObjectOutput object that captures all the inner objects, and then visit all the
Contributor

This is a really long sentence and fairly low level. How about:

Since writeExternal() can choose add arbitrary objects at the time of serialization,
the only way to capture all the objects it will serialize is to manually write the
object into our own dummy ObjectOutput that collects the relevant fields.

Contributor Author

Okay, I can shorten it for this, as well as for the new visit function.
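For reference, a minimal sketch of the kind of dummy ObjectOutput being discussed (an illustrative assumption, not the code in this PR): it records the object references passed to writeObject() and ignores all primitive writes, so calling writeExternal(dummy) on an Externalizable instance reveals which objects it would serialize.

import java.io.ObjectOutput
import scala.collection.mutable.ArrayBuffer

class ListObjectOutput extends ObjectOutput {
  private val collected = new ArrayBuffer[Any]
  def outputArray: Array[Any] = collected.toArray

  // The only call we care about: remember each object reference written.
  override def writeObject(obj: AnyRef): Unit = collected += obj

  // Primitive writes and stream plumbing are deliberately no-ops.
  override def write(b: Int): Unit = ()
  override def write(b: Array[Byte]): Unit = ()
  override def write(b: Array[Byte], off: Int, len: Int): Unit = ()
  override def writeBoolean(v: Boolean): Unit = ()
  override def writeByte(v: Int): Unit = ()
  override def writeShort(v: Int): Unit = ()
  override def writeChar(v: Int): Unit = ()
  override def writeInt(v: Int): Unit = ()
  override def writeLong(v: Long): Unit = ()
  override def writeFloat(v: Float): Unit = ()
  override def writeDouble(v: Double): Unit = ()
  override def writeBytes(s: String): Unit = ()
  override def writeChars(s: String): Unit = ()
  override def writeUTF(s: String): Unit = ()
  override def flush(): Unit = ()
  override def close(): Unit = ()
}

An Externalizable value x could then be probed with val out = new ListObjectOutput; x.writeExternal(out), after which out.outputArray holds the objects to visit next.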

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34146 has finished for PR 6625 at commit 304c97b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • // class ParentClass(parentField: Int)
    • // class ChildClass(childField: Int) extends Parent(1)
    • // If the class type corresponding to current slot has writeObject() defined,
    • // then its not obvious which fields of the class will be serialized as the writeObject()

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34154 has finished for PR 6625 at commit ae212c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • // class ParentClass(parentField: Int)
    • // class ChildClass(childField: Int) extends ParentClass(1)
    • // If the class type corresponding to current slot has writeObject() defined,
    • // then its not obvious which fields of the class will be serialized as the writeObject()

}

// Every class is associated with one or more "slots", each slot is related to the parent
// classes of this class. These slots are used by the ObjectOutputStream
Contributor

grammar nit: each slot refers to the parent class of this class

@andrewor14
Contributor

I left a bunch of English nits, but this is mergeable as is. LGTM.

@JoshRosen
Contributor

Bump @tdas @andrewor14, do we want to merge this now or should we wait for the comment changes? We may want to retest with Jenkins first, just in case there are any new style rules that this might fail.

@andrewor14
Contributor

retest this please

@SparkQA

SparkQA commented Jun 14, 2015

Test build #34851 has finished for PR 6625 at commit ae212c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • // class ParentClass(parentField: Int)
    • // class ChildClass(childField: Int) extends ParentClass(1)
    • // If the class type corresponding to current slot has writeObject() defined,
    • // then its not obvious which fields of the class will be serialized as the writeObject()

@JoshRosen
Contributor

Gentle bump again: do we want to merge this as-is and fix the comments later?

@andrewor14
Contributor

@tdas is working on this right now, I believe.

@SparkQA

SparkQA commented Jun 18, 2015

Test build #35156 has finished for PR 6625 at commit c7cb046.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • // class ParentClass(parentField: Int)
    • // class ChildClass(childField: Int) extends ParentClass(1)
    • // If the class type corresponding to current slot has writeObject() defined,
    • // then its not obvious which fields of the class will be serialized as the writeObject()

@tdas
Contributor Author

tdas commented Jun 18, 2015

@andrewor14 This is good to go.

@andrewor14
Contributor

Ok, merging into master.

@asfgit asfgit closed this in 866816e Jun 19, 2015
@andrewor14
Contributor

I also merged this into 1.4.1.

asfgit pushed a commit that referenced this pull request Jun 19, 2015
…ebugger bugs and limitations

This PR solves three SerializationDebugger issues.
* SPARK-7180 - SerializationDebugger fails with ArrayOutOfBoundsException
* SPARK-8090 - SerializationDebugger does not handle classes with writeReplace correctly
* SPARK-8091 - SerializationDebugger does not handle classes with writeObject method

The solutions for each are explained as follows
* SPARK-7180 - The wrong slot desc was used for getting the value of the fields in the object being tested.
* SPARK-8090 - Test the type of the replaced object.
* SPARK-8091 - Use a dummy ObjectOutputStream to collect all the objects written by the writeObject() method, and then test those objects as usual.

I also added more tests in the testsuite to increase code coverage. For example, added tests for cases where there are not serializability issues.

Author: Tathagata Das <[email protected]>

Closes #6625 from tdas/SPARK-7180 and squashes the following commits:

c7cb046 [Tathagata Das] Addressed comments on docs
ae212c8 [Tathagata Das] Improved docs
304c97b [Tathagata Das] Fixed build error
26b5179 [Tathagata Das] more tests.....92% line coverage
7e2fdcf [Tathagata Das] Added more tests
d1967fb [Tathagata Das] Added comments.
da75d34 [Tathagata Das] Removed unnecessary lines.
50a608d [Tathagata Das] Fixed bugs and added support for writeObject
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 22, 2015