
[SPARK-12084][Core]Fix codes that uses ByteBuffer.array incorrectly #10083

Closed
wants to merge 4 commits into from

Conversation

@zsxwing (Member) commented Dec 2, 2015

`ByteBuffer` doesn't guarantee that all contents of `ByteBuffer.array` are valid, e.g., for a `ByteBuffer` returned by `ByteBuffer.slice`. We should not use the whole content of a `ByteBuffer`'s backing array unless we know that's correct.

This patch fixes all places that use `ByteBuffer.array` incorrectly.
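To illustrate the pitfall the description refers to: a `ByteBuffer` produced by `slice()` shares the full backing array with its parent, so `array()` alone over-reads. A minimal sketch (the helper name `toArray` is ours, not Spark's):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ByteBufferArrayPitfall {
    // Copy only the valid window [position, limit) of an array-backed buffer,
    // without disturbing the caller's position.
    static byte[] toArray(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out); // duplicate() leaves the caller's position intact
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer whole = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5});
        whole.position(2);
        ByteBuffer slice = whole.slice(); // logical content: {3, 4, 5}

        // Wrong: slice.array() is the entire shared backing array.
        System.out.println(Arrays.toString(slice.array()));   // prints [1, 2, 3, 4, 5]
        // Right: honor position()/remaining() (and arrayOffset() for absolute indexing).
        System.out.println(Arrays.toString(toArray(slice)));  // prints [3, 4, 5]
    }
}
```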
@zsxwing (Member, Author) commented Dec 2, 2015

/cc @andrewor14 @JoshRosen @tdas @vanzin @srowen

It's better to have more eyes review this one since it touches a lot of files.

@SparkQA commented Dec 2, 2015

Test build #47020 has finished for PR 10083 at commit 93b68de.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 2, 2015

Test build #47022 has finished for PR 10083 at commit bfb3360.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -79,7 +79,10 @@ object AvroConversionUtil extends Serializable {

def unpackBytes(obj: Any): Array[Byte] = {
val bytes: Array[Byte] = obj match {
case buf: java.nio.ByteBuffer => buf.array()
case buf: java.nio.ByteBuffer =>
Review comment (Member):

You can't use bufferToArray here too?

zsxwing (Member, Author) replied:
This is in the examples, so I don't want to use a private API.
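For example code that cannot depend on Spark's private helpers, the same extraction can be done with public JDK API only. A hedged sketch of what such a match arm could look like (the `unpackBytes` scaffolding here is ours, not the actual Spark example code):

```java
import java.nio.ByteBuffer;

public class UnpackBytes {
    // Public-JDK-only extraction: copy exactly the bytes the buffer
    // logically contains, whether or not it is array-backed.
    static byte[] unpackBytes(Object obj) {
        if (obj instanceof ByteBuffer) {
            ByteBuffer buf = (ByteBuffer) obj;
            byte[] dest = new byte[buf.remaining()];
            buf.duplicate().get(dest); // duplicate() keeps the original position intact
            return dest;
        } else if (obj instanceof byte[]) {
            return (byte[]) obj;
        }
        throw new IllegalArgumentException("unknown byte holder: " + obj);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {9, 8, 7, 6});
        buf.position(1); // only {8, 7, 6} remain valid
        System.out.println(unpackBytes(buf).length); // prints 3
    }
}
```

Using `duplicate().get(...)` works even for direct (non-array-backed) buffers, which `array()` does not.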

@srowen (Member) commented Dec 2, 2015

Looking good. Besides those comments, the changes all looked sound to me.

@@ -81,7 +81,10 @@ private[serializer] class GenericAvroSerializer(schemas: Map[Long, String])
* seen values so to limit the number of times that decompression has to be done.
*/
def decompress(schemaBytes: ByteBuffer): Schema = decompressCache.getOrElseUpdate(schemaBytes, {
val bis = new ByteArrayInputStream(schemaBytes.array())
val bis = new ByteArrayInputStream(
zsxwing (Member, Author) commented:
Because decompressCache uses the ByteBuffer as a key, this code must not change schemaBytes's position, so it cannot use ByteBufferInputStream here.
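The constraint described above — read the buffer's valid bytes without mutating its position, since the buffer doubles as a cache key — can be met by streaming over the backing array from `arrayOffset() + position()`. A hedged sketch, not the actual Spark code (it assumes an array-backed buffer):

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;

public class PositionPreservingRead {
    // Build an InputStream over the valid window of an array-backed buffer
    // without touching the buffer itself, so it stays usable as a map key.
    static ByteArrayInputStream toStream(ByteBuffer buf) {
        return new ByteArrayInputStream(
            buf.array(), buf.arrayOffset() + buf.position(), buf.remaining());
    }

    public static void main(String[] args) {
        ByteBuffer key = ByteBuffer.wrap(new byte[] {10, 20, 30});
        key.position(1);
        ByteArrayInputStream in = toStream(key);
        in.read(); // consumes from the stream only
        // The buffer's position is untouched, so equals()/hashCode() stay stable.
        System.out.println(key.position()); // prints 1
    }
}
```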

@@ -307,7 +307,7 @@ private[spark] class KryoSerializerInstance(ks: KryoSerializer) extends Serializ
override def deserialize[T: ClassTag](bytes: ByteBuffer): T = {
val kryo = borrowKryo()
try {
input.setBuffer(bytes.array)
input.setBuffer(bytes.array(), bytes.arrayOffset() + bytes.position(), bytes.remaining())
Review comment (Contributor):

Not necessary for this change, but at some point it might be worth it to change this to use Kryo's ByteBufferInput.

zsxwing (Member, Author) replied:

Kryo will use the array as an internal buffer. Why is it not necessary?

Reply (Contributor):

I'm saying that the change I proposed is not necessary, not that your change is not necessary.

zsxwing (Member, Author) replied:

> I'm saying that the change I proposed is not necessary, not that your change is not necessary.

Got it. Sorry for my misunderstanding.
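The offset arithmetic in the `input.setBuffer` fix above generalizes: for any array-backed buffer, the valid window of the backing array starts at `arrayOffset() + position()` and spans `remaining()` bytes. A standalone sketch of that computation (no Kryo dependency; the helper name `window` is ours):

```java
import java.nio.ByteBuffer;

public class KryoStyleWindow {
    // For an array-backed buffer, return {offset, count}: the valid window
    // of the backing array, as passed to calls like input.setBuffer.
    static int[] window(ByteBuffer buf) {
        return new int[] { buf.arrayOffset() + buf.position(), buf.remaining() };
    }

    public static void main(String[] args) {
        ByteBuffer whole = ByteBuffer.wrap(new byte[16]);
        whole.position(4);
        whole.limit(12);
        ByteBuffer bytes = whole.slice(); // arrayOffset() == 4, capacity 8
        bytes.position(2);                // pretend 2 bytes were already consumed

        int[] w = window(bytes);
        // These would be the arguments to input.setBuffer(bytes.array(), offset, count):
        System.out.println(w[0] + " " + w[1]); // prints "6 6"
    }
}
```

Passing `bytes.array()` alone, as the old code did, would hand Kryo all 16 bytes starting at index 0, including bytes outside the buffer's logical content.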

@vanzin (Contributor) commented Dec 2, 2015

LGTM.

@SparkQA commented Dec 2, 2015

Test build #47077 has finished for PR 10083 at commit a5d965c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 2, 2015

Test build #47083 has started for PR 10083 at commit 81d1812.

@zsxwing (Member, Author) commented Dec 3, 2015

retest this please

@SparkQA commented Dec 3, 2015

Test build #47126 has finished for PR 10083 at commit 81d1812.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor) commented Dec 5, 2015

Ok, merging to master. I assume we don't want this in 1.6 at this point?

@asfgit closed this in 3af53e6 on Dec 5, 2015
@srowen (Member) commented Dec 5, 2015

@vanzin @zsxwing it looked like enough of an important fix to go into 1.6.x -- any strong objections to that?

@vanzin (Contributor) commented Dec 5, 2015

@srowen I don't think this is a bug currently, because the existing code creates its buffers in a way that satisfies the assumptions being made. But the fix is needed to unblock SPARK-12060, which breaks those assumptions.

@zsxwing deleted the bytebuffer-array branch December 6, 2015 23:09
asfgit pushed a commit that referenced this pull request Dec 7, 2015
…lize

Merged #10051 again since #10083 is resolved.

This reverts commit 328b757.

Author: Shixiong Zhu <[email protected]>

Closes #10167 from zsxwing/merge-SPARK-12060.