[SPARK-12084][Core] Fix code that uses ByteBuffer.array incorrectly #10083
Conversation
`ByteBuffer` doesn't guarantee that all contents of `ByteBuffer.array` are valid, e.g., for a `ByteBuffer` returned by `ByteBuffer.slice`, which shares its backing array with the original buffer. We should not use the whole content of a `ByteBuffer`'s backing array unless we know that's correct. This patch fixes all places that use `ByteBuffer.array` incorrectly.
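To illustrate the problem the PR describes: a sliced heap buffer still exposes the *entire* backing array through `array()`, so code that reads the whole array sees bytes outside the buffer's valid region. Below is a minimal, self-contained sketch (the `toArray` helper is hypothetical, not Spark code) showing the wrong and right way to extract the bytes:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SliceDemo {
    // Copies exactly the readable region of a buffer, honoring
    // position() and remaining() instead of trusting array() alone.
    // duplicate() is used so the caller's position is left untouched.
    static byte[] toArray(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer whole = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5});
        whole.position(2);
        ByteBuffer slice = whole.slice(); // views {3, 4, 5}, but shares the backing array

        // slice.array() still returns the full backing array:
        System.out.println(Arrays.toString(slice.array()));   // [1, 2, 3, 4, 5]
        // honoring the buffer's window yields only the valid region:
        System.out.println(Arrays.toString(toArray(slice)));  // [3, 4, 5]
    }
}
```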
/cc @andrewor14 @JoshRosen @tdas @vanzin @srowen It's better to have more eyes review this one since it touches a lot of files.
Test build #47020 has finished for PR 10083 at commit
Test build #47022 has finished for PR 10083 at commit
```diff
@@ -79,7 +79,10 @@ object AvroConversionUtil extends Serializable {
   def unpackBytes(obj: Any): Array[Byte] = {
     val bytes: Array[Byte] = obj match {
-      case buf: java.nio.ByteBuffer => buf.array()
+      case buf: java.nio.ByteBuffer =>
```
Can't you use `bufferToArray` here too?
This is in examples, so I don't want to use a private API.
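For context, the `bufferToArray` helper being discussed is a private Spark utility, so the examples module can't call it. A hypothetical standalone analogue of such a helper might look like the sketch below (this is an illustration of the pattern, not Spark's actual implementation): return the backing array directly only when it covers exactly the readable region, otherwise copy.

```java
import java.nio.ByteBuffer;

public class BufferUtil {
    // Hypothetical bufferToArray-style helper. Zero-copy is only safe
    // when the backing array and the buffer's readable window coincide;
    // any other case (slices, offsets, partial reads) must copy.
    static byte[] bufferToArray(ByteBuffer buf) {
        if (buf.hasArray() && buf.arrayOffset() == 0 && buf.position() == 0
                && buf.array().length == buf.limit()) {
            return buf.array(); // fast path: array == readable region
        }
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out); // duplicate() leaves buf's position unchanged
        return out;
    }
}
```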
Looking good. Besides those comments, the changes all looked sound to me.
```diff
@@ -81,7 +81,10 @@ private[serializer] class GenericAvroSerializer(schemas: Map[Long, String])
    * seen values so to limit the number of times that decompression has to be done.
    */
   def decompress(schemaBytes: ByteBuffer): Schema = decompressCache.getOrElseUpdate(schemaBytes, {
-    val bis = new ByteArrayInputStream(schemaBytes.array())
+    val bis = new ByteArrayInputStream(
```
Because `decompressCache` uses the `ByteBuffer` as a key, this code must not change `schemaBytes`'s position, so it cannot use `ByteBufferInputStream` here.
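The constraint above (read the buffer's bytes without moving its position, since the buffer doubles as a cache key) can be met by wrapping only the readable window of the backing array, which is what the truncated diff does. A minimal sketch of that pattern, assuming an array-backed (heap) buffer:

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;

class SchemaStreamSketch {
    // Wraps only the readable window of an array-backed buffer.
    // No get() call is made, so the buffer's position (and therefore
    // its equals()/hashCode() as a cache key) is left untouched.
    static ByteArrayInputStream readableWindow(ByteBuffer schemaBytes) {
        return new ByteArrayInputStream(
            schemaBytes.array(),
            schemaBytes.arrayOffset() + schemaBytes.position(),
            schemaBytes.remaining());
    }
}
```

Note the `arrayOffset() + position()` term: for a sliced buffer, `arrayOffset()` alone is nonzero, and skipping either term would read the wrong bytes.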
```diff
@@ -307,7 +307,7 @@ private[spark] class KryoSerializerInstance(ks: KryoSerializer) extends Serializ
   override def deserialize[T: ClassTag](bytes: ByteBuffer): T = {
     val kryo = borrowKryo()
     try {
-      input.setBuffer(bytes.array)
+      input.setBuffer(bytes.array(), bytes.arrayOffset() + bytes.position(), bytes.remaining())
```
Not necessary for this change, but at some point it might be worth changing this to use Kryo's `ByteBufferInput`.
Kryo will use the array as an internal buffer. Why is it not necessary?
I'm saying that the change I proposed is not necessary, not that your change is not necessary.
> I'm saying that the change I proposed is not necessary, not that your change is not necessary.
Got it. Sorry for my misunderstanding.
LGTM.
Test build #47077 has finished for PR 10083 at commit
Test build #47083 has started for PR 10083 at commit
retest this please
Test build #47126 has finished for PR 10083 at commit
Ok, merging to master. I assume we don't want this in 1.6 at this point?
@srowen I think this is not currently a bug, because the existing code works as long as the buffers are created according to the assumptions being made. But this change is needed to unblock SPARK-12060, which breaks that assumption.
…lize
Merged #10051 again since #10083 is resolved. This reverts commit 328b757.
Author: Shixiong Zhu <[email protected]>
Closes #10167 from zsxwing/merge-SPARK-12060.