Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-15285][SQL] Generated SpecificSafeProjection.apply method grows beyond 64 KB #13243

Closed
wants to merge 10 commits into from
Original file line number Diff line number Diff line change
Expand Up @@ -232,27 +232,47 @@ case class NewInstance(

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val javaType = ctx.javaType(dataType)
val argGen = arguments.map(_.genCode(ctx))
val argString = argGen.map(_.value).mkString(", ")
val argIsNulls = ctx.freshName("argIsNulls")
ctx.addMutableState("boolean[]", argIsNulls,
s"$argIsNulls = new boolean[${arguments.size}];")
val argValues = arguments.zipWithIndex.map { case (e, i) =>
val argValue = ctx.freshName("argValue")
ctx.addMutableState(ctx.javaType(e.dataType), argValue, "")
argValue
}

val argCodes = arguments.zipWithIndex.map { case (e, i) =>
val expr = e.genCode(ctx)
expr.code + s"""
$argIsNulls[$i] = ${expr.isNull};
${argValues(i)} = ${expr.value};
"""
}
val argCode = ctx.splitExpressions(ctx.INPUT_ROW, argCodes)

val outer = outerPointer.map(func => Literal.fromObject(func()).genCode(ctx))

var isNull = ev.isNull
val setIsNull = if (propagateNull && arguments.nonEmpty) {
s"final boolean $isNull = ${argGen.map(_.isNull).mkString(" || ")};"
s"""
boolean $isNull = false;
for (int idx = 0; idx < ${arguments.length}; idx++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have some benchmark number for it? Especially for 1 or 2 arguments. If there is a lot of improvement, we can specialize the code when there are 1 or 2 arguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was afraid of increasing code size. However, in the case of 1 or 2 arguments, the code size of generating argument value are relatively small than the cases with large number of arguments. As a result, code size here would not be an issue.
Thus, the latest version always uses for-loop.

if ($argIsNulls[idx]) { $isNull = true; break; }
}
"""
} else {
isNull = "false"
""
}

val constructorCall = outer.map { gen =>
s"""${gen.value}.new ${cls.getSimpleName}($argString)"""
s"""${gen.value}.new ${cls.getSimpleName}(${argValues.mkString(", ")})"""
}.getOrElse {
s"new $className($argString)"
s"new $className(${argValues.mkString(", ")})"
}

val code = s"""
${argGen.map(_.code).mkString("\n")}
$argCode
${outer.map(_.code).getOrElse("")}
$setIsNull
final $javaType ${ev.value} = $isNull ? ${ctx.defaultValue(javaType)} : $constructorCall;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -58,4 +58,38 @@ class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext {
val nullIntRow = df.selectExpr("i[1]").collect()(0)
assert(nullIntRow == org.apache.spark.sql.Row(null))
}

test("SPARK-15285 Generated SpecificSafeProjection.apply method grows beyond 64KB") {
val ds100_5 = Seq(S100_5()).toDS()
ds100_5.show
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not use show in test, can collect reproduce this bug?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use collect cannot reproduce this bug. I think that to use show involves code generation for Projection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this is weird, the main logic in show is calling collect to get the data, can you double check it? and how about ds.rdd.collect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I conformed that to call collect cannot reproduce this bug in the latest master branch. However, I confirmed that to call ds.rdd.collect can reproduce this bug. The latest code uses ds.rdd.collect.

}
}

case class S100(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala 2.10 doesn't support large case class. We can create a new test suite under scala-2.11/src/test and put this test there, so that we only run it under scala 2.10. repl module is a good example about it. cc @kiszk do you mind resend this PR with the fix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take this approach for running test suite without scala 2.10.

s1: String = "1", s2: String = "2", s3: String = "3", s4: String = "4",
s5: String = "5", s6: String = "6", s7: String = "7", s8: String = "8",
s9: String = "9", s10: String = "10", s11: String = "11", s12: String = "12",
s13: String = "13", s14: String = "14", s15: String = "15", s16: String = "16",
s17: String = "17", s18: String = "18", s19: String = "19", s20: String = "20",
s21: String = "21", s22: String = "22", s23: String = "23", s24: String = "24",
s25: String = "25", s26: String = "26", s27: String = "27", s28: String = "28",
s29: String = "29", s30: String = "30", s31: String = "31", s32: String = "32",
s33: String = "33", s34: String = "34", s35: String = "35", s36: String = "36",
s37: String = "37", s38: String = "38", s39: String = "39", s40: String = "40",
s41: String = "41", s42: String = "42", s43: String = "43", s44: String = "44",
s45: String = "45", s46: String = "46", s47: String = "47", s48: String = "48",
s49: String = "49", s50: String = "50", s51: String = "51", s52: String = "52",
s53: String = "53", s54: String = "54", s55: String = "55", s56: String = "56",
s57: String = "57", s58: String = "58", s59: String = "59", s60: String = "60",
s61: String = "61", s62: String = "62", s63: String = "63", s64: String = "64",
s65: String = "65", s66: String = "66", s67: String = "67", s68: String = "68",
s69: String = "69", s70: String = "70", s71: String = "71", s72: String = "72",
s73: String = "73", s74: String = "74", s75: String = "75", s76: String = "76",
s77: String = "77", s78: String = "78", s79: String = "79", s80: String = "80",
s81: String = "81", s82: String = "82", s83: String = "83", s84: String = "84",
s85: String = "85", s86: String = "86", s87: String = "87", s88: String = "88",
s89: String = "89", s90: String = "90", s91: String = "91", s92: String = "92",
s93: String = "93", s94: String = "94", s95: String = "95", s96: String = "96",
s97: String = "97", s98: String = "98", s99: String = "99", s100: String = "100")
case class S100_5(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a blank line between these 2 classes.

s1: S100 = S100(), s2: S100 = S100(), s3: S100 = S100(), s4: S100 = S100(), s5: S100 = S100())