refine decimal multiply, avoid cast to wider type #6331
Conversation
```scala
override def resultDecimalType(p1: Int, s1: Int, p2: Int, s2: Int): DecimalType = {
  val resultScale = s1 + s2
  val resultPrecision = p1 + p2 + 1
  if (allowPrecisionLoss) {
    DecimalType.adjustPrecisionScale(resultPrecision, resultScale)
  } else {
    DecimalType.bounded(resultPrecision, resultScale)
  }
}

private lazy val numeric = TypeUtils.getNumeric(dataType, failOnError)

protected override def nullSafeEval(input1: Any, input2: Any): Any = dataType match {
  case DecimalType.Fixed(precision, scale) =>
    checkDecimalOverflow(numeric.times(input1, input2).asInstanceOf[Decimal], precision, scale)
  case _: IntegerType if failOnError =>
    MathUtils.multiplyExact(
      input1.asInstanceOf[Int],
      input2.asInstanceOf[Int],
      getContextOrNull())
  case _: LongType if failOnError =>
    MathUtils.multiplyExact(
      input1.asInstanceOf[Long],
      input2.asInstanceOf[Long],
      getContextOrNull())
  case _ => numeric.times(input1, input2)
}
```
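The result-type rule in the snippet above can be sketched as a standalone function. This is a hypothetical illustration, not code from this PR; it assumes Decimal128 semantics (precision capped at 38), and it simplifies the precision-loss path (`adjustPrecisionScale`) to a plain cap:

```rust
// Hypothetical sketch of the decimal-multiply result-type rule quoted above.
// Assumes Decimal128 semantics: precision saturates at 38.
const MAX_PRECISION: u8 = 38;

fn multiply_result_type(p1: u8, s1: u8, p2: u8, s2: u8) -> (u8, u8) {
    // The exact product needs scale s1 + s2 and precision p1 + p2 + 1.
    let scale = s1 + s2;
    let precision = p1 + p2 + 1;
    // Bound precision at the maximum the physical type supports
    // (simplified: the real precision-loss mode may also reduce the scale).
    (precision.min(MAX_PRECISION), scale)
}

fn main() {
    // decimal(10, 2) * decimal(10, 2) -> decimal(21, 4)
    assert_eq!(multiply_result_type(10, 2, 10, 2), (21, 4));
    // Large inputs saturate at the maximum precision.
    assert_eq!(multiply_result_type(38, 10, 38, 10), (38, 20));
    println!("ok");
}
```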
```scala
override def checkInputDataTypes(): TypeCheckResult = (left.dataType, right.dataType) match {
  case (l: DecimalType, r: DecimalType) if inputType.acceptsType(l) && inputType.acceptsType(r) =>
    // We allow decimal type inputs with different precision and scale, and use special formulas
    // to calculate the result precision and scale.
    TypeCheckResult.TypeCheckSuccess
  case _ => super.checkInputDataTypes()
}
```
And the existing unit tests show the result is correct.
Good catch. This fixes the TPC-H q1 performance. But for the root cause, we still need to improve the i256 operations; otherwise, queries containing decimal array-to-array multiplication will still suffer from performance issues.

Yes, you are right. We still need to address the performance of the i256 operations; otherwise, whenever execution falls into that logic, performance will still be poor.
FYI, I'm going to improve the i256 operations upstream: apache/arrow-rs#4303.
Which issue does this PR close?
Closes #6278, bringing the performance back.
Rationale for this change
For decimal multiplication, avoid casting to a wider type: decimals can be multiplied directly.
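The idea can be sketched as follows. This is a hypothetical illustration, not the actual DataFusion kernel: it multiplies two Decimal128 values on their `i128` unscaled representations directly, instead of first widening both operands to i256:

```rust
// Sketch: multiply Decimal128 values on the i128 unscaled representations
// directly, rather than widening to i256 first.
fn decimal_mul_direct(a: i128, b: i128) -> Option<i128> {
    // The unscaled product already has scale s1 + s2, so no rescale is
    // needed here; checked_mul reports i128 overflow, where a wider
    // fallback (or an error) would still be required.
    a.checked_mul(b)
}

fn main() {
    // 1.23 (scale 2) * 4.5 (scale 1) = 5.535 (scale 3): 123 * 45 = 5535
    assert_eq!(decimal_mul_direct(123, 45), Some(5535));
    // Overflow is detected rather than silently wrapping.
    assert_eq!(decimal_mul_direct(i128::MAX, 2), None);
    println!("ok");
}
```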
What changes are included in this PR?
Are these changes tested?
Yes. The existing unit tests cover correctness, and the benchmark results are below.

Main branch:
```
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: Some(1), debug: false, iterations: 3, partitions: 1, batch_size: 8192, path: "./parquet_data", file_format: "parquet", mem_table: false, output_path: None, disable_statistics: true }
Query 1 iteration 0 took 1716.3 ms and returned 4 rows
Query 1 iteration 1 took 1697.0 ms and returned 4 rows
Query 1 iteration 2 took 1694.3 ms and returned 4 rows
Query 1 avg time: 1702.52 ms
```
Branch 23 (Tag 23.0.0):
```
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: Some(1), debug: false, iterations: 3, partitions: 1, batch_size: 8192, path: "./parquet_data", file_format: "parquet", mem_table: false, output_path: None, disable_statistics: true, enable_scheduler: false }
Query 1 iteration 0 took 864.2 ms and returned 4 rows
Query 1 iteration 1 took 842.0 ms and returned 4 rows
Query 1 iteration 2 took 838.7 ms and returned 4 rows
Query 1 avg time: 848.29 ms
```
This PR:
```
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: Some(1), debug: false, iterations: 3, partitions: 1, batch_size: 8192, path: "./parquet_data", file_format: "parquet", mem_table: false, output_path: None, disable_statistics: true }
Query 1 iteration 0 took 539.1 ms and returned 4 rows
Query 1 iteration 1 took 514.7 ms and returned 4 rows
Query 1 iteration 2 took 511.1 ms and returned 4 rows
Query 1 avg time: 521.61 ms
```
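Summarizing the Query 1 averages above (a throwaway helper for the arithmetic, not part of the PR):

```rust
// Speedup of this PR's Query 1 average relative to the other branches,
// using the averages reported above.
fn speedup(baseline_ms: f64, new_ms: f64) -> f64 {
    baseline_ms / new_ms
}

fn main() {
    let (main_avg, tag23_avg, pr_avg) = (1702.52, 848.29, 521.61);
    // Roughly 3.3x faster than main and 1.6x faster than 23.0.0.
    println!("vs main:   {:.2}x", speedup(main_avg, pr_avg));
    println!("vs 23.0.0: {:.2}x", speedup(tag23_avg, pr_avg));
}
```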
Are there any user-facing changes?