Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine the size() calculation of accumulator #5904

Closed
wants to merge 1 commit into from

Conversation

yahoNanJing
Copy link
Contributor

@yahoNanJing yahoNanJing commented Apr 7, 2023

Which issue does this PR close?

Closes #5903.

Rationale for this change

From the flame graph generated by
CARGO_PROFILE_RELEASE_DEBUG=true cargo flamegraph --no-inline --freq 500 --bin tpch -- benchmark datafusion --path ./data-parquet/ --format parquet --partitions 1 -q 17 --iterations 10
flamegraph-main
There are around 4% spent on calling the size() of accumulator, which can be improved.

What changes are included in this PR?

  • Avoid unnecessary calling of ScalarValue's size().
  • Avoid duplicated calculation of the initial size of accumulator_set.
  • Pull out the mode check to the top level to avoid if-else judgement for every group.

After applying this PR, from the flame graph, we can see there's almost no cost on the size() of accumulator. The benchmark result is improved from 11s to 10.5s.
flamegraph-5903

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate physical-expr Physical Expressions labels Apr 7, 2023
@@ -266,7 +266,7 @@ impl Accumulator for SumAccumulator {
}

fn size(&self) -> usize {
std::mem::size_of_val(self) - std::mem::size_of_val(&self.sum) + self.sum.size()
std::mem::size_of_val(self)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this size change after the AvgAccumulator /SumAccumulator struct is initialized?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest rename the size method to size_in_bytes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not. Option is of Enum type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the flamegraph, it seems it's not a bottleneck now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs

https://github.com/yahoNanJing/arrow-datafusion/blob/issue-5903/datafusion/expr/src/accumulator.rs#L88

    /// Allocated size required for this accumulator, in bytes, including `Self`.
    /// Allocated means that for internal containers such as `Vec`, the `capacity` should be used
    /// not the `len`
    fn size(&self) -> usize;

The change in this PR seems to avoid extra allocations in ScalarValue (such as ScalarValue::Utf8 which has an allocated string in it)

.fold(acc, |acc, accumulator| acc + accumulator.size())
})
}

/// The state that is built for each output group.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic is quite complex to collect the memory size of the accumulators, maybe the computation is more than the real useful aggregations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree the calculation for the memory size is complicated.

I think @tustvold has been thinking about how to improve performance in this area, but I am not sure how far he has gotten

In general, managing individual allocations (and then accounting for their sizes) for each group is a significant additional overhead for grouping.

@yahoNanJing
Copy link
Contributor Author

Hi @Dandandan, @alamb, could you help review this PR?

@alamb
Copy link
Contributor

alamb commented Apr 11, 2023

I will try and review this carefully later today

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yahoNanJing

The idea of caching the accumulator sizes seems like a good one.

I am not sure about the changes to Sum and Avg accumulators

Maybe we need to take a step back and figure out how to improve grouping performance more holistically rather than trying to do a special optimization on allocation accounting 🤔

For example #4973

@@ -266,7 +266,7 @@ impl Accumulator for SumAccumulator {
}

fn size(&self) -> usize {
std::mem::size_of_val(self) - std::mem::size_of_val(&self.sum) + self.sum.size()
std::mem::size_of_val(self)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs

https://github.com/yahoNanJing/arrow-datafusion/blob/issue-5903/datafusion/expr/src/accumulator.rs#L88

    /// Allocated size required for this accumulator, in bytes, including `Self`.
    /// Allocated means that for internal containers such as `Vec`, the `capacity` should be used
    /// not the `len`
    fn size(&self) -> usize;

The change in this PR seems to avoid extra allocations in ScalarValue (such as ScalarValue::Utf8 which has an allocated string in it)

})
})?;
}
AggregateMode::FinalPartitioned | AggregateMode::Final => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe github is rendering the diff confusingly, but this seems like a significant amount of new code

.fold(acc, |acc, accumulator| acc + accumulator.size())
})
}

/// The state that is built for each output group.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree the calculation for the memory size is complicated.

I think @tustvold has been thinking about how to improve performance in this area, but I am not sure how far he has gotten

In general, managing individual allocations (and then accounting for their sizes) for each group is a significant additional overhead for grouping.

@yahoNanJing
Copy link
Contributor Author

Thanks @alamb for your comments. Just refactored the code based on latest main branch code.

@alamb
Copy link
Contributor

alamb commented May 12, 2023

I am trying to clean up outstanding PRs and I came across this one. What shall we do with it @yahoNanJing -- should we pursue getting it merged?

@alamb
Copy link
Contributor

alamb commented May 30, 2023

Marking as Draft until we come to a consensus on what to do with this PR (so it is not on the review list)

@alamb alamb marked this pull request as draft May 30, 2023 17:56
@alamb
Copy link
Contributor

alamb commented Apr 8, 2024

Since this has been open for more than a year, closing it down. Feel free to reopen if/when you keep working on it.

@alamb alamb closed this Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refine the size() calculation of accumulator
4 participants