*: improve performance for BuildHistAndTopN #48902

Status: Open. hawkingrei wants to merge 6 commits into master from the improve_against branch.

Conversation

@hawkingrei (Member) commented Nov 26, 2023

What problem does this PR solve?

Issue Number: close #49180

Problem Summary:

What changed and how does it work?

Currently, ANALYZE spends most of its time in BuildHistAndTopN. When the NDV (number of distinct values) is low, BuildHistAndTopN performs many unnecessary sorts and comparisons. We can merge identical values first and then sort and compare only the merged data, which should perform better.
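
A minimal sketch of the "merge first, then sort" idea, assuming plain integer samples. The names `valueCount`, `countThenSort`, and `topN` are hypothetical and chosen for illustration; this is not the PR's code and does not use TiDB's types.

```go
package main

import "sort"

// valueCount pairs a distinct sample value with its number of occurrences.
// Hypothetical type for illustration only; not a TiDB type.
type valueCount struct {
	Value int64
	Count int64
}

// countThenSort merges equal samples into (value, count) pairs first and
// sorts only the distinct values afterwards. With n samples and d distinct
// values, the sort works on d entries instead of n.
func countThenSort(samples []int64) []valueCount {
	counts := make(map[int64]int64)
	for _, s := range samples {
		counts[s]++
	}
	merged := make([]valueCount, 0, len(counts))
	for v, c := range counts {
		merged = append(merged, valueCount{Value: v, Count: c})
	}
	sort.Slice(merged, func(i, j int) bool {
		return merged[i].Value < merged[j].Value
	})
	return merged
}

// topN returns the n most frequent entries of merged, which must already be
// sorted by value. The stable sort keeps smaller values first among ties.
func topN(merged []valueCount, n int) []valueCount {
	byCount := make([]valueCount, len(merged))
	copy(byCount, merged)
	sort.SliceStable(byCount, func(i, j int) bool {
		return byCount[i].Count > byCount[j].Count
	})
	if n < 0 {
		n = 0
	}
	if n > len(byCount) {
		n = len(byCount)
	}
	return byCount[:n]
}
```

For the samples 3, 1, 3, 2, 1, 3, `countThenSort` yields the pairs (1, 2), (2, 1), (3, 3), and `topN` with n = 2 yields (3, 3) and (1, 2). The saving comes from sorting d entries rather than n; the cost is the extra map, so memory use should be measured alongside time.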

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
➜  tidb git:(improve_against)  go test -benchmem -run=^$ -bench ^BenchmarkBuildHistAndTopNWithLowNDV github.com/pingcap/tidb/pkg/statistics --benchtime=30s
goos: linux
goarch: amd64
pkg: github.com/pingcap/tidb/pkg/statistics
cpu: AMD Ryzen 7 7735HS with Radeon Graphics
BenchmarkBuildHistAndTopNWithLowNDV/true-16         	     483	  74791981 ns/op	59844620 B/op	 1002092 allocs/op
BenchmarkBuildHistAndTopNWithLowNDV/false-16        	     477	  75674610 ns/op	59844684 B/op	 1002092 allocs/op
PASS
ok  	github.com/pingcap/tidb/pkg/statistics	90.061s
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None


ti-chi-bot bot commented Nov 26, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hawkingrei. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Nov 26, 2023

codecov bot commented Nov 26, 2023

Codecov Report

Attention: Patch coverage is 86.82171% with 34 lines in your changes missing coverage. Please review.

Project coverage is 57.0459%. Comparing base (542907f) to head (ae68073).
Report is 2296 commits behind head on master.

Additional details and impacted files
@@                Coverage Diff                @@
##             master     #48902         +/-   ##
=================================================
- Coverage   72.0440%   57.0459%   -14.9981%     
=================================================
  Files          1452       1671        +219     
  Lines        347472     652500     +305028     
=================================================
+ Hits         250333     372225     +121892     
- Misses        76776     254801     +178025     
- Partials      20363      25474       +5111     
Flag Coverage Δ
integration 41.3288% <39.1472%> (?)
unit 70.1822% <86.8217%> (-1.8619%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 54.0269% <ø> (-2.2860%) ⬇️
parser ∅ <ø> (∅)
br 59.7614% <ø> (+7.6339%) ⬆️

@ngaut changed the title from "*: improve proformance for BuildHistAndTopN" to "*: improve performance for BuildHistAndTopN" Nov 26, 2023
Review thread on pkg/statistics/builder.go (outdated, resolved)
@hawkingrei force-pushed the improve_against branch 3 times, most recently from 87037c9 to ed7511b November 27, 2023 12:32
@hawkingrei force-pushed the improve_against branch 2 times, most recently from 0e6c0c1 to afefa94 November 29, 2023 09:37
@ti-chi-bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/L label Nov 29, 2023
@ti-chi-bot added the size/L label and removed the size/XL label Nov 29, 2023
@hawkingrei force-pushed the improve_against branch 4 times, most recently from 1e1583c to d123b8e December 1, 2023 13:40
@ti-chi-bot added the size/XL label and removed the size/L label Dec 1, 2023
@hawkingrei (Member, Author) commented:

/retest

@hawkingrei force-pushed the improve_against branch 2 times, most recently from c490e43 to dca3831 December 4, 2023 08:14
@hawkingrei (Member, Author) commented:

/retest

@time-and-fate (Member) left a comment:

  1. Please at least keep the comments valid and up-to-date after you copied them from elsewhere.
  2. If you need to copy ~80% of an existing function, I suggest we try to find some methods to avoid the large duplication.
  3. From your benchmark result, I think we are trading space for time instead of a pure improvement. So what's the reason behind this change, and have we checked the risk?

@hawkingrei force-pushed the improve_against branch 8 times, most recently from c485888 to afc0d4c December 12, 2023 03:47
@hawkingrei (Member, Author) commented:

> 1. Please at least keep the comments valid and up-to-date after you copied them from elsewhere.
> 2. If you need to copy ~80% of an existing function, I suggest we try to find some methods to avoid the large duplication.
> 3. From your benchmark result, I think we are trading space for time instead of a pure improvement. So what's the reason behind this change, and have we checked the risk?

I have refactored the code to merge the duplicated logic and reduce memory allocations.

Now the new implementation is better than the old one.

@hawkingrei force-pushed the improve_against branch 3 times, most recently from 0eb92df to 90a9a31 December 18, 2023 05:01
@time-and-fate (Member) commented:

I think there is little performance difference in the new benchmark result.

@@ -242,6 +354,7 @@ func BuildHistAndTopN(
 	isColumn bool,
 	memTracker *memory.Tracker,
 	needExtStats bool,
+	highNDVMode bool,

Review comment (Contributor):

Suggested change
-	highNDVMode bool,
+	lowNDVMode bool,

Why is it high NDV?

@ti-chi-bot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jan 5, 2024

ti-chi-bot bot commented Jan 5, 2024

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Five commits, each with the message:
Signed-off-by: Weizhen Wang <[email protected]>
@hawkingrei removed the needs-rebase label Jan 19, 2024
@ti-chi-bot added the needs-rebase label Apr 17, 2024

ti-chi-bot bot commented Apr 17, 2024

PR needs rebase.



ti-chi-bot bot commented Nov 18, 2024

@hawkingrei: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| idc-jenkins-ci-tidb/build | ae68073 | link | true | /test build |
| idc-jenkins-ci-tidb/unit-test | ae68073 | link | true | /test unit-test |
| idc-jenkins-ci-tidb/check_dev_2 | ae68073 | link | true | /test check-dev2 |
| pull-lightning-integration-test | ae68073 | link | true | /test pull-lightning-integration-test |
| pull-integration-e2e-test | ae68073 | link | true | /test pull-integration-e2e-test |

Full PR test history. Your PR dashboard.


Labels
needs-rebase, release-note-none (denotes a PR that doesn't merit a release note), size/XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

improve performance for the BuildHistAndTopN
4 participants