reduce memory consumption of stats data #16572

Open
zz-jason opened this issue Apr 18, 2020 · 2 comments
Assignees
Labels
component/statistics epic/memory-management sig/execution SIG execution type/enhancement The issue or PR belongs to an enhancement.

Comments

@zz-jason
Member

zz-jason commented Apr 18, 2020

Development Task

When TiDB bootstraps, it reads the stats data of all tables:

func (h *Handle) InitStats(is infoschema.InfoSchema) (err error) {

When a TiDB cluster has many tables, caching all of their stats data in a single TiDB server can cause high memory consumption as soon as the server bootstraps, which increases the OOM risk of that TiDB server.

Here are things we need to do:

  • Add a benchmark test to measure the exact memory consumption for 1K, 2K, 4K, 8K, and 16K tables.
  • Consider some strategies to reduce memory consumption. For example:
    • Don't gather the CM-Sketch for a unique or primary index. By default, this can save 40 KB of memory.
    • Optimize the in-memory data structures that store CM-Sketches and histograms.
    • Not all tables are queried at the same time, so we may not need to load the stats data for all tables at bootstrap time. We may further consider a stats cache replacement algorithm that drops old, unused stats data and loads newly requested stats data into the stats cache.
    • We may also introduce a two-layer cache: memory <- local disk <- TiKV cluster. This is more complicated than the first idea, so we can discuss it in the future.
  • We also need a visible way to see the memory consumption of the stats data cache. This topic can be extended to other components, such as the global variable cache, the global binding cache, etc.
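
To make the cache replacement idea and the "visible memory consumption" point more concrete, here is a minimal sketch of a byte-budgeted LRU cache for per-table stats. It is only an illustration under stated assumptions, not TiDB's implementation: `tableStats`, `statsLRU`, and all of their fields and methods are hypothetical names, the `bytes` field stands in for whatever size estimate the real histograms and CM-Sketches would report, and the cache is not safe for concurrent use.

```go
package main

import "container/list"

// tableStats is a hypothetical stand-in for the cached statistics of one
// table (histograms, CM-Sketches). Only its estimated size matters here.
type tableStats struct {
	tableID int64
	bytes   int64 // estimated in-memory footprint, supplied by the caller
}

// statsLRU is a byte-budgeted LRU cache keyed by table ID (assumed design).
type statsLRU struct {
	capacity int64                   // maximum total bytes to keep
	used     int64                   // current total bytes held
	order    *list.List              // front = most recently used
	items    map[int64]*list.Element // table ID -> element in order
}

func newStatsLRU(capacity int64) *statsLRU {
	return &statsLRU{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[int64]*list.Element),
	}
}

// Get returns the cached stats and marks them as most recently used.
func (c *statsLRU) Get(tableID int64) (*tableStats, bool) {
	el, ok := c.items[tableID]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*tableStats), true
}

// Put inserts or replaces stats, then evicts least recently used entries
// until the cache fits the budget. The newest entry is always kept, even
// if it alone exceeds the budget.
func (c *statsLRU) Put(s *tableStats) {
	if el, ok := c.items[s.tableID]; ok {
		c.used -= el.Value.(*tableStats).bytes
		el.Value = s
		c.order.MoveToFront(el)
	} else {
		c.items[s.tableID] = c.order.PushFront(s)
	}
	c.used += s.bytes
	for c.used > c.capacity && c.order.Len() > 1 {
		old := c.order.Remove(c.order.Back()).(*tableStats)
		delete(c.items, old.tableID)
		c.used -= old.bytes
	}
}

// MemUsage exposes the tracked byte total, e.g. for a metric or status API.
func (c *statsLRU) MemUsage() int64 { return c.used }
```

A caller would `Put` stats after loading them from storage and call `Get` on each query; a miss means the stats were never loaded or were evicted and must be reloaded. Tracking `used` on every insert and eviction keeps `MemUsage` cheap enough to report continuously.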

See also #17200

@zz-jason added the type/enhancement and component/statistics labels Apr 18, 2020
@SunRunAway self-assigned this Apr 27, 2020
@SunRunAway
Contributor

image

Another scenario.

@winoros
Member

winoros commented Jun 19, 2020

There is another concern: the load-by-need method is not good for AP queries, since it may crash TiDB or make a query run for a very long time. It would be very helpful if we could design a way to avoid load-by-need while still reducing memory usage.


3 participants