Add memory pool configuration to datafusion-cli #7419

Closed
alamb opened this issue Aug 25, 2023 · 1 comment · Fixed by #7424
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Contributor

alamb commented Aug 25, 2023

Is your feature request related to a problem or challenge?

While trying to test #7400 with datafusion-cli, I found I couldn't, because datafusion-cli doesn't have a memory manager enabled.

Describe the solution you'd like

I would like to add two new command line options to datafusion-cli:

  1. -m / --memory-limit: if set, sets the memory pool size limit. If unset, no memory pool is used.
  2. --mem-pool-type=<greedy|fair>: specifies the pool type, GreedyMemoryPool or FairSpillPool respectively. Defaults to greedy.
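
As a rough illustration of the behaviour described in the two options above, the selection logic might look like the following sketch. The names `PoolType`, `PoolChoice`, and `choose_pool` are hypothetical and are not taken from datafusion-cli. `PoolChoice` is only a stand-in so the example needs nothing beyond the standard library. A real implementation would presumably construct a `GreedyMemoryPool` or `FairSpillPool` from `datafusion::execution::memory_pool` instead.

```rust
use std::str::FromStr;

/// Pool flavours named in the proposal (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PoolType {
    Greedy,
    Fair,
}

impl FromStr for PoolType {
    type Err = String;

    /// Accepts exactly the two values listed in the proposal.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "greedy" => Ok(PoolType::Greedy),
            "fair" => Ok(PoolType::Fair),
            other => Err(format!(
                "unknown pool type '{other}', expected 'greedy' or 'fair'"
            )),
        }
    }
}

/// Hypothetical stand-in for the pool the CLI would install.
/// The payload is the limit in bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PoolChoice {
    /// No limit given: no memory pool is configured.
    Unbounded,
    /// Would correspond to GreedyMemoryPool in DataFusion.
    Greedy(usize),
    /// Would correspond to FairSpillPool in DataFusion.
    Fair(usize),
}

/// Combines the two options: without a limit the pool type is ignored.
pub fn choose_pool(mem_limit: Option<usize>, pool_type: PoolType) -> PoolChoice {
    match (mem_limit, pool_type) {
        (None, _) => PoolChoice::Unbounded,
        (Some(bytes), PoolType::Greedy) => PoolChoice::Greedy(bytes),
        (Some(bytes), PoolType::Fair) => PoolChoice::Fair(bytes),
    }
}
```

Treating the pool type as irrelevant when no limit is set follows the "if unset, no memory pool is used" rule. Whether the CLI should instead reject a pool type given without a limit is left open here.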

Examples of usage

# memory is not limited
datafusion-cli -c 'select 1, 2 from foo';

# run query with greedy memory pool set to use 10G
datafusion-cli --memory-limit 10G -c 'select 1, 2 from foo';

# same as above, using the short form of the option
datafusion-cli -m 10G -c 'select 1, 2 from foo';

# run query with fair memory pool set to use 10G
datafusion-cli --mem-pool-type=fair -m 10G -c 'select 1, 2 from foo';

See https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.FairSpillPool.html for more details
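
The usage examples above pass the limit as `10G`, but the issue does not say how such a value should be parsed. One possible reading, sketched below, accepts a plain byte count or a `K`, `M`, or `G` suffix. The function name `parse_memory_limit`, the case-insensitive suffixes, and the use of binary units (1G = 2^30 bytes) are all assumptions made for illustration.

```rust
/// Parses a size such as "4096", "512M" or "10G" into a number of bytes.
/// Assumption: suffixes are binary multiples (K = 2^10, M = 2^20, G = 2^30).
pub fn parse_memory_limit(input: &str) -> Result<usize, String> {
    let input = input.trim();

    // Split off an optional one-letter suffix. The suffix is ASCII, so
    // slicing one byte off the end stays on a character boundary.
    let (digits, multiplier) = match input.chars().last() {
        Some('K') | Some('k') => (&input[..input.len() - 1], 1usize << 10),
        Some('M') | Some('m') => (&input[..input.len() - 1], 1usize << 20),
        Some('G') | Some('g') => (&input[..input.len() - 1], 1usize << 30),
        Some(c) if c.is_ascii_digit() => (input, 1),
        _ => return Err(format!("invalid memory limit '{input}'")),
    };

    let value: usize = digits
        .parse()
        .map_err(|_| format!("invalid memory limit '{input}'"))?;

    // Reject values that do not fit in usize instead of wrapping around.
    value
        .checked_mul(multiplier)
        .ok_or_else(|| format!("memory limit '{input}' overflows usize"))
}
```

Under these assumptions, `parse_memory_limit("10G")` yields the byte count that would be handed to the pool constructor, and malformed input such as `10X` is reported as an error rather than silently ignored.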

Describe alternatives you've considered

I also thought about setting the pools via SET commands (like setting the target batch size). However, I don't think we should allow changing memory limits via SQL, because that is likely not something a multi-tenant system would want to permit. Limits should be set up before the session starts or by the runtime system, not by the user in SQL.

Additional context

Since this is well specified and is mostly an exercise in figuring out how datafusion-cli works, I think this would make a good first project.

alamb added the enhancement and good first issue labels Aug 25, 2023
@Weijun-H
Member

I am glad to pick this ticket up this weekend.
