Consider adding an optional connection pool utility #146

Open
jraymakers opened this issue Feb 8, 2025 · 5 comments

@jraymakers
Contributor

The way to run multiple statements at the same time in DuckDB is using multiple connections. So, it could be helpful to expose a utility that manages a set of connections, making this recommended pattern easy to use.

Some care is needed because DuckDB connections carry semantics: they are the scope of temporary objects, context such as the current database and schema, variables, and named prepared statements. The user of a connection pool needs to be aware of these semantics and should likely avoid using these features. With appropriate documentation, however, a connection pool could still be useful.
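As a rough illustration of the kind of utility being proposed, here is a minimal sketch of a pool that hands out one connection at a time and queues callers when none are idle. Everything in it is hypothetical: `ConnectionPool`, `use`, `idleCount`, and `waitingCount` are names invented for this example, and the type parameter `C` merely stands in for whatever connection type the client exposes. It is not an existing API of this project.

```typescript
// Hypothetical sketch of a connection pool; none of these names exist in the library.
// `C` is a placeholder for a real connection type.
class ConnectionPool<C> {
  private readonly idle: C[];
  private readonly waiters: Array<(conn: C) => void> = [];

  constructor(connections: C[]) {
    // Copy so the caller's array is not mutated.
    this.idle = [...connections];
  }

  // Number of connections currently not in use.
  get idleCount(): number {
    return this.idle.length;
  }

  // Number of callers waiting for a connection.
  get waitingCount(): number {
    return this.waiters.length;
  }

  // Run `fn` with exclusive use of one connection, then return it to the pool.
  async use<T>(fn: (conn: C) => Promise<T>): Promise<T> {
    const conn = await this.acquire();
    try {
      return await fn(conn);
    } finally {
      this.release(conn);
    }
  }

  private acquire(): Promise<C> {
    const conn = this.idle.pop();
    if (conn !== undefined) {
      return Promise.resolve(conn);
    }
    // No idle connection: wait until one is released.
    return new Promise<C>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  private release(conn: C): void {
    const next = this.waiters.shift();
    if (next !== undefined) {
      next(conn); // Hand the connection directly to the oldest waiter.
    } else {
      this.idle.push(conn);
    }
  }
}
```

Because a caller cannot know which connection it will receive, anything scoped to a connection (temporary objects, the current database and schema, variables, named prepared statements) would not reliably carry over from one `use` call to the next, which is the caveat described above.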

@jraymakers jraymakers self-assigned this Feb 8, 2025
@jraymakers jraymakers added the enhancement New feature or request label Feb 8, 2025
@jraymakers jraymakers added this to the API Conveniences milestone Feb 9, 2025
@elefeint
Contributor

elefeint commented Feb 10, 2025

The way DuckDB Python/ODBC/JDBC handle this is with a Database Instance cache: a single DuckDB instance is shared by everything that uses the same connection path, giving you different Connection objects with their own context while still maintaining the same database instance with its loaded extensions, configuration, etc. I think it was recently added to the C API.
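To make the idea concrete, here is a minimal sketch of a cache keyed by path, under the assumption that "instance" is any object the caller knows how to open. `InstanceCache`, `get`, and the `open` callback are hypothetical names for illustration only; this is not how the C API or any DuckDB client actually implements its instance cache.

```typescript
// Hypothetical sketch of a path-keyed instance cache; not a real DuckDB API.
// `I` is a placeholder for a database instance type.
class InstanceCache<I> {
  private readonly byPath = new Map<string, I>();
  private readonly open: (path: string) => I;

  // `open` is supplied by the caller and creates a new instance for a path.
  constructor(open: (path: string) => I) {
    this.open = open;
  }

  // Return the cached instance for `path`, opening it on first use.
  get(path: string): I {
    let instance = this.byPath.get(path);
    if (instance === undefined) {
      instance = this.open(path);
      this.byPath.set(path, instance);
    }
    return instance;
  }
}
```

Each caller would then create its own connection from the shared instance, so connection-scoped context stays separate while extensions and configuration are loaded once.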

@jraymakers
Contributor Author

There's a separate issue (#148) to track exposing the instance cache.

This item is about a distinct feature: managing a set of connections to a single instance and distributing queries across those connections to achieve better parallelism.

@bleskes

bleskes commented Feb 10, 2025

@jraymakers curious to hear what use cases you see or have heard about for this? As it stands today, I don't know of any other clients that do this (other than the instance cache, which I agree is different), because of DuckDB's limited concurrency and philosophy of using all machine resources to power a single query. Also, setting up connections is so lightweight that it's usually just not worth it.

@jraymakers
Contributor Author

This came out of the discussion of this issue: #142

This bit of DuckDB documentation also mentions connection pooling.

I agree some thought is needed about whether and when using a connection pool makes sense. Hence the title of this item starts "Consider", not (definitely) "Support".

@bleskes

bleskes commented Feb 10, 2025

Thanks for the doc pointer! It's new to me.

And ack on the issue title - I was just wondering what triggered it. Thanks!
