-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Use file cache to list partitions if available #9655
Conversation
When discovering partitions for pruning, if we specify no partition columns, we call `list_all_files`, which uses the `list_files_cache` if it exists and is filled. If we specify partition columns, before this change, we recursively list files in the object store to discover partitions. That happens on every request, and listing files e.g. in AWS S3 can be slow (especially if it's 100k+). With this change, if the `list_files_cache` exists and is filled, we get all files from there and use that to discover partitions. Closes apache#9654
Note I'm an absolute rust noob :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried about the performance of this approach, I think with enough files it might even be slower than just listing from S3 (which is quite slow, as you point out) due to the O(N^2) growth, which would not make it a strict improvement.
On the other hand it would be great if we had benchmarks exercising ListingTable
with a large number of files -- 100s of thousands, as you mentioned in the issue. My team has found it to be a pain point with DataFusion when trying to keep planning times under 10 ms
@@ -168,24 +178,154 @@ struct Partition { | |||
files: Option<Vec<ObjectMeta>>, | |||
} | |||
|
|||
#[derive(Debug, Default)] | |||
struct ObjectMetaLister { | |||
objects: Arc<Vec<ObjectMeta>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I would I recommend a trie for this use case, as list_with_delimiter
called individually on every single partition, which means this code is going to perform O(N^2) in the worst case. I used sequence_trie
for this in my own object store cache implementation.
IMO it would be ideal if the ListFilesCache
itself returned a trie instead of a Vec
-- then no conversion will need to happen at all.
Thank you @henrifroese for this contribution. It is an excellent issue to identify Update here is I think a more holistic review of how ListingTable works and what we want out of it might be in order -- I started #9964 to discuss |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
When discovering partitions for pruning, if we specify no partition columns, we call
list_all_files
, which uses thelist_files_cache
if it exists and is filled.If we specify partition columns, before this change, we recursively list files in the object store to discover partitions. That happens on every request, and listing files e.g. in AWS S3 can be slow (especially if it's 100k+).
With this change, if the
list_files_cache
exists and is filled, we get all files from there and use that to discover partitions.Closes #9654.