[Feature] A FileIO API to list files iteratively #4791

Closed
2 tasks done
smdsbz opened this issue Dec 27, 2024 · 0 comments · Fixed by #4834
Labels
enhancement New feature or request

Comments

@smdsbz
Contributor

smdsbz commented Dec 27, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Currently, the FileIO interface only supports listing all files and directories under a given path in a single call. As a consequence, callers of FileIO, e.g. ObjectRefresh, have no choice but to load the entire file listing into memory, which may lead to poor performance and OOM errors.

Solution

Introduce a paged list API like the following:

Pair<FileStatus[], String> listFilesPaged(
        Path path, boolean recursive, long pageSize, @Nullable String continuationToken)

This should allow implementations to take advantage of the batched list APIs commonly offered by object stores, e.g. ListObjectsV2 with a continuation token.
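
To illustrate how a caller might consume such an API, here is a minimal, self-contained sketch. Everything in it is hypothetical and only mirrors the shape of the proposed signature: `PagedLister`, `Page`, `InMemoryLister` and `countAll` are illustration names, plain strings stand in for `FileStatus` and `Path`, and the continuation token is assumed to be a simple offset. It is not the actual FileIO interface or any real object-store client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PagedListingSketch {

    /** One page of results plus the token for the next page (null when exhausted). */
    static final class Page {
        final List<String> files;
        final String continuationToken;

        Page(List<String> files, String continuationToken) {
            this.files = files;
            this.continuationToken = continuationToken;
        }
    }

    /** Hypothetical stand-in for the proposed method; not the real FileIO interface. */
    interface PagedLister {
        Page listFilesPaged(String path, boolean recursive, long pageSize, String continuationToken);
    }

    /** In-memory lister mimicking an object store. Assumption: the token is the next offset. */
    static final class InMemoryLister implements PagedLister {
        private final List<String> keys;

        InMemoryLister(List<String> keys) {
            this.keys = keys;
        }

        @Override
        public Page listFilesPaged(String path, boolean recursive, long pageSize, String continuationToken) {
            int i = continuationToken == null ? 0 : Integer.parseInt(continuationToken);
            List<String> page = new ArrayList<>();
            // Stop as soon as the page is full; i then points at the first unvisited key.
            for (; i < keys.size() && page.size() < pageSize; i++) {
                String key = keys.get(i);
                if (!key.startsWith(path)) {
                    continue;
                }
                // A key is a direct child when no further '/' follows the prefix.
                boolean direct = key.indexOf('/', path.length()) < 0;
                if (recursive || direct) {
                    page.add(key);
                }
            }
            return new Page(page, i < keys.size() ? String.valueOf(i) : null);
        }
    }

    /** Drains all pages while holding at most one page in memory at a time. */
    static int countAll(PagedLister lister, String path, long pageSize) {
        int total = 0;
        String token = null;
        do {
            Page page = lister.listFilesPaged(path, true, pageSize, token);
            total += page.files.size();
            token = page.continuationToken;
        } while (token != null);
        return total;
    }

    public static void main(String[] args) {
        PagedLister lister = new InMemoryLister(Arrays.asList(
                "warehouse/a.parquet", "warehouse/t1/b.parquet", "warehouse/t1/c.parquet"));
        System.out.println(countAll(lister, "warehouse/", 2));
    }
}
```

The caller loops until the returned token is null rather than until a page comes back empty, since a paged backend may return a short or even empty page while more results remain.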

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!