[Rust] ObjectStore::read_dir method #911

Merged
merged 3 commits into from
May 29, 2023

Contributor

@eddyxu eddyxu commented May 29, 2023

Adds ObjectStore::read_dir to support scanning a directory on local and cloud storage.

@eddyxu eddyxu requested a review from changhiskhan May 29, 2023 05:25
Contributor Author

eddyxu commented May 29, 2023

Will add cloud storage tests to the integration tests.

@@ -176,6 +176,20 @@ impl ObjectStore {
        ObjectWriter::new(self, path).await
    }

    /// Read a directory (starting from the base directory) and return all sub-paths in the directory.
    pub async fn read_dir(&self, dir_path: impl Into<Path>) -> Result<Vec<String>> {
Contributor

Should this return Result<Vec<Path>> instead for downstream processing?

Contributor Author

Ok, will fix it.

.iter()
.map(|cp| cp.filename().map(|s| s.to_string()).unwrap_or_default())
.chain(output.objects.iter().map(|o| o.location.to_string()))
.map(|s| Ok(Path::parse(s)?.filename().unwrap().to_string()))
Contributor

Is this returning just the short name or the full path? At the object store level, it may make more sense to return the full paths, and then the LanceDB layer can take just the short name before the .lance extension as the table name.
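
As a rough illustration of the suggestion above, the following sketch derives a table name from either a full path or a bare filename. The helper `table_name` is hypothetical and not part of this PR; it uses only the standard library and assumes `/`-separated paths and a literal `.lance` suffix.

```rust
/// Hypothetical helper (not part of this PR): derive a table name from a
/// directory entry by keeping only the final path segment and stripping a
/// trailing ".lance" extension.
fn table_name(entry: &str) -> Option<&str> {
    // Keep only the last '/'-separated segment, so both full paths and
    // bare filenames are accepted. A trailing slash is ignored.
    let short = entry.trim_end_matches('/').rsplit('/').next()?;
    // Entries without the ".lance" suffix are not treated as tables.
    let name = short.strip_suffix(".lance")?;
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}
```

With a helper like this, the caller gets the same table name whether the object store layer returns full paths or only filenames.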

Contributor Author

This returns the filename, which matches the behavior of the OS readdir() call. LanceDB uses readdir() semantics, right? For example, to list which tables are available.
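
To make the readdir()-style behavior concrete, here is a minimal standard-library sketch that reduces a flat list of object locations to the names of the immediate children of a directory. It is not the code in this PR (which works on the object store's listing result); the function `child_names` and the sample locations are assumptions for illustration, and `dir` is assumed to be non-empty.

```rust
use std::collections::BTreeSet;

/// Hypothetical illustration (not this PR's implementation): emulate
/// readdir()-style output over a flat list of object locations by returning
/// the distinct names of the immediate children of `dir`, sorted.
/// Assumes `dir` is non-empty and locations are '/'-separated.
fn child_names(dir: &str, locations: &[&str]) -> Vec<String> {
    let prefix = format!("{}/", dir.trim_end_matches('/'));
    let mut names = BTreeSet::new();
    for loc in locations {
        // Skip anything that does not live under `dir`.
        if let Some(rest) = loc.strip_prefix(prefix.as_str()) {
            // The first segment after the prefix is the immediate child,
            // whether it is a file or a nested "directory".
            if let Some(first) = rest.split('/').next() {
                if !first.is_empty() {
                    names.insert(first.to_string());
                }
            }
        }
    }
    names.into_iter().collect()
}
```

Like readdir(), this yields short names only; a caller that needs a full path would join the name back onto `dir`.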

@changhiskhan
Contributor

Also, there is a Windows failure.

Contributor

@changhiskhan changhiskhan left a comment

If we're returning just the filename, does it still make sense to return Path?

Contributor Author

eddyxu commented May 29, 2023

> If we're returning just the filename, does it still make sense to return Path?

Changed it back to return String.

@eddyxu eddyxu merged commit b0c2344 into main May 29, 2023
@eddyxu eddyxu deleted the lei/read_dir branch May 29, 2023 06:39