Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC-2779: List With Metakey #2779

Merged
merged 10 commits into from
Aug 7, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
121 changes: 121 additions & 0 deletions core/src/docs/rfcs/2779_list_with_metakey.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
- Proposal Name: `list_with_metakey`
- Start Date: 2023-08-04
- RFC PR: [apache/incubator-opendal#2779](https://github.com/apache/incubator-opendal/pull/2779)
- Tracking Issue: [apache/incubator-opendal#0000](https://github.com/apache/incubator-opendal/issues/0000)

# Summary

Move `metadata` API to `Lister` to simplify the usage.

# Motivation

The current `Entry` metadata API is:

```rust
use opendal::Entry;
use opendal::Metakey;

let meta = op
.metadata(&entry, Metakey::ContentLength | Metakey::ContentType)
.await?;
```

This API is difficult to understand and rarely used correctly. And in reality, users always fetch the same set of metadata during listing.
Xuanwo marked this conversation as resolved.
Show resolved Hide resolved

Take one of our users code as an example:

```rust
let stream = self
.inner
.scan(&path)
.await
.map_err(|err| format_object_store_error(err, &path))?;

let stream = stream.then(|res| async {
let entry = res.map_err(|err| format_object_store_error(err, ""))?;
let meta = self
.inner
.metadata(&entry, Metakey::ContentLength | Metakey::LastModified)
.await
.map_err(|err| format_object_store_error(err, entry.path()))?;

Ok(format_object_meta(entry.path(), &meta))
});

Ok(stream.boxed())
```

By moving metadata to `lister`, our user code can be simplified to:
Xuanwo marked this conversation as resolved.
Show resolved Hide resolved
oowl marked this conversation as resolved.
Show resolved Hide resolved

```rust
let stream = self
.inner
.scan_with(&path)
.metakey(Metakey::ContentLength | Metakey::LastModified)
.await
.map_err(|err| format_object_store_error(err, &path))?;

let stream = stream.then(|res| async {
let entry = res.map_err(|err| format_object_store_error(err, ""))?;
let meta = entry.into_metadata()

Ok(format_object_meta(entry.path(), &meta))
});

Ok(stream.boxed())
```

By introducing this change:

- Users don't need to capture `Operator` in the closure.
suyanhanx marked this conversation as resolved.
Show resolved Hide resolved
- Users don't need to do async call like `metadata()` again.

# Guide-level explanation

The new API will be:

```rust
let entries: Vec<Entry> = op
.list_with("dir")
.metakey(Metakey::ContentLength | Metakey::ContentType).await?;

let meta: &Metadata = entries[0].metadata();
```

Metadata can be queried directly when listing entries via `metadata()`, and later extracted via `into_parts()`.

# Reference-level explanation

We will add `metakey` into `OpList`. Underlying services can use those information to try their best to fetch the metadata.

There are following possibilities:

- The entry metadata is met: `Lister` return the entry directly
- The entry metadata is not met and not fully filled: `Lister` will try to send `stat` call to fetch the metadata
- The entry metadata is not met and fully filled: `Lister` will return the entry directly.
xyjixyjixyji marked this conversation as resolved.
Show resolved Hide resolved

To make sure we can handle all metadata correctly, we will add a new capability called `stat_max_metakey`. This capability will be used to indicate the maximum number of metadata that can be fetched via `stat` call. `Lister` can use this capability to decide whether to send `stat` call or not.
Xuanwo marked this conversation as resolved.
Show resolved Hide resolved

Services' lister implementation should not changed.

# Drawbacks
suyanhanx marked this conversation as resolved.
Show resolved Hide resolved

None

# Rationale and alternatives

Keeping the complex standalone API has limited benefit given low usage.

# Prior art

None

# Unresolved questions

None

# Future possibilities

## Add glob and regex support for Lister

We can add `glob` and `regex` support for `Lister` to make it more powerful.
Xuanwo marked this conversation as resolved.
Show resolved Hide resolved
3 changes: 3 additions & 0 deletions core/src/docs/rfcs/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -151,3 +151,6 @@ pub mod rfc_2758_merge_append_into_write {}

#[doc = include_str!("2774_lister_api.md")]
pub mod rfc_2774_lister_api {}

#[doc = include_str!("2779_list_with_metakey.md")]
pub mod rfc_2779_list_with_metakey {}