Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core)!: implement write returns metadata #5562

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

meteorgan
Copy link
Contributor

@meteorgan meteorgan commented Jan 19, 2025

Which issue does this PR close?

Part of #5557

Rationale for this change

What changes are included in this PR?

  • implement write returns metadata for Operator and BlockingOperator
  • enable write_has_content_length for all services and implement the logic for them
  • implement write returns metadata logic for service S3, fs, monoiofs and webhdfs
  • other services return default Metadata after writing
  • skip return value for language bindings

Are there any user-facing changes?

Yes.

  • write, write_with in Operator returns Result<Metadata> instead of Return<()>
  • Writer.close() returns Return<Metadata> instead of Return<()>
  • write in BlockingOperator returns Return<Metadata> instead of Return<()>
  • FunctionWrite.call() used by BlockingOperator returns Return<Metadata> instead of Return<()>

@meteorgan
Copy link
Contributor Author

emmm, another issue related to ceph rados: the Etag returned by CompleteMultipartUpload doesn't follow the standard.
image

@Xuanwo
Copy link
Member

Xuanwo commented Jan 20, 2025

the Etag returned by CompleteMultipartUpload doesn't follow the standard.

Well, well, well...

Let's do our best to permit this since it's not harmful. Technically, we don't allow users to parse the ETag value; instead, they should store it and compare it with values from the same source. I'm guessing the issue lies in part of our code assuming that the ETag must contain "?

@meteorgan
Copy link
Contributor Author

the Etag returned by CompleteMultipartUpload doesn't follow the standard.

Well, well, well...

Let's do our best to permit this since it's not harmful. Technically, we don't allow users to parse the ETag value; instead, they should store it and compare it with values from the same source. I'm guessing the issue lies in part of our code assuming that the ETag must contain "?

No. The issue is that I compare the Etag from stat with the one from write in the tests to ensure we get the correct Etag if it's returned by write. However, the value returned from stat contain " while the value returned from write does not. they aren't consistent with each other, even within the same service !

@Xuanwo
Copy link
Member

Xuanwo commented Jan 20, 2025

No. The issue is that I compare the Etag from stat with the one from write in the tests to ensure we get the correct Etag if it's returned by write. However, the value returned from stat contain " while the value returned from write does not. they aren't consistent with each other, even within the same service !

That's really surprising. Let's disable the Ceph RADOS test and create an issue to track it. I believe such a bug isn't worth our workaround.

@meteorgan
Copy link
Contributor Author

That's really surprising. Let's disable the Ceph RADOS test and create an issue to track it. I believe such a bug isn't worth our workaround.

I've done some tests. it's very interesting that stat_with().if_match(etag) works fine regardless of whether the ETag includes the prefix and suffix " or not.

image

@meteorgan
Copy link
Contributor Author

I've done some tests. it's very interesting that stat_with().if_match(etag) works fine regardless of whether the ETag includes the prefix and suffix " or not.

The results are the same across Ceph RADOS, Minio and S3

@meteorgan meteorgan force-pushed the write_returns_metadata_for_s3 branch from 81aeb13 to ddc3de5 Compare January 20, 2025 13:47
@meteorgan
Copy link
Contributor Author

That's really surprising. Let's disable the Ceph RADOS test and create an issue to track it. I believe such a bug isn't worth our workaround.

I still created a workaround by adding prefix and suffix " for Etag in the test if they are not included. I think this approach is simpler, what do you think ?

@meteorgan meteorgan marked this pull request as ready for review January 20, 2025 14:03
self.kv.set(&self.path, buf).await
self.kv.set(&self.path, buf).await?;

Ok(Metadata::default())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can return at least the content length of file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, I planned to leave this for later PRs, as I was only prepared to implement the logic for S3. However, since content_length is not an Option, I agree with you - it's better to return content_length as the very least; otherwise users might be confused by the zero length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there is still an issue - what about append ? Can we have a simple method to retrieve the content_length without performing a stat or parsing the result of append ?

let buf = self.buffer.clone().collect();
self.kv.blocking_set(&self.path, buf)?;
Ok(())
Ok(Metadata::default())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same.

@@ -301,7 +301,8 @@ impl<S: Adapter> oio::Write for KvWriter<S> {
}
};
self.kv.set(&self.path, value).await?;
Ok(())

Ok(Metadata::default())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can return back the metadata in value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thank you.

@@ -330,7 +331,7 @@ impl<S: Adapter> oio::BlockingWrite for KvWriter<S> {
};

kv.blocking_set(&self.path, value)?;
Ok(())
Ok(Metadata::default())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same.

async fn close(&mut self) -> Result<()> {
Ok(())
async fn close(&mut self) -> Result<Metadata> {
Ok(Metadata::default())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should return valid data here. The metadata could be returned and replaced by self.inner.append().await, then stored within AppendWriter until the user calls close.

let (Some(upload_id), Some(file_id)) = (self.upload_id.as_ref(), self.file_id.as_ref())
else {
return Ok(());
return Ok(Metadata::default());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need to set a content length for each service, at the very least. content_length is specifically handled in OpenDAL, where users will receive 0 if it doesn't set.

if !maybe_error.code.is_empty() {
return Err(from_s3_error(maybe_error, parts));
}

Ok(())
let ret: CompleteMultipartUploadResult =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can embed the error struct in CompleteMultipartUploadResult so that we don't need to parse twice.

last_modified_time
);
}
if let Some(etag) = meta.etag() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to comment this test until rados fix it.

assert_eq!(stat_meta.content_length(), meta.content_length());
}
if let Some(last_modified_time) = meta.last_modified() {
assert_eq!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's more readable to use the following?

assert_eq!(meta.last_modified(), stat_meta.last_modified())

@Xuanwo
Copy link
Member

Xuanwo commented Jan 20, 2025

Perhaps we should introduce write_has_content_type flags to indicate which metadata is available in the meta returned by the write operation. Checking the meta.content_type() may pass the test but is meaningless to users; it could also potentially conceal incorrect implementations that developers can simplely return Metadata::default() to pass all tests. And it makes it more difficult for us to determine the implementation status of various services.

@meteorgan
Copy link
Contributor Author

meteorgan commented Jan 21, 2025

Perhaps we should introduce write_has_content_type flags to indicate which metadata is available in the meta returned by the write operation.

Would write_returns_content_type be a better name?

Checking the meta.content_type() may pass the test but is meaningless to users; it could also potentially conceal incorrect implementations that developers can simplely return Metadata::default() to pass all tests.

if a service returns Metadata::default(), it indicates that it implements nothing. There's a chance that a developer intended to return some contents but forgot set the fields. Unfortunately, the tests can't catch this.

And it makes it more difficult for us to determine the implementation status of various services.

Make sense ! Making it explicit is better.

@meteorgan
Copy link
Contributor Author

No. The issue is that I compare the Etag from stat with the one from write in the tests to ensure we get the correct Etag if it's returned by write. However, the value returned from stat contain " while the value returned from write does not. they aren't consistent with each other, even within the same service !

Wow, they've already had a PR for this issue: ceph/ceph#60477

@meteorgan
Copy link
Contributor Author

Gently push. Hi, @Xuanwo I've provided some replies, do you have any suggestions ?

@Xuanwo
Copy link
Member

Xuanwo commented Jan 24, 2025

Would write_returns_content_type be a better name?

I prefer to align with exists stat_has_xxx and list_has_xxx.

Wow, they've already had a PR for this issue: ceph/ceph#60477

That's nice, maybe we can disable ceph test for now...

@meteorgan meteorgan force-pushed the write_returns_metadata_for_s3 branch from ddc3de5 to a1e0b0a Compare January 26, 2025 12:53
@meteorgan meteorgan changed the title feat(core)!: implement write returns metadata for s3 feat(core)!: implement write returns metadata Jan 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants