-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API, Core: Add content offset and size to DeleteFile #11446
Conversation
@@ -55,7 +56,7 @@ default ChangelogOperation operation() { | |||
|
|||
@Override | |||
default long sizeBytes() { | |||
return length() + deletes().stream().mapToLong(ContentFile::fileSizeInBytes).sum(); | |||
return length() + ContentFileUtil.contentSizeInBytes(deletes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is in api
. I initially moved ContentFileUtil
(which holds relevant logic) to api
, which required moving more classes. Specifically, I am talking about Partitioning
and MetadataColumns
. I am thinking about alternatives. I could add a brand new ScanTaskUtil
in api
, but it is a bit awkward to have both utils.
Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a fairly simple method that doesn't need to require extensive refactoring. Why not add a DeleteFileUtil
or something with this logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can. We will have ContentFileUtil
and DeleteFileUtil
. A bit awkward but not a big deal.
76a372d
to
b45b414
Compare
} | ||
} | ||
|
||
public static long contentSizeInBytes(Iterable<? extends ContentFile<?>> files) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not using streams for performance reasons.
b45b414
to
e30c3fa
Compare
Thanks, Anton. I think it will be valuable to link this and previous PRs to the proposal issue, #11122. |
@manuzhang, good call. Linked. |
@@ -79,7 +80,7 @@ void updateFlushResult(WriteResult result) { | |||
Arrays.stream(result.deleteFiles()) | |||
.forEach( | |||
deleteFile -> { | |||
deleteFilesSizeHistogram.update(deleteFile.fileSizeInBytes()); | |||
deleteFilesSizeHistogram.update(ContentFileUtil.contentSizeInBytes(deleteFile)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra, I cherry-picked all Flink changes across all versions, unlike the prototype PR you reviewed.
Preconditions.checkArgument(contentOffset != null, "Content offset is required for DV"); | ||
Preconditions.checkArgument(contentSizeInBytes != null, "Content size is required for DV"); | ||
} else { | ||
Preconditions.checkArgument(contentOffset == null, "Content offset is only for DV"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these descriptive enough? Trying to stay on one line but have meaningful error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's quite difficult to stay on a single line while also producing a good/descriptive error msg. I would favor a better error msg, since that is what people later will run into and will read.
What about Content offset must only be set for DVs
here? In other places we typically use Invalid xyz: ...
, so this error msg could also be Invalid content offset: must be null for DVs
and in the msgs further above it could be Invalid content offset: must be non-null for DVs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one comment around error messaging
This PR adds content offset and size to
DeleteFile
for DVs.Scan, commit, and snapshot metrics will be updated in a subsequent PR. This work is part of #11122.