fix: fetch remainder of metadata if it is large #873

Merged
wjones127 merged 4 commits into main from wjones127/856-large-manifest on May 17, 2023

@wjones127 (Contributor) commented May 16, 2023

Fixes #856

@wjones127 force-pushed the wjones127/856-large-manifest branch from 7afdc79 to dd6bb46 on May 16, 2023 17:38
@wjones127 changed the title from "test: add failing test for reading manifest" to "fix: fetch remainder of metadata if it is large" on May 16, 2023
@wjones127 force-pushed the wjones127/856-large-manifest branch from 54442a2 to 2a94717 on May 16, 2023 20:07
@wjones127 marked this pull request as ready for review on May 16, 2023 21:26
};

// Need to trim the magic number at end, and the non-manifest data at the beginning
let buf = buf.slice(4..buf.len() - 16);
Contributor

What does this 4 stand for?

Contributor Author

Hmm, my comment ", and the non-manifest data at the beginning" isn't accurate or specific enough. I'm not sure yet; all I knew was that those 4 bytes were skipped previously. I guess it's the u32 message length written here:

https://github.com/eto-ai/lance/blob/d3f6f6c31c909da22c0b7d486bbc35ec101f6079/rust/src/io/object_writer.rs#L68

Perhaps we should be reading that length and validating it?

Contributor Author

Added validation for that and a better comment 👍
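
For readers following this thread, here is a minimal sketch of what reading and validating that length prefix could look like. It is not the code that was merged. It assumes the buffer is laid out as a little-endian u32 length, then the message, then a 16-byte footer (the 4 and 16 come from the snippet under review; the byte order is an assumption). The name `extract_message` and both constants are hypothetical.

```rust
/// Trailing bytes trimmed in the reviewed snippet (`buf.len() - 16`).
const FOOTER_LEN: usize = 16;
/// Size of the assumed u32 length prefix (the `4` in the reviewed snippet).
const LEN_PREFIX: usize = 4;

/// Returns the message bytes from `buf`, which is ASSUMED to be laid out as
/// [u32 length, little-endian][message][16-byte footer].
/// Fails if the recorded length disagrees with the bytes actually present.
fn extract_message(buf: &[u8]) -> Result<&[u8], String> {
    if buf.len() < LEN_PREFIX + FOOTER_LEN {
        return Err(format!("buffer too small: {} bytes", buf.len()));
    }

    // Decode the length prefix instead of skipping it blindly.
    let mut prefix = [0u8; LEN_PREFIX];
    prefix.copy_from_slice(&buf[..LEN_PREFIX]);
    let recorded = u32::from_le_bytes(prefix) as usize;

    // Everything between the prefix and the footer should be the message.
    let body = &buf[LEN_PREFIX..buf.len() - FOOTER_LEN];
    if recorded != body.len() {
        return Err(format!(
            "length prefix says {} bytes, but {} bytes are present",
            recorded,
            body.len()
        ));
    }
    Ok(body)
}

fn main() {
    let message = b"manifest";
    let mut buf = Vec::new();
    buf.extend_from_slice(&(message.len() as u32).to_le_bytes());
    buf.extend_from_slice(message);
    buf.extend_from_slice(&[0u8; FOOTER_LEN]);

    assert_eq!(extract_message(&buf), Ok(&message[..]));

    // A corrupted prefix is reported instead of silently mis-slicing.
    buf[0] += 1;
    assert!(extract_message(&buf).is_err());
}
```

Checking the prefix turns a silent mis-slice into an explicit error, which is the point of the validation suggested above.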

@@ -66,7 +66,7 @@ impl From<&pb::DataFile> for DataFile {
 ///
 /// A fragment is a set of files which represent the different columns of the same rows.
 /// If column exists in the schema, but the related file does not exist, treat this column as `nulls`.
-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, PartialEq)]
Contributor

Why does it require PartialEq? Do you need to sort them or store them in some containers?

Contributor Author

This was for the sake of testing (assert_eq! requires PartialEq).
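
To illustrate the point, here is a small sketch with a hypothetical stand-in for the fragment type (the fields below are illustrative and not taken from the real struct). `assert_eq!` compares its arguments with `==`, which needs `PartialEq`, and formats both sides with `{:?}` on failure, which needs `Debug`.

```rust
// Hypothetical stand-in for the real fragment type; the fields are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Fragment {
    id: u64,
    files: Vec<String>,
}

fn main() {
    let a = Fragment {
        id: 0,
        files: vec!["data.lance".to_string()],
    };
    let b = a.clone();

    // Compiles only because Fragment implements PartialEq (for `==`)
    // and Debug (for the failure message).
    assert_eq!(a, b);

    // Same files, different id: the derived PartialEq compares field by field.
    let c = Fragment { id: 1, ..a.clone() };
    assert_ne!(a, c);
}
```

Without `PartialEq` in the derive list, the `assert_eq!` line fails to compile, so the derive is needed even if nothing outside the tests compares fragments.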

@eddyxu (Contributor) left a comment

LGTM

@wjones127 merged commit 6af670a into main on May 17, 2023
@wjones127 deleted the wjones127/856-large-manifest branch on May 17, 2023 14:58
@haoxins (Contributor) commented May 18, 2023

Can we publish a new version to crates.io?

@wjones127 (Contributor Author)

@haoxins we will release a new version today.

Linked issue closed by this pull request:

Error: panicked at 'assertion failed: file_size - manifest_pos <= buf.len()'
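
The assertion in that issue fails whenever the manifest is larger than the bytes already fetched from the end of the file. A rough sketch of the approach named in the PR title (fetch the remainder when the metadata is large) follows. It is an illustration under stated assumptions, not the merged implementation. `read_range` is a hypothetical stand-in for a ranged read against object storage, `prefetch` is an assumed tail-read size, and `manifest_pos` is taken as already known.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a ranged read against object storage.
fn read_range(object: &[u8], range: Range<usize>) -> Vec<u8> {
    object[range].to_vec()
}

/// Returns the bytes from `manifest_pos` to the end of the object.
/// Reads only the last `prefetch` bytes first; if the manifest starts before
/// that window, issues a second read for the missing part instead of asserting.
fn read_from_manifest_pos(object: &[u8], manifest_pos: usize, prefetch: usize) -> Vec<u8> {
    let file_size = object.len();
    assert!(manifest_pos <= file_size, "manifest position past end of file");

    let tail_start = file_size.saturating_sub(prefetch);
    let tail = read_range(object, tail_start..file_size);

    if manifest_pos >= tail_start {
        // Common case: the manifest fits inside the bytes already fetched.
        tail[manifest_pos - tail_start..].to_vec()
    } else {
        // Large manifest: fetch the remainder and stitch it in front of the tail.
        let mut buf = read_range(object, manifest_pos..tail_start);
        buf.extend_from_slice(&tail);
        buf
    }
}

fn main() {
    let object: Vec<u8> = (0..100u8).collect();

    // Manifest fits in the 16-byte tail read.
    assert_eq!(read_from_manifest_pos(&object, 90, 16), object[90..].to_vec());

    // Manifest is larger than the tail read, so a second read is needed.
    assert_eq!(read_from_manifest_pos(&object, 10, 16), object[10..].to_vec());
}
```

The second read covers only the gap between `manifest_pos` and the start of the tail, so the bytes already fetched are reused.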