Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix offsets for Binary/Lists/LargeLists #727

Merged
merged 7 commits into from
Mar 24, 2023

Conversation

gsilvestrin
Copy link
Contributor

@gsilvestrin gsilvestrin commented Mar 23, 2023

@gsilvestrin gsilvestrin changed the title fixing offsets Fix offsets for Binary/Lists/LargeLists Mar 24, 2023
@gsilvestrin gsilvestrin marked this pull request as ready for review March 24, 2023 00:32
@gsilvestrin gsilvestrin requested review from changhiskhan and eddyxu and removed request for changhiskhan March 24, 2023 00:37
rust/src/encodings/binary.rs Outdated Show resolved Hide resolved
rust/src/io/reader.rs Outdated Show resolved Hide resolved

// Make sure the big array was not written to the file
let file_size_bytes = store.size(&path).await.unwrap();
assert!(file_size_bytes < 1_000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we just need to make 2000 values instead of 1M?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just writing an array that is so big that it will be obvious if the assert doesn't match (I think the actual file size is ~ 250 bytes)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, i mean 2000 elements is good enough for test. and it is faster in CI

let large_list_array = LargeListArray::from_iter_primitive::<Int32Type, _, _>(vec![
Some(vec![Some(10), Some(11)]),
Some(vec![Some(12), Some(13)]),
Some((0..1_000_000).map(|n| Some(n)).collect::<Vec<_>>()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test to actually write multiple arrays, wanna just make sure the cummulation of offsets is correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly, but test_read_struct_of_list_arrays creates multiple structs that contains lists and write_struct_array should group them together.

@gsilvestrin gsilvestrin requested a review from eddyxu March 24, 2023 01:14
Copy link
Contributor

@eddyxu eddyxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Pending reduce the # of elements created in the test to speed up tests.


// Make sure the big array was not written to the file
let file_size_bytes = store.size(&path).await.unwrap();
assert!(file_size_bytes < 1_000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, i mean 2000 elements is good enough for test. and it is faster in CI

@gsilvestrin gsilvestrin merged commit 9200ed6 into main Mar 24, 2023
@gsilvestrin gsilvestrin deleted the gsilvestrin/bugfix-variable_length_offsets branch March 24, 2023 01:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants