Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for list of Dictionaries #664

Merged
merged 3 commits into from
Mar 21, 2023

Conversation

gsilvestrin
Copy link
Contributor

@gsilvestrin gsilvestrin commented Mar 8, 2023

  • Fixed a bug in write_manifest that would skip writing the dict values for the first field in the schema

@gsilvestrin gsilvestrin linked an issue Mar 21, 2023 that may be closed by this pull request
- Fixed a bug in write_manifest that would skip writing the dict values for the first field in the schema
@gsilvestrin gsilvestrin force-pushed the gsilvestrin/lists_dictionary branch from 35574ce to 4c3c350 Compare March 21, 2023 22:00
@gsilvestrin gsilvestrin changed the title initial support for list of dictionaries Implement support for list of Dictionaries Mar 21, 2023
_ => {
// Add list / large list support.
// Add list / large list support. - should we panic?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there more types to implement?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more types. We dont need to support them yet.

Can panic or return error here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

panic causes more issues - this method is called for all fields, not only dictionaries. So we would need to either make this match exhaustive or keep the comment (I left the comment)

@@ -41,7 +41,7 @@ pub async fn write_manifest(
) -> Result<usize> {
// Write dictionary values.
let max_field_id = manifest.schema.max_field_id().unwrap_or(-1);
for field_id in 1..max_field_id + 1 {
for field_id in 0..max_field_id + 1 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could potentially be (0..max_field_id), but mut_field_by_id will return None when the id does not exist

@gsilvestrin gsilvestrin marked this pull request as ready for review March 21, 2023 22:04
@gsilvestrin gsilvestrin requested a review from eddyxu March 21, 2023 22:04
Copy link
Contributor

@eddyxu eddyxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor issues.

@@ -375,8 +377,16 @@ impl Field {
lance_field.set_dictionary(struct_arr.column(i));
}
}
DataType::List(_) => {
let list_arr = arr.as_any().downcast_ref::<ListArray>().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a as_list_array() in arrow-rs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I refactored it

_ => {
// Add list / large list support.
// Add list / large list support. - should we panic?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more types. We dont need to support them yet.

Can panic or return error here.

@gsilvestrin gsilvestrin requested a review from eddyxu March 21, 2023 22:51
@gsilvestrin gsilvestrin merged commit b783810 into main Mar 21, 2023
@gsilvestrin gsilvestrin deleted the gsilvestrin/lists_dictionary branch March 21, 2023 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error reading data with Lance
2 participants