Merge two Schemas #263

Merged: eddyxu merged 9 commits into main from lei/merge_schema on Oct 27, 2022

eddyxu (Contributor) commented Oct 25, 2022

Merge two nested schemas.

eddyxu self-assigned this Oct 25, 2022
eddyxu added the c++, arrow, and enhancement labels Oct 25, 2022
eddyxu changed the title from "Merge Schema" to "Merge two Schemas" Oct 25, 2022
eddyxu requested a review from changhiskhan October 25, 2022 23:11
eddyxu marked this pull request as ready for review October 25, 2022 23:12

changhiskhan (Contributor) left a review comment:

A couple of questions.

auto self_type = type();
if (self_type->id() != arrow_field.type()->id()) {
  return ::arrow::Status::Invalid(
      "Can not merge two fields: ", self_type->ToString(), "!=", arrow_field.type()->ToString());

changhiskhan (Contributor):

nit: change the message to indicate that it's trying to merge two different types.

eddyxu (Contributor, Author):

Done.
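
For illustration, one way to make the mismatch explicit is to name both types and state that they differ. The helper below is a hypothetical sketch in plain C++: the name TypeMismatchMessage and the wording are assumptions, not the message that was actually committed in this PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds an error message that says *why* the merge
// failed (the two fields have different types). Wording is illustrative only.
std::string TypeMismatchMessage(const std::string& field_name,
                                const std::string& self_type,
                                const std::string& other_type) {
  return "Cannot merge field '" + field_name + "': type mismatch (" +
         self_type + " vs " + other_type + ")";
}
```

In the real code the resulting string would presumably be passed to ::arrow::Status::Invalid, as in the hunk above.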

auto new_field = Copy(true);
if (::arrow::is_list_like(self_type->id())) {
  auto list_type = std::dynamic_pointer_cast<::arrow::ListType>(arrow_field.type());
  return children_[0]->Merge(*list_type->value_field());

changhiskhan (Contributor):

I don't quite understand this. If we're merging two list fields, shouldn't the return value be a list type wrapping the merge of the two value types?

eddyxu (Contributor, Author):

This is the case for merging list<struct<....>>, so the struct type is the nested value type of the list.

changhiskhan (Contributor):

I only see is_list_like here. How do we know there is a struct inside the list field?
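
The question above can be made concrete with a toy model. The sketch below is not the Lance or Arrow API: ToyField, Kind, and MergeToy are hypothetical names, and struct merging is left out to keep it short. It shows a merge that recurses into the list's value field regardless of that field's type and then re-wraps the result, so the return value is still a list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema field; not the Lance or Arrow API.
enum class Kind { kInt32, kString, kList };

struct ToyField {
  std::string name;
  Kind kind;
  std::vector<ToyField> children;  // for a list: exactly one value field
};

// Merges `rhs` into `lhs` and writes the result to `out`.
// Returns false if the two fields have incompatible types.
bool MergeToy(const ToyField& lhs, const ToyField& rhs, ToyField* out) {
  if (lhs.kind != rhs.kind) return false;
  ToyField merged{lhs.name, lhs.kind, {}};
  if (lhs.kind == Kind::kList) {
    if (lhs.children.size() != 1 || rhs.children.size() != 1) return false;
    // Recurse into the value field, whatever its kind is...
    ToyField value;
    if (!MergeToy(lhs.children[0], rhs.children[0], &value)) return false;
    // ...then re-wrap it so the merged field is still a list.
    merged.children.push_back(value);
  }
  *out = merged;
  return true;
}
```

With this shape the list branch never needs to know whether the value is a struct: the recursive call dispatches on the value field's own kind, and for a primitive value it reduces to a type check.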

} else {
  ARROW_ASSIGN_OR_RAISE(auto new_child_field, child->Merge(*arrow_child));
  new_field->children_.emplace_back(new_child_field);
}

changhiskhan (Contributor):

What about children of this struct field that are not in the arrow_field?

eddyxu (Contributor, Author):

new_field has the children copied here https://github.com/eto-ai/lance/pull/263/files/90ec4c778ad42bf0f54d1abb0b8aef09e45bdd22#diff-0459f40916915b924d3c6aa11554911c9f00fcded2c0361b85f215f6623cce58R398.

It seems to be a bug that a child common to both fields will be inserted twice, though. Let me fix it.
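
A minimal sketch of the intended behavior, assuming struct members are matched by name: start from a copy of this field's children, check each shared child once, and append only the children that are new. ToyChild and MergeChildren are hypothetical names, and the nested merge is reduced to a type-name comparison for brevity, so this is not the fix that landed in the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flat struct member: a name plus a type name. Not the Lance API.
struct ToyChild {
  std::string name;
  std::string type;
};

// Merges the members of two struct fields into `out`.
// Returns false if a member present on both sides has conflicting types.
bool MergeChildren(const std::vector<ToyChild>& self,
                   const std::vector<ToyChild>& other,
                   std::vector<ToyChild>* out) {
  // Start from a copy of our own members, so members that are missing
  // from `other` are preserved.
  std::vector<ToyChild> merged = self;
  for (const auto& oc : other) {
    bool found = false;
    for (const auto& mc : merged) {
      if (mc.name == oc.name) {
        if (mc.type != oc.type) return false;
        found = true;  // already present via the copy; do not append again
        break;
      }
    }
    if (!found) merged.push_back(oc);  // only members new to `self` are added
  }
  *out = merged;
  return true;
}
```

Merging {a, b} with {b, c} under this sketch yields {a, b, c}: the shared member b appears once, and a, which is absent from the other side, is kept.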

eddyxu merged commit e8922f9 into main Oct 27, 2022
eddyxu deleted the lei/merge_schema branch October 27, 2022 23:34