-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basic extension type handling #90
Conversation
cpp/src/lance/arrow/scanner_test.cc
Outdated
} | ||
|
||
std::string Serialize() const override { | ||
std::string result(" "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for? copy in memory?
Lets use something like https://github.com/eto-ai/lance/blob/main/cpp/src/lance/io/endian.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string buffer has 1 "\n" , right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i copy-pasted this from Arrow test codebase. It's just serializing the type not the data. This isn't trying to cover all data types.
cpp/src/lance/arrow/scanner_test.cc
Outdated
explicit ParametricType(int32_t parameter) | ||
: ExtensionType(::arrow::int32()), parameter_(parameter) {} | ||
|
||
int32_t parameter() const { return parameter_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why parameter is an int32
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's just a test class, doesn't really matter what the parameter is.
cpp/src/lance/format/schema.cc
Outdated
std::shared_ptr<::arrow::Field> Field::ToArrow() const { return ::arrow::field(name(), type()); } | ||
std::shared_ptr<::arrow::Field> Field::ToArrow() { | ||
if (is_extension_field()) { | ||
auto ext_type = ::arrow::GetExtensionType(extension_name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this handle project?
For excample, if an image object has <URI: str, data:binary>
. can it support SELECT image.uri FROM dataset
?
I am ok if there is another issue to address that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me add a test case to check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if image
is of ImageType extension, projecting image.uri
returns the storage schema. So in the returned schema it's got image
as a struct and uri
as the utf8 sub-field.
cpp/src/lance/format/schema.h
Outdated
@@ -138,6 +138,10 @@ class Field final { | |||
|
|||
std::string logical_type() const { return logical_type_; }; | |||
|
|||
std::string extension_name() const { return extension_name_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob can make the method to const std::string& extension_name()
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was trying to stay consistent with logical_type and name. Any particular reason this should be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yea, lets change logical type to const string&
, avoid a copy each time calling the string.
cpp/src/lance/format/schema.h
Outdated
@@ -201,10 +205,13 @@ class Field final { | |||
// TODO: use enum to replace protobuf enum. | |||
pb::Field::Type GetNodeType() const; | |||
|
|||
void init(std::shared_ptr<::arrow::DataType> dtype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No error expect to happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just moving the existing code to an init method so I can call either with the input dtype or with the storage type. What errors/exceptions should I be expecting here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets change the method (not getter / setter) name to CapCase
to be consistent w/ arrow and google
7ae2eb9
to
76f43bd
Compare
I'm sure I'm missing some edge cases but seems like this gives us what we want?