-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: avro bytes test for Literal #80
Conversation
crates/iceberg/src/spec/values.rs
Outdated
@@ -995,7 +995,7 @@ mod tests { | |||
assert_eq!(literal, expected_literal); | |||
|
|||
let mut writer = apache_avro::Writer::new(&schema, Vec::new()); | |||
writer.append_ser(bytes).unwrap(); | |||
writer.append_ser(Into::<ByteBuf>::into(literal)).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Into
seems strange. How about using Bytes::from(literal)
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a ByteBuf::from(value: impl Into<Vec<u8>>)
which overrides the From<Literal>
for ByteBuf
and I need to somehow specify that I want the conversion from Literal
to ByteBuf
. Maybe you know of a different way to specify which form/into
implementation to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a
ByteBuf::from(value: impl Into<Vec<u8>>)
which overrides theFrom<Literal>
forByteBuf
Hi, I'm not quite understanding. Does Into<Vec<u8>>
exist for Literal
? Or is it because Literal
implements From<Literal> for BytesBuf
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There currently is no Into<Vec<u8>>
for Literal
but there is a From<Literal> for ByteBuf
, which I would like to use. When I use ByteBuf::from
, the compiler wants to use the existing ByteBuf::from(value: impl Into<Vec<u8>>)
implementation, which gives me a type error because Literal
doesn't implement Into<Vec<u8>>
. So I need someway to tell the compiler the right from/into
implementation to use. It works the way I did it but it might be a bit confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I got it. ByteBuf
provides it's own ByteBuf::from
that may conflicts with From
trait.
Maybe we should provide a Literal::to_vec()
instead? Users can utilize ByteBuf::from(Literal::to_vec())
to enhance semantic clarity.
Incidentally, since Literal
already implements Serialize
, it's easy to misuse this API. Can we find a way to prevent this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I got it.
ByteBuf
provides it's ownByteBuf::from
that may conflicts withFrom
trait.
Exactly.
It actually works with writer.append_ser(ByteBuf::from(literal)).unwrap();
when I provide a impl From<Literal> for Vec<u8>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR looks good to me overall, but I'm slightly concerned about the use of Into::::into(literal). Will our users have to write code in this manner? Or does it occur internally?
One choice is to impl impl From<Literal> for Vec<u8>
instead of impl From<Literal> for ByteBuf
and then we can convert to ByteBuf by ByteBuf::from(literal)
Another choice may be:
let bytes: BytesBuf = literal.into();
writer.append_ser(bytes)
Actually in the case where the type is BytesBuf explicitlly, we don't need Into::<ByteBuf>::into(literal)
.🤔
when I provide a impl From for Vec.
This way looks good to me.
I'm a little confused. I find that this avro bytes is different with binary encoding in avro spec. What's difference between these two format? 🤔 |
I will try to explain it the best way I can. Please correct me if I'm wrong. Iceberg defines it's own binary serialization format for statistics (upper & lower bounds). This statistical information is represented as bytes which are then stored in avro files using the avro binary encoding. So this is somewhat of a two-layered setup. In order to interpret a value you first have to deserialize the avro value into bytes and then convert the bytes to a value using the iceberg binary format. When I implemented |
Thanks for your explanation! Totally understanded it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
The PR looks good to me overall, but I'm slightly concerned about the use of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
I think it's used for serialization of literal, and not exposed to user. |
I added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks!
Thanks @ZENOTME for finding the bug! |
Great catch @JanKaul and everyone for reviewing! Let's get this in! |
This PR fixes a logical bug in the tests for converting
Literal
s to and fromByteBuf
. The bug is that I wrote the wrong bytes to the avro file.