Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avro bytes test for Literal #80

Merged
merged 3 commits into from
Oct 18, 2023
Merged

Conversation

JanKaul
Copy link
Collaborator

@JanKaul JanKaul commented Oct 17, 2023

This PR fixes a logical bug in the tests for converting Literals to and from ByteBuf. The bug is that I wrote the wrong bytes to the avro file.

@@ -995,7 +995,7 @@ mod tests {
assert_eq!(literal, expected_literal);

let mut writer = apache_avro::Writer::new(&schema, Vec::new());
writer.append_ser(bytes).unwrap();
writer.append_ser(Into::<ByteBuf>::into(literal)).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Into seems strange. How about using Bytes::from(literal) instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a ByteBuf::from(value: impl Into<Vec<u8>>) which overrides the From<Literal> for ByteBuf and I need to somehow specify that I want the conversion from Literal to ByteBuf. Maybe you know of a different way to specify which form/into implementation to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a ByteBuf::from(value: impl Into<Vec<u8>>) which overrides the From<Literal> for ByteBuf

Hi, I'm not quite understanding. Does Into<Vec<u8>> exist for Literal? Or is it because Literal implements From<Literal> for BytesBuf?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There currently is no Into<Vec<u8>> for Literal but there is a From<Literal> for ByteBuf, which I would like to use. When I use ByteBuf::from, the compiler wants to use the existing ByteBuf::from(value: impl Into<Vec<u8>>) implementation, which gives me a type error because Literal doesn't implement Into<Vec<u8>>. So I need someway to tell the compiler the right from/into implementation to use. It works the way I did it but it might be a bit confusing.

Copy link
Member

@Xuanwo Xuanwo Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I got it. ByteBuf provides it's own ByteBuf::from that may conflicts with From trait.

Maybe we should provide a Literal::to_vec() instead? Users can utilize ByteBuf::from(Literal::to_vec()) to enhance semantic clarity.

Incidentally, since Literal already implements Serialize, it's easy to misuse this API. Can we find a way to prevent this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I got it. ByteBuf provides it's own ByteBuf::from that may conflicts with From trait.

Exactly.

It actually works with writer.append_ser(ByteBuf::from(literal)).unwrap(); when I provide a impl From<Literal> for Vec<u8>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me overall, but I'm slightly concerned about the use of Into::::into(literal). Will our users have to write code in this manner? Or does it occur internally?

One choice is to impl impl From<Literal> for Vec<u8> instead of impl From<Literal> for ByteBuf and then we can convert to ByteBuf by ByteBuf::from(literal)

Another choice may be:

let bytes: BytesBuf = literal.into();
writer.append_ser(bytes)

Actually in the case where the type is BytesBuf explicitlly, we don't need Into::<ByteBuf>::into(literal).🤔

when I provide a impl From for Vec.

This way looks good to me.

@ZENOTME
Copy link
Contributor

ZENOTME commented Oct 17, 2023

I'm a little confused. I find that this avro bytes is different with binary encoding in avro spec. What's difference between these two format? 🤔

@JanKaul
Copy link
Collaborator Author

JanKaul commented Oct 17, 2023

I will try to explain it the best way I can. Please correct me if I'm wrong.

Iceberg defines it's own binary serialization format for statistics (upper & lower bounds). This statistical information is represented as bytes which are then stored in avro files using the avro binary encoding. So this is somewhat of a two-layered setup. In order to interpret a value you first have to deserialize the avro value into bytes and then convert the bytes to a value using the iceberg binary format.

When I implemented Literal my goal was to use serde_bytes::ByteBuf for the avro binary values. Just like you used it for the ManifestList. After deserialization the ByteBuf has to be converted to a Literal and that's what the Literal::try_from_bytes method is for.

@ZENOTME
Copy link
Contributor

ZENOTME commented Oct 17, 2023

I will try to explain it the best way I can. Please correct me if I'm wrong.

Thanks for your explanation! Totally understanded it!

Copy link
Contributor

@ZENOTME ZENOTME left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@Xuanwo
Copy link
Member

Xuanwo commented Oct 18, 2023

The PR looks good to me overall, but I'm slightly concerned about the use of Into::<ByteBuf>::into(literal). Will our users have to write code in this manner? Or does it occur internally?

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@liurenjie1024
Copy link
Contributor

The PR looks good to me overall, but I'm slightly concerned about the use of Into::<ByteBuf>::into(literal). Will our users have to write code in this manner? Or does it occur internally?

I think it's used for serialization of literal, and not exposed to user.

@JanKaul
Copy link
Collaborator Author

JanKaul commented Oct 18, 2023

I added impl From<Literal> for Vec<u8>. Now it should be clearer.

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

@JanKaul
Copy link
Collaborator Author

JanKaul commented Oct 18, 2023

Thanks @ZENOTME for finding the bug!

@Fokko
Copy link
Contributor

Fokko commented Oct 18, 2023

Great catch @JanKaul and everyone for reviewing! Let's get this in!

@Fokko Fokko merged commit 9f94186 into apache:main Oct 18, 2023
6 checks passed
@JanKaul JanKaul deleted the fix-avro-bytes-test branch March 4, 2024 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants