Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization refactor step 2 #269

Merged
merged 6 commits into from
May 14, 2020
Merged

Conversation

greg-szabo
Copy link
Member

Implemented null-value deserialization for Vec using Visitor.

This is part 2 of the serialization improvement efforts described in #247 .

(This PR was opened after part 1 was merged. It was easier to open this than to keep the original #257)

This PR is ready for review.

  • Referenced an issue explaining the need for the change
  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGES.md

@greg-szabo greg-szabo requested a review from liamsi May 13, 2020 01:59
@codecov-io
Copy link

codecov-io commented May 13, 2020

Codecov Report

Merging #269 into master will increase coverage by 0.0%.
The diff coverage is 0.0%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master    #269   +/-   ##
======================================
  Coverage    27.4%   27.4%           
======================================
  Files          99      99           
  Lines        3694    3694           
  Branches     1168    1169    +1     
======================================
+ Hits         1013    1014    +1     
+ Misses       1871    1870    -1     
  Partials      810     810           
Impacted Files Coverage Δ
tendermint/src/rpc/endpoint/abci_query.rs 0.0% <0.0%> (ø)
tendermint/src/serializers.rs 19.4% <0.0%> (+1.3%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08f280f...0ff4ae3. Read the comment docs.

Copy link
Member

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! looks good. I'm surprised we need to write an extra visitor for dealing with null in json (I'd expect this would be handled by serde by default 🤷). I'll give a more thorough review later today!

Thanks Greg :-)

@greg-szabo
Copy link
Member Author

greg-szabo commented May 13, 2020

I think this is more to do about the strictness in Rust. When you omit a value in JSON, it defaults to null. Rust can deal with omitted values by using the Default trait. But if you explicitly set a value to null in JSON, that's not strictly a default value in Rust. Hence it's a value you need to receive and process. By default, nulls are Nones in Rust, so with an Option<> we would be able to receive and process it. But the whole point for us was to get away from Options, to make life easier for the developer, hence we need a "layer" of code that receives the explicit null value and turns it into something else. That's where the Visitor comes in.
If serde wanted to deal with this internally, they would need to assume that a null value translates to the default value. Since, even the Default trait is optional, I think they wanted to keep this optional too. Maybe an annotation for null values would help, like there is one for default values.

Edit: Going through the serde issues, I think this is actually not well thought out. (For example: Inconsistent handling of Option/defaulting in seq deserializer)

Using a Visitor gives us a consistent behaviour for now. Maybe later it gets better and we can get away from the Visitor.

@liamsi
Copy link
Member

liamsi commented May 13, 2020

I might have thought that serde does treat null as an empty vector. Apparently it doesn't: serde-rs/serde#836

So the code looks fine to me, I was just wondering if we can do without the extra visitor here. Also I though we'd have a separate serilziation type for these cases (which could have an option in the fields and the actual ideal tendermint type that just has a vec and From impls between those types).

Maybe I'm overthinking all this 😄

@greg-szabo
Copy link
Member Author

greg-szabo commented May 13, 2020

In the case of Vec<u8> it doesn't seem to make sense to turn it into a custom type (which would possibly allow us to use Option<> and convert internally, like in CommitSig) because the custom type wouldn't add any value (HexBytes and Base64Bytes are only different on the JSON side, it doesn't make any difference for the developer). Additionally, we can't really turn it into a generic "Bytes" type either, unless we store what it should serialize to. - Which seems overkill. The annotation takes care of that with less hassle.

Built-in types can only be extended in the library we use them in (everywhere one-by-one) which would sprinkle serialization code all over the place. Or we write an annotation (like in the PR), which will require the Visitor so it can convert between two different types. (emphasis on different)

So, with annotating built-in types from weird formats, we might need to use a Visitor (like in this case). For custom types, most probably we won't need it, we'll have an internal struct (with Option<> or whatever it needs) and convert from that.

At least that's how I see it. If you think we can do this without the Visitor, point me in the right direction.

@melekes
Copy link
Contributor

melekes commented May 14, 2020

I might have thought that serde does treat null as an empty vector.

Maybe the solution would be to not have explicit nulls? Is it possible?

melekes
melekes previously approved these changes May 14, 2020
@liamsi
Copy link
Member

liamsi commented May 14, 2020

Maybe the solution would be to not have explicit nulls? Is it possible?

I would like that and I think this would be ideal! But it would require some changes in the go code too and we'd need to communicate this clearly in the spec (nulls are not uncommon in JSON).

The work in tendermint/tendermint#4828 could be an opportunity to validate this idea.

@liamsi
Copy link
Member

liamsi commented May 14, 2020

In the case of Vec it doesn't seem to make sense to turn it into a custom type

I didn't mean Vec type but the wrapping struct which has Vec<u8> as a field. E.g. let's say the Header:

serilaization::block::Header <---- TryFrom ----> core::block::Header

The left header can use an Option in the fields or whatever is necessary to correctly deserialize this and the TryFrom handles this accordingly and translates into a real, core value on the right (with absolutely no serde annotation whatsoever). See #197 (comment)

Downside would be a lot of duplication as the types would be almost identical. But it would be really, really simple, too. Also, there would be a clear separation between serialization types and ideal core types that hold more logic and with which devs would mainly deal with. Devs wouldn't need to care at all about the serialization specifics. The only thing they need to now is that under a serialization module, there are types from which you can use to (de)serilalize and translate to the core types. Does al that make any sense?

That all said, this PR is an improvement to current situation and excellent work! The above rather touches upon that we didn't yet decide on a clear strategy on serialization as a whole (and not about specific core types).I think we should merge this PR but keep in mind that there is a bigger question on howto handle serialization (also of different serialization kinds, e.g. binary proto/amino and different kinds of JSON). If we decide to go with the above mentioned approach, some of the custom serialization code (the questioned Visitor and the ByteStringType enum), can be deleted.

liamsi
liamsi previously approved these changes May 14, 2020
Copy link
Member

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too happy about the custom Visitor and the introduced ByteStringType enum etc.

It seems "null" decoding could have been achieved by changing one line in each serializers::bytes:: module deserialize method.

As this is probably debatable what is easier to read and digest as a human and this PR as improves the serialization code by adding (back) the ability to deal with nulls in a relatively simple manner, I'm OK with this PR as is. Most of this might go through a major overhaul when tendermint switch to using jsonpb.

Impressive work on wrestling with the serde API.

liamsi added a commit that referenced this pull request May 14, 2020
really works
@liamsi liamsi mentioned this pull request May 14, 2020
5 tasks
* Try if #269 (comment)
really works

* cargo fmt --

* clippy
@liamsi liamsi dismissed stale reviews from melekes and themself via 0ff4ae3 May 14, 2020 12:36
Copy link
Member

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@greg-szabo greg-szabo merged commit c2f3118 into master May 14, 2020
@greg-szabo greg-szabo deleted the greg/serialization-2-alter branch May 14, 2020 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants