Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization improvements #248

Closed
wants to merge 17 commits into from
Closed

Conversation

greg-szabo
Copy link
Member

@greg-szabo greg-szabo commented May 2, 2020

#247 - Serialization refactor
#246 - CommitSig bug fix

  • Referenced an issue explaining the need for the change
  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGES.md

timestamp: Time,
/// Signature of vote
signature: Signature,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Representing this as an enum is a nice idea 👍

}
}

deserializer.deserialize_map(CommitSigVisitor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the enum idea is cool, it seems to come with the cost of a rather complicated Deserialize impl for commitSig.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and that's the tradeoff @brapse was going to go for in an earlier discussion. (Feel free to engage Sean.) The idea is that serialization is not actually the core product here and during the usual development it's just annoying to do something some particular way just because serialization says so.
This approach puts the logic of making sure that the incoming data is deserialized properly (with some basic validation) and the outgoing data is serialized properly in the serialization "layer" and the internal workings of the application can become completely independent from it. For example it can make use of the additional features of Rust better.
Just to highlight it even more: if/when the Tendermint data model changes because the Go + Rust teams decide on some other implementation, the changes can be implemented in this layer without affecting the application.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit there is one not-completely-Rust-native-best-practice feature in the code and it annoys me slightly. I deserialize the incoming CommitSig into an internal CommitSigSpec structure that is defined in ADR-025. I could have gone hard-core and implement complete deserialization while parsing the incoming text. But that would be even more convoluted and harder to change down the road. I feel this is a good compromise that allows us to easily update the deserialization code, if a new ADR comes by.

Copy link
Member

@liamsi liamsi May 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and that's the tradeoff @brapse was going to go for in an earlier discussion. (Feel free to engage Sean.) The idea is that serialization is not actually the core product here and during the usual development it's just annoying to do something some particular way just because serialization says so.

That is absolutely reasonable. Serialization isn't the core product and we shouldn't build our core types around it. All in favour of decoupling serialization concerns and core types (I like this approach for this: #197 (comment) by @tarcieri ).

What I was referring to was that the seemingly beautiful change made the part where the serialization actually happens difficult to read and quite complex. A simpler way might be to have a serialization type and a From / Into for that and the actual type. This is very similar to what you are already doing with the CommitSigSpec type but simpler to parse for a human :-)

Also, it would be cool if the CommitSigSpec would use an enum instead of an u8 for the block_id_flag. Then one wouldn't need to annotate the match arms with meaning (e.g. 1 => {...} with // BlockIDFlagAbsent).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and done: Today's commit moved things into serde's "from" annotation. So now, the code looks a bit cleaner, although it's not less code:
In the serializers, a new RawCommitSig type was introduced that deserializes from JSON and it looks very much like the definition in the ADR (including the enum for BlockIDFlags). Deserialization automatically invokes basic type-checking of the struct (like an u8 has to be an u8).
The Rust-native CommitSig type converts from this RawCommitSig type through it's "From" trait which holds the validation for the struct. This logic is more about "if this variable is set, than that other variable also has to have a value".

So the deserializer does type-check and the From trait does validation. My only grief is that type-checking happens in the serializers library while validation happens in the struct's library (in this case at CommitSig). Interestingly, this is similar to how it used to be, so I'm guessing the developers will be fine with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the code looks cleaner that is already worth it 👍 And I think it is OK for the core type to defines when it's valid (s.t. you can only construct a valid type).

@greg-szabo
Copy link
Member Author

I'm going to look into the clippy messages and fix them. While I do that I also wanted to call out for @Shivani912 and ask her about some of the test-stable tests that are failing. (single_skip_commit_tests_verify and commit_tests_verify). I feel they are failing because the used JSON input is not compliant with the slightly stricter implementation this new CommitSig has: according to ADR-025, if a signature has block_id_flag = 1 (no vote cast), then there should be no signature and timestamp fields. Some of the JSON files have them, hence the above two tests fail.

Shivani, I'm happy to fix those in the JSON files, but I'm worried that I'm not editing the source of truth. Are you still copying these files from your own repository? Is it ok if I update them here?

@greg-szabo greg-szabo force-pushed the greg/serialization-improvements branch from 20cf2d6 to d1b33ff Compare May 3, 2020 00:24
@@ -34,7 +34,7 @@ mod rpc {
async fn abci_info() {
let abci_info = localhost_rpc_client().abci_info().await.unwrap();

assert_eq!(&abci_info.version, "0.16.2");
assert_eq!(&abci_info.version, "0.17.0");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, this will also close: #249

@liamsi
Copy link
Member

liamsi commented May 3, 2020

Just a remark regarding the JSON tests: some might fail because now basic validation is done while decoding the type (e.g. the relation between BlockIdFlag and absence or presence of certain fields).

This is a good move (as we can fail earlier on invalid input) but might require changes in the JSON test because that would mean that some test-files won't parse anymore. Currently, they would parse correctly but fail later (when validation is done). This is a sign that the JSON tests might be to narrowly built around the golang version: there the JSON in a few cases might parse correctly and the validation would fail later (and this is implicitly expected as an error in the JSON test description).

@Shivani912
Copy link
Contributor

I'm going to look into the clippy messages and fix them. While I do that I also wanted to call out for @Shivani912 and ask her about some of the test-stable tests that are failing. (single_skip_commit_tests_verify and commit_tests_verify). I feel they are failing because the used JSON input is not compliant with the slightly stricter implementation this new CommitSig has: according to ADR-025, if a signature has block_id_flag = 1 (no vote cast), then there should be no signature and timestamp fields. Some of the JSON files have them, hence the above two tests fail.

Shivani, I'm happy to fix those in the JSON files, but I'm worried that I'm not editing the source of truth. Are you still copying these files from your own repository? Is it ok if I update them here?

I'll fix this on the generator code and update the tests in this repo. But, looks like this should be reported for the the Go code. A CommitSig with BlockIDFlagAbsent is expected to have all other fields "empty" including the ValidatorAddress field (this check here).

Thanks for finding this @greg-szabo !

@greg-szabo
Copy link
Member Author

Yes, I'm going to report that on the next Tendermint call. The Go code doesn't comply with the ADR.

@liamsi
Copy link
Member

liamsi commented May 4, 2020

A general remark: I think this PR could have better been 2, 3 or even 4 PRs:

  1. CommitSig bug fix
  2. serilization refactor:
    2.1) smaller repetitive changes like cleaning up all those #[serde()] annotations
    2.2) introduction of that ByteStringType
    2.3) splitting up existing serializers into separate modules/files and introducing new serializers (if any)

@greg-szabo greg-szabo force-pushed the greg/serialization-improvements branch from 8bc39cd to 2a032ab Compare May 4, 2020 23:03
@liamsi
Copy link
Member

liamsi commented May 4, 2020

The failing tests (on stable) is because #248 (comment)
Previously this would parse but return an error on validation, now this doesn't parse anymore (and because of the panic none of the tests in that json file are run):

thread 'commit_tests_verify' panicked at 'called `Result::unwrap()` on an `Err` value: Error("timestamp is present for BlockIDFlagAbsent CommitSig", line: 1343, column: 16)', tendermint/tests/lite.rs:138:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@@ -94,7 +94,7 @@ jobs:
args: --all-features --no-fail-fast
env:
CARGO_INCREMENTAL: "0"
RUSTFLAGS: "-Zprofile -Ccodegen-units=1 -Cinline-threshold=0 -Clink-dead-code -Coverflow-checks=off -Zno-landing-pads"
RUSTFLAGS: '-Zprofile -Ccodegen-units=1 -Cinline-threshold=0 -Clink-dead-code -Coverflow-checks=off -Cpanic=unwind -Zpanic_abort_tests'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not fix this on master, get that PR merged in quickly and merge master into this PR? It would benefit all other opened PRs in the meantime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a "one-branch" guy :D
Sure, that makes sense.

@Shivani912
Copy link
Contributor

@greg-szabo Another note on CommitSig with BlockIDFlagAbsent:

An empty timestamp field (i.e. of type time.Time) in Tendermint Go is by default "zero" time i.e. "0001-01-01T00:00:00Z". This means that in case of an empty CommitSig, there will always be timestamp field with this zero time, unless they change the way it serializes.

@greg-szabo
Copy link
Member Author

greg-szabo commented May 4, 2020

I thought they'd serialize it as a null... They do that all the time. :)

@greg-szabo
Copy link
Member Author

Regarding managing the PR: I was thinking on doing that but all the PRs would have depended on each other (or at least in some cases it would have). Is that ok?

Also, the work is not done yet, I just chose a bigger milestone instead of small changes (which I'll try next time). There is a "huge" TODO in the middle of the serializers code that will refactor at least the Header struct and possibly the Block too.

@liamsi
Copy link
Member

liamsi commented May 4, 2020

I thought they'd serialize it as a null... They do that all the time. :)

Nope: #196 (comment)
I quote @ebuchman here (again):

Serialization issues have been a constant plague upon us.

@liamsi
Copy link
Member

liamsi commented May 4, 2020

Is that ok?

OK, but please do not add more to it at least:

Also, the work is not done yet, I just chose a bigger milestone instead of small changes (which I'll try next time). There is a "huge" TODO in the middle of the serializers code that will refactor at least the Header struct and possibly the Block too.

It quickly becomes painful to review and also the quality of the review decreases. Some of the changes do not depend on each other, e.g.: (smaller things like) #248 (comment) and https://github.com/informalsystems/tendermint-rs/pull/248/files#r419026955,
and (partly) 2.1) and the first half of 2.3) in #248 (comment). Also some of the changes could be done incrementally.

);
}
// TODO: deal with Time
// see https://github.com/informalsystems/tendermint-rs/pull/196#discussion_r401027989
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was also a reminder about the time peculiarities :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sh*t, I completely missed that. :( Thanks for the thorough review!

Should I add the fix on this PR?

The fix as I see is accepting Nil commits with timestamp and checkinf if the timestamp is 0. If yes, all's well (but don't put the timestamp in the enum), if no, raise an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened issue #259 about this.

@liamsi liamsi mentioned this pull request May 4, 2020
_ => None,
/// CommitSig implementation
impl CommitSig {
/// Helper: Extract validator address, since it's always present (according to ADR-025)
Copy link
Member

@liamsi liamsi May 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should clarify this first: #246 (comment)

I can very well be that adr isn't representing the full decision anymore, or, that it is slightly confusing nil and absent votes. Also: #196 (comment)

cc @melekes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created tracking issue #260 about this.

@Shivani912 Shivani912 mentioned this pull request May 5, 2020
5 tasks
@yihuang
Copy link
Contributor

yihuang commented May 6, 2020

I suggest adding symmetric assertions to the unit tests, something like:

serde_json::from_str(&serde_json::to_string(&obj).unwrap()).expect("Should be decodable");

Or even better the decoded one is the same as original one if it implements Eq:

assert_eq!(serde_json::from_str(&serde_json::to_string(&obj).unwrap()).unwrap(), obj);

@liamsi
Copy link
Member

liamsi commented May 6, 2020

This is a great suggestion @yihuang. There is a little function introduced by @romac that could be used here:

/// Test that a struct `T` can be:
///
/// - parsed out of the provided JSON data
/// - serialized back to JSON
/// - parsed back from the serialized JSON of the previous step
/// - that the two parsed structs are equal according to their `PartialEq` impl
pub fn test_serialization_roundtrip<T>(json_data: &str)

@tarcieri
Copy link
Contributor

tarcieri commented May 6, 2020

An interesting way to do those "round trip" tests is with a tool like proptest

@greg-szabo
Copy link
Member Author

Ok, so I've broken out this PR into three smaller PRs as we discussed with Ismail. This one will be closed soon, I just want to save the additional things that came up into the issue (#247) until they are implemented/decided on.

@greg-szabo greg-szabo closed this May 8, 2020
@greg-szabo greg-szabo deleted the greg/serialization-improvements branch May 26, 2020 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants