Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Light Client model-based testing #529

Closed
wants to merge 62 commits into from

Conversation

andrey-kuprianov
Copy link
Contributor

@andrey-kuprianov andrey-kuprianov commented Aug 21, 2020

See #414. Also see Light Client model-based testing guide for the high-level description how model-based testing will be working.

This PR touches quite a lot of the repository, so I will be grateful for multiple reviews. It is mostly about building a lot of infrastructure that enables model-based testing, so it should be easy to extend now to other types of tests (bisection), or other crates/repos when needed.

One particular issue is that it uncovered a number of discrepancies between the model and the implementation of the LIghtClient; please see the changes to the Lightclient_A_1.tla / Blockchain_A_1.tla I have fixed them by adjusting the model, but it would be nice if @josef-widder or @konnov take a look at it.

In general, I hope, this is the first step in the right direction of keeping the specs live and in sync with the implementation, and of thorough / to some extent exhaustive testing of the implementation.

  • Referenced an issue explaining the need for the change
  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG.md

* defined structure for model-based single-step light client tests
* added a simple driver for such tests modeled after original one in
tests/lite.rs
* add the first auto-generated model-based test
* add symlinks to the TLA+ model

The important change in the test structure is that now each input block
carries with it the expected verdict, not a single expected result for the whole
input array: the latter is unclear how to interpret in case of failures.

Besides, having such more fine-grained verdicts allows to differentialte
between different types of errors returned by the light client.
…implementation

In the current model (Blockchain_A_1.tla) the check InTrustingPeriod
treats the edge case now = header.time + TRUSTING_PERIOD as valid,
while the implementation treats it as invalid.
`
Changed the model to follow the implementation.
self.dir = Some(dir.to_owned());
self
}
/// Execute the command as a child process, and extract its status, stdout, stderr.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you say a few words about what this gives us that we wouldn't get by just using https://doc.rust-lang.org/std/process/struct.Output.html provided by command.output()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right, this would allow to shorten the code a bit. I was simply not aware of this at the time of writing. But I don't think it's too critical -- at the end, it's the same functionality.

Copy link
Contributor

@shonfeder shonfeder Aug 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! My concern in that case is mostly not about code length. Rather, I'm concerned with (1) supporting readability by using the commonly available idioms and constructs rather than requiring readers to learn bespoke constructs that just duplicate existing functionality, and (2) supporting maintainability but cutting preventing the unnecessary addition of 80 lines of code to the code base. IME, both of those considerations matter enough that it is worth doing the right thing here.

Naturally, I'll deffer to others who are more in tune with the standards for this repo if they feel differently. But I have a strong view on this, and if I were the main reviewer I would insist on the change.

@shonfeder
Copy link
Contributor

Is it possible that this could be broken into smaller PRs? Some reasonable seeming divisions that occur to me while surveying the changes:

  • One PR for the command.rs and apalache.rs modules
  • One PR for changes to existing test harness
  • One PR for changes to the existing tests
  • One PR for changes to the TLA specs

I'm not actually sure if things could decouple that way, but it would enable much better quality reviews and digestion of the work you've done here. Even if it were only possible to break into 2 or three parts, it would be a big help.

@shonfeder
Copy link
Contributor

Also, probably worth fixing the clippy check's before a review :)

@andrey-kuprianov
Copy link
Contributor Author

andrey-kuprianov commented Aug 21, 2020

@shonfeder I do like your suggestion about breaking it into smaller PRs:) Not sure though I will be able to do it now, as I am on vacation till Sept 1...

As for the clippy check that fails, what they propose is actually incorrect. I.e. if you apply this you get a compiler error. I guess this points to a bug in clippy.

@shonfeder
Copy link
Contributor

shonfeder commented Aug 21, 2020

As for the clippy check that fails, what they propose is actually incorrect. I.e. if you apply this you get a compiler error. I guess this points to a bug in clippy.

I think the proper thing to do in such cases is to silence the linter check for the affected block of code https://github.com/rust-lang/rust-clippy#allowingdenying-lints (and I guess file an issue with clippy if it is also broken behavior, but that's orthogonal :).

Copy link
Member

@josef-widder josef-widder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only looked that the blockchain and lightclient TLA files. It looks that some changes (I commented on them) are made to match the code, but as a result the TLA+ files may differ from the English spec. We should create an issue that highlights the changes needed in the English spec to keep everything consistent.

@@ -75,7 +75,7 @@ LBT == [header |-> BT, Commits |-> {NT}]

(* the header is still within the trusting period *)
InTrustingPeriod(header) ==
now <= header.time + TRUSTING_PERIOD
now < header.time + TRUSTING_PERIOD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change come from the implementation or the English spec? Are those consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comes from the implementation, as otherwise I get a testing error: spec behavior is different from what we have in the implementation. And the English spec contains this trusted.Header.Time > now - trustingPeriod, so I guess this means that the English spec and the implementation agree, but the TLA+ spec is wrong in that respect; could you please verify that it's true?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now we do model-based testing as conformance test between English spec, TLA+, and the implementation. Amazing 😉

Comment on lines +138 to +139
THEN "SUCCESS"
ELSE "NOT_ENOUGH_TRUST"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think OK and CANNOT_VERIFY are used in the English spec. Are SUCCESS and NOT_ENOUGH_TRUST used in the implementation? If yes, we should adapt I guess the English spec.

Copy link
Contributor Author

@andrey-kuprianov andrey-kuprianov Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken the verdicts from the implementation. There is another issue I've got, namely that according to the TLA+ constraints the verdict FAILED_TRUSTING_PERIOD cannot ever be produced. At least it's impossible to obtain a test with such verdict from Apalache, and my quick manual check on the TLA+ constraints tells me it's indeed logically impossible. Could you please verify that? In case it's true, it may be makes sense to remove such a verdict. And I would actually update the English spec as well, as I find the implementation names a bit more informative

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a look with @konnov

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why FAILED_TRUSTING_PERIOD cannot be produced. It is produced in line 115. However, it is not propagated into state. Do you like to see it in state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR is closed; let's discuss on another PR: 546

@@ -110,7 +110,7 @@ FaultAssumption(pFaultyNodes, pNow, pBlockchain) ==
(* Can a block be produced by a correct peer, or an authenticated Byzantine peer *)
IsLightBlockAllowedByDigitalSignatures(ht, block) ==
\/ block.header = blockchain[ht] \* signed by correct and faulty (maybe)
\/ block.Commits \subseteq Faulty /\ block.header.height = ht \* signed only by faulty
\/ block.Commits \subseteq Faulty /\ block.header.height = ht /\ block.header.time > 0 \* signed only by faulty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess faulty processes may be allowed to write any time, also negative. Or is the time stored in some unsigned domain in the implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is indeed stored and serialized/deserialized as an unsigned integer, so signed integers do no make sense and only add confusion, I am afraid

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in the English spec we implicitly also assume positive times. Perhaps we should clarify it there explicitly.

Copy link
Contributor

@konnov konnov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice trick with the history variable. However, instead of directly extending Lightclient_A_1.tla, I would recommend to extend Lightclient_A_1 with another module that monitors the light client with the history variable.

@andrey-kuprianov
Copy link
Contributor Author

It probably does make a lot of sense to split this PR into multiple ones, as @shonfeder proposed; I will do that when I am back from my vacation on Sept. 2. Just write it here so please do not spend your time reviewing this one.

@ebuchman
Copy link
Member

It probably does make a lot of sense to split this PR into multiple ones, as @shonfeder proposed; I will do that when I am back from my vacation on Sept. 2. Just write it here so please do not spend your time reviewing this one.

Let's close this in the meantime then. Thanks for breaking it up!

@shonfeder
Copy link
Contributor

Thank you for the PR refactor! Will dig into reviewing whatever is still open ASAP. 🙏

romac pushed a commit that referenced this pull request Sep 10, 2020
#547)

* #414: testgen tester -- utilities to run multiple tests with logs and reports

* #547 add missing file updates from #529

* fix merge typo

* TestEnv: change path parameters into AsRef<Path>

* change TestEnv::full_path to return PathBuf

* apply simplifications suggested by Romain

* apply simplification from Romain

* account for WOW Romain's suggestion on RefUnwindSafe

* address Romain's suggestion on TestEnv::cleanup

* cargo clippy

* update CHANGELOG.md
romac pushed a commit that referenced this pull request Sep 11, 2020
* #414: testgen tester -- utilities to run multiple tests with logs and reports

* 14: add testgen commands, in particular to call apalache and jsonatr

* #547 add missing file updates from #529

* fix merge typo

* remove ref to time: it's in another PR

* fix clippy warning

* TestEnv: change path parameters into AsRef<Path>

* change TestEnv::full_path to return PathBuf

* apply simplifications suggested by Romain

* apply simplification from Romain

* account for WOW Romain's suggestion on RefUnwindSafe

* address Romain's suggestion on TestEnv::cleanup

* cargo clippy

* update CHANGELOG.md

* addressed Romains's suggestions
romac pushed a commit that referenced this pull request Sep 11, 2020
* #414: testgen tester -- utilities to run multiple tests with logs and reports

* 14: add testgen commands, in particular to call apalache and jsonatr

* #414: add model-based test driver to LightClient tests

* add talk abstract

* #547 add missing file updates from #529

* fix merge typo

* remove ref to time: it's in another PR

* fix clippy warning

* fix merging typo

* cargo fmt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
light-client Issues/features which involve the light client tests verification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants