-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Light Client model-based testing #529
Conversation
… commands (e.g. apalache, jsonatr)
* defined structure for model-based single-step light client tests * added a simple driver for such tests modeled after original one in tests/lite.rs * add the first auto-generated model-based test * add symlinks to the TLA+ model The important change in the test structure is that now each input block carries with it the expected verdict, not a single expected result for the whole input array: the latter is unclear how to interpret in case of failures. Besides, having such more fine-grained verdicts allows to differentialte between different types of errors returned by the light client.
…implementation In the current model (Blockchain_A_1.tla) the check InTrustingPeriod treats the edge case now = header.time + TRUSTING_PERIOD as valid, while the implementation treats it as invalid. ` Changed the model to follow the implementation.
self.dir = Some(dir.to_owned()); | ||
self | ||
} | ||
/// Execute the command as a child process, and extract its status, stdout, stderr. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you say a few words about what this gives us that we wouldn't get by just using https://doc.rust-lang.org/std/process/struct.Output.html provided by command.output()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you are right, this would allow to shorten the code a bit. I was simply not aware of this at the time of writing. But I don't think it's too critical -- at the end, it's the same functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! My concern in that case is mostly not about code length. Rather, I'm concerned with (1) supporting readability by using the commonly available idioms and constructs rather than requiring readers to learn bespoke constructs that just duplicate existing functionality, and (2) supporting maintainability but cutting preventing the unnecessary addition of 80 lines of code to the code base. IME, both of those considerations matter enough that it is worth doing the right thing here.
Naturally, I'll deffer to others who are more in tune with the standards for this repo if they feel differently. But I have a strong view on this, and if I were the main reviewer I would insist on the change.
Is it possible that this could be broken into smaller PRs? Some reasonable seeming divisions that occur to me while surveying the changes:
I'm not actually sure if things could decouple that way, but it would enable much better quality reviews and digestion of the work you've done here. Even if it were only possible to break into 2 or three parts, it would be a big help. |
Also, probably worth fixing the clippy check's before a review :) |
@shonfeder I do like your suggestion about breaking it into smaller PRs:) Not sure though I will be able to do it now, as I am on vacation till Sept 1... As for the clippy check that fails, what they propose is actually incorrect. I.e. if you apply this you get a compiler error. I guess this points to a bug in clippy. |
I think the proper thing to do in such cases is to silence the linter check for the affected block of code https://github.com/rust-lang/rust-clippy#allowingdenying-lints (and I guess file an issue with clippy if it is also broken behavior, but that's orthogonal :). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only looked that the blockchain and lightclient TLA files. It looks that some changes (I commented on them) are made to match the code, but as a result the TLA+ files may differ from the English spec. We should create an issue that highlights the changes needed in the English spec to keep everything consistent.
@@ -75,7 +75,7 @@ LBT == [header |-> BT, Commits |-> {NT}] | |||
|
|||
(* the header is still within the trusting period *) | |||
InTrustingPeriod(header) == | |||
now <= header.time + TRUSTING_PERIOD | |||
now < header.time + TRUSTING_PERIOD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change come from the implementation or the English spec? Are those consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comes from the implementation, as otherwise I get a testing error: spec behavior is different from what we have in the implementation. And the English spec contains this trusted.Header.Time > now - trustingPeriod
, so I guess this means that the English spec and the implementation agree, but the TLA+ spec is wrong in that respect; could you please verify that it's true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So now we do model-based testing as conformance test between English spec, TLA+, and the implementation. Amazing 😉
THEN "SUCCESS" | ||
ELSE "NOT_ENOUGH_TRUST" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think OK and CANNOT_VERIFY are used in the English spec. Are SUCCESS and NOT_ENOUGH_TRUST used in the implementation? If yes, we should adapt I guess the English spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken the verdicts from the implementation. There is another issue I've got, namely that according to the TLA+ constraints the verdict FAILED_TRUSTING_PERIOD cannot ever be produced. At least it's impossible to obtain a test with such verdict from Apalache, and my quick manual check on the TLA+ constraints tells me it's indeed logically impossible. Could you please verify that? In case it's true, it may be makes sense to remove such a verdict. And I would actually update the English spec as well, as I find the implementation names a bit more informative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will have a look with @konnov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why FAILED_TRUSTING_PERIOD
cannot be produced. It is produced in line 115. However, it is not propagated into state
. Do you like to see it in state
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR is closed; let's discuss on another PR: 546
@@ -110,7 +110,7 @@ FaultAssumption(pFaultyNodes, pNow, pBlockchain) == | |||
(* Can a block be produced by a correct peer, or an authenticated Byzantine peer *) | |||
IsLightBlockAllowedByDigitalSignatures(ht, block) == | |||
\/ block.header = blockchain[ht] \* signed by correct and faulty (maybe) | |||
\/ block.Commits \subseteq Faulty /\ block.header.height = ht \* signed only by faulty | |||
\/ block.Commits \subseteq Faulty /\ block.header.height = ht /\ block.header.time > 0 \* signed only by faulty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess faulty processes may be allowed to write any time, also negative. Or is the time stored in some unsigned domain in the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is indeed stored and serialized/deserialized as an unsigned integer, so signed integers do no make sense and only add confusion, I am afraid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in the English spec we implicitly also assume positive times. Perhaps we should clarify it there explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice trick with the history variable. However, instead of directly extending Lightclient_A_1.tla
, I would recommend to extend Lightclient_A_1 with another module that monitors the light client with the history variable.
It probably does make a lot of sense to split this PR into multiple ones, as @shonfeder proposed; I will do that when I am back from my vacation on Sept. 2. Just write it here so please do not spend your time reviewing this one. |
Let's close this in the meantime then. Thanks for breaking it up! |
Thank you for the PR refactor! Will dig into reviewing whatever is still open ASAP. 🙏 |
#547) * #414: testgen tester -- utilities to run multiple tests with logs and reports * #547 add missing file updates from #529 * fix merge typo * TestEnv: change path parameters into AsRef<Path> * change TestEnv::full_path to return PathBuf * apply simplifications suggested by Romain * apply simplification from Romain * account for WOW Romain's suggestion on RefUnwindSafe * address Romain's suggestion on TestEnv::cleanup * cargo clippy * update CHANGELOG.md
* #414: testgen tester -- utilities to run multiple tests with logs and reports * 14: add testgen commands, in particular to call apalache and jsonatr * #547 add missing file updates from #529 * fix merge typo * remove ref to time: it's in another PR * fix clippy warning * TestEnv: change path parameters into AsRef<Path> * change TestEnv::full_path to return PathBuf * apply simplifications suggested by Romain * apply simplification from Romain * account for WOW Romain's suggestion on RefUnwindSafe * address Romain's suggestion on TestEnv::cleanup * cargo clippy * update CHANGELOG.md * addressed Romains's suggestions
* #414: testgen tester -- utilities to run multiple tests with logs and reports * 14: add testgen commands, in particular to call apalache and jsonatr * #414: add model-based test driver to LightClient tests * add talk abstract * #547 add missing file updates from #529 * fix merge typo * remove ref to time: it's in another PR * fix clippy warning * fix merging typo * cargo fmt
See #414. Also see Light Client model-based testing guide for the high-level description how model-based testing will be working.
This PR touches quite a lot of the repository, so I will be grateful for multiple reviews. It is mostly about building a lot of infrastructure that enables model-based testing, so it should be easy to extend now to other types of tests (bisection), or other crates/repos when needed.
One particular issue is that it uncovered a number of discrepancies between the model and the implementation of the LIghtClient; please see the changes to the
Lightclient_A_1.tla
/Blockchain_A_1.tla
I have fixed them by adjusting the model, but it would be nice if @josef-widder or @konnov take a look at it.In general, I hope, this is the first step in the right direction of keeping the specs live and in sync with the implementation, and of thorough / to some extent exhaustive testing of the implementation.