-
-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Action items: adopting property-based testing in CPython's CI #65
Comments
Two other alternatives to 2:
The way I see it, if we write the property-based tests with lots of |
I have some concerns about Hypothesis. I worry that there is a rather hard "sell" going on here. Can we perhaps have an open discussion on the merits of Hypothesis itself on python-dev before committing to this? It seems to promote a coding style that's more functional than is good for Python, and its inner workings seem too magical to me. Also the decorator-based DSL looks pretty inscrutable to me. |
You mean
Do you mean it's promoting a coding style within the tests, or in the public API? When designing I think there are situations where it makes sense to write functional-style strategies because it can in some situations give hypothesis more information about how the strategies are transformed (and thus allow it to optimize how the data is drawn rather than drawing from the full set and discarding a bunch of stuff that doesn't match, or doing a bunch of extra logic on each draw), but it's by no means required. I'd say most of my "complex strategies" are decorated functions rather than chains of maps and filters. From the stdlib-property-tests repo, this seems like the most complicated strategy I'm seeing employed, and I find it fairly simple to understand, and I don't know how easy to read any code would be that generates trees of arbitrary depth containing objects with specific properties. |
Let's have this discussion elsewhere, not in the Steering Council's tracker.
|
Discussion has continued on BPO-42109. |
FWIW, I have a better understanding now of what Hypothesis does and how. I withdraw my opposition against Hypothesis insofar as it's about how it works. I leave it up to the SC to consider the other issues that have been brought up, largely around having a 3rd party dependency in the core tests. (Zac, thanks for your patience with me and for your help in understanding Hypothesis' approach). |
@brettcannon - this seems to be some combination of stuck, forgotten, or low-priority. Is there anything I can do to help out, instead of coming back for another presentation in 2022? |
Sorry about the delayed response on this! What we are looking for is a PR w/ a framework that stubs out hypothesis in the stdlib (@pganssle was working on this at some point, but we don't think he is currently). Probably an email to python-dev to gather help wouldn't hurt. Otherwise once a PR showing how it will work is ready then a separate email to python-dev to allay concerns, get feedback, etc. would be good. |
We talked about this again and we agreed we need a working, practical solution for the core devs to evaluate. |
With PEP-678 done (for now), I'll aim to have a working up-to-date PR for evaluation by or at PyCon 👍 |
any update on this? |
Since I (as hyptothesis team member) am very interested in bringing this to CPython, I would like to propose a solution. TLDR (I've made a PR with the setup example): sobolevn/cpython#4
I think this is the main problem right now. As far as I know, CPython's test framework does not have any 3rd party dependencies (except for optional But, some other tasks have:
I think we should treat I also don't think that running hypothesis on every build is worth it.
Basically, the action file would look like so: https://github.com/sobolevn/cpython/blob/8e20a3185939d01723ba928c35393ced0bf55053/.github/workflows/hypothesis.yml This looks quite simple to me. And what is the most important here: we can decrease the complexity for regular contributors by not to dealing with quite complex and unknown property-based tests. Versioning
Next steps
Feedback is welcome! CC @Zac-HD |
This sounds like a very reasonable plan to me. I don't want the core test suite to depend on packages from outside our repo, but a scheduled buildbot totally can. An additional policy to determine is "what do we do when tests fail?" The downside of a non-blocking test suite is that someone has to be watching and reacting to it. We can't just assume that'll happen. |
We could have a bot automatically create an issue when the workflow fails. That would bring greater visibility to the failure. We have a daily workflow at typeshed which does this (though it has a few warts, e.g. you have to click through to the workflow in order to see exactly which tests failed where). Typeshed workflow: https://github.com/python/typeshed/blob/708996dff09581e52ae6c91711f82d293d3ec542/.github/workflows/daily.yml#L85-L103 Example issue: python/typeshed#8818 |
Apologies for letting this slip for so long. I'd pretty strongly prefer @pganssle's approach of providing a Hypothesis shim, so that you can run without Hypothesis installed - in which case the I'm also pretty nervous that running as a cron job will make the experience much more frustrating, as problems are reported after they're merged rather than before. What's the actual downside of running a Hypothesis job in CI? The compute cost is trivial, and failures have to be addressed at some point regardless. (My main project for this month is open-sourcing the fuzzer I've been building for a while; once I get the config and workflow nailed down that would be a great thing to drop in a cron job. Happy to be pulled in on CPython issues or PRs anytime though, they'll take even higher priority.) |
@Zac-HD, yes, I agree that running For example, let's say some new contributor adds something / fixes a bug. And But, in the scenario I proposed as a compromise: we can still benefit from having hypothesis in the CPython. With this approach we can:
So, please consider my proposal as plan "B", so we can actually get this merged in the nearest future. |
Our required CI tests are already too unstable. If we add anything to them, I'd want it to be entirely static, and ideally transparent, which basically means a list of all the test cases. Provided they're checked in and it's obvious which test is failing, I'm not concerned about how they're generated, but they can't depend on installing anything from PyPI. And as sobolevn says, they should never fail on unrelated PRs. I'm not convinced that (a) hypothesis can guarantee that (and I say that as a user who deals with new random failures due to hypothesis in my other projects), and (b) if it does that it still provides any value over a static list of examples. |
I think this makes Paul's stubs approach a good one: it's a dependency-free list of test cases written out in the source code, but with the advantage that you can use Hypothesis to find additional test cases outside of your required CI tests, e.g. in a cronjob. I'd personally still run a non-required CI job with Hypothesis installed too, but I'm not going to argue further. My colleague Nelson wrote about the workflow here. I've also been playing with the idea of using the fuzzer to find a minimal set of covering examples which can be automatically added to the source code; if CPython would use that I'll bump it to the top of my todo list. |
🎊 python/cpython#22863 has been merged, and property-based tests are now run in CPython's CI 🎊 There are plenty of possible enhancements (more tests, fuzzing, etc.), but none that require a decision by the Steering Council. With thanks for your patience, I think it's time to close this issue at last! |
At the 2021 Language Summit - like in 2020 - I proposed that CPython should use Hypothesis in CI and increase its use of fuzzing (via OSS-Fuzz). Core devs were generally keen, and so @brettcannon asked me to open an issue here to track the action items. As listed in my slides, that's:
The main question is how exactly to get Hypothesis installed into the CI environment. @zooba was concerned about depending on being able to pip install things before we even run core tests, and suggested vendoring Hypothesis. I'm not implacably opposed to vendoring, but hopefully Paul's shims-approach plus pinning the version might already address this concern?
Tests for the parser or tokenize module would also pull in dependencies on
hypothesmith
, plus transitive dependencies on Lark and LibCST. This would be much more elegant to handle with shims than by vendoring, and particularly valuable given the history of Hypothesis-discovered parser bugs (e.g. BPO-40661, BPO-38953, and BPO-42218).I also proposed cherry-picking from https://github.com/Zac-HD/stdlib-property-tests to test the parser and expanding CPython's use of OSS-Fuzz. However, I think that once the Steering Council has unblocked the first property-based tests for CI, further work can proceed on the usual issue-by-issue or PR-by-PR basis. I will of course be happy to help with this in whatever way I can.
The text was updated successfully, but these errors were encountered: