Improve structure of timezones() strategies #2414
BPO-40536 / pganssle/zoneinfo#60 adds The
(Warning: Post contains wall of text)
So it seems:

```python
>>> import zoneinfo
>>> len(zoneinfo.available_timezones())
595
```

Most of those are not unique zones, mind you; there are a lot of aliases, and you could pretty easily detect that (on my system they're installed with hard links):

```python
>>> import os
>>> def get_inode(key):
...     return os.stat(os.path.join(zoneinfo.TZPATH[0], key)).st_ino
...
>>> len(set(map(get_inode, zoneinfo.available_timezones())))
388
```

That said, an advantage Hypothesis has over exhaustive testing is that it has the test case minimizer. Luckily, the first alphabetical zone, Having a reasonable (even if it's based on some heuristics) basis for minimization would be very useful.
I'm not sure what you mean by "the comparative testing thing", or why it would be disqualifying. Anyway, I think it may not be so bad to use a mostly hard-coded ordering (possibly move the data for this ordering into an extras package that can be updated independently of the hypothesis code, but honestly these keys are pretty stable):
You can hard-code a list of "super simple" and "especially weird" and then filter the lists by Another option for a generated list of tricky zones is to just parse the files yourself (it's easier than it seems — I've written versions of this parser several times now; plus, it doesn't really matter if you get it wrong, since the order was essentially arbitrary before anyway). Although
For reasons I get into a bit in this SO answer, this isn't a perfect representation of the zone (it won't pick out things like the

At some point I will have time to actually work on some of this stuff, but in the very likely event that that point is far in the future, I hope these notes are useful.
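The hard-code-then-filter idea can be sketched as below. The lists are hypothetical examples, not a vetted set, and filtering against whatever `zoneinfo.available_timezones()` returns is an assumption about the (elided) filter criterion. Keys that are not installed simply drop out, and everything not pinned lands in a sorted middle group.

```python
# Hypothetical hand-picked lists; these keys are illustrative only.
SUPER_SIMPLE = ["UTC", "Etc/GMT+5"]  # Etc/GMT+5 is UTC-5 (POSIX sign convention)
ESPECIALLY_WEIRD = [
    "Europe/Dublin",        # negative DST in the tz source data
    "Australia/Lord_Howe",  # 30-minute DST shift
    "Pacific/Apia",         # skipped 30 December 2011
    "Pacific/Kiritimati",   # UTC+14
]


def ordered_groups(available):
    """Split installed keys into [simple, everything else, weird], dropping empty groups."""
    available = set(available)
    simple = [key for key in SUPER_SIMPLE if key in available]
    weird = [key for key in ESPECIALLY_WEIRD if key in available]
    middle = sorted(available - set(simple) - set(weird))
    return [group for group in (simple, middle, weird) if group]
```

For example, `ordered_groups(zoneinfo.available_timezones())` gives groups that can be sampled simplest-first, so shrinking heads towards the "super simple" keys.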
Thanks Paul! Walls of text are appropriate and useful here 😄
That's the other benefit of the proposal in this issue - with your five-part categorisation (:heart:) we'll always shrink to UTC or to a fixed offset if possible. Concretely, I think my proposal is to write (some scripts which generate) a list of groups of keys such as
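One way such a script might compute the groups at runtime is to probe each zone with `zoneinfo` and bucket it by the offsets it reports. The five categories below are hypothetical placeholders (they are not the categorisation discussed above), the probe grid of four dates a year from 1970 to 2037 is an arbitrary assumption, and `probe`, `classify` and `build_groups` are illustrative names.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

ZERO = timedelta(0)
HOUR = timedelta(hours=1)
# Hypothetical categories, ordered from simplest to weirdest.
GROUP_ORDER = ("utc", "fixed", "changes", "fractional", "weird")


def probe(key, years=range(1970, 2038)):
    """Sample (utcoffset, dst) pairs for `key` four times a year."""
    tz = ZoneInfo(key)
    samples = []
    for year in years:
        for month in (1, 4, 7, 10):
            local = datetime(year, month, 1, 12, tzinfo=timezone.utc).astimezone(tz)
            samples.append((local.utcoffset(), local.dst() or ZERO))
    return samples


def classify(samples):
    offsets = {off for off, _ in samples}
    dsts = {dst for _, dst in samples}
    if any(dst < ZERO for dst in dsts):
        return "weird"       # e.g. Ireland's negative DST, if the tz data encodes it that way
    if offsets == {ZERO}:
        return "utc"
    if len(offsets) == 1:
        return "fixed"
    if any(off % HOUR for off in offsets):
        return "fractional"  # offsets change and include a non-whole-hour value
    return "changes"


def build_groups(keys, probe_fn=probe):
    """Return non-empty groups of keys, in GROUP_ORDER, each sorted alphabetically."""
    groups = defaultdict(list)
    for key in sorted(keys):
        groups[classify(probe_fn(key))].append(key)
    return [groups[name] for name in GROUP_ORDER if groups[name]]
```

Calling `build_groups(zoneinfo.available_timezones())` recomputes the groups from whatever tz data the machine has. Passing a different `probe_fn` keeps the classifier testable without installed tz data.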
This issue was prompted by @pganssle's fantastic review of #2392 (comment): between expert commentary and my own knowledge that Ireland has a negative DST offset, I wrote a much more targeted test and exposed a problem with affected `pytz` timezones.

Ideally, the obvious test written by a naive user would also discover such problems. #69 describes half the trick; the other half is to preferentially generate 'tricky' timezones. I think this could be as simple as changing from sampling from a flat list to sampling from similar subsets of available timezones. For example:

`st.datetimes()` towards bug-revealing values such as DST transitions and other oddities #69 (comment), etc.

Of course, the purpose of this structure is to put disproportionate emphasis on entries from the weird-stuff category. To make this more fun, we can't hard-code a list, because timezones are subject to change at short notice and can vary between machines, so we need to compute the groups each time Hypothesis runs.
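The effect of sampling a group first and then a key within it can be checked with exact arithmetic. In Hypothesis itself this would presumably be spelled as `st.one_of` over one `st.sampled_from` per group, which, as I understand the docs, also shrinks towards earlier groups; the stdlib sketch below just makes the weighting explicit. `sample_key`, `grouped_probability` and `flat_probability` are hypothetical helpers.

```python
import random
from fractions import Fraction


def sample_key(groups, rng=random):
    """Two-stage sampling: choose a group uniformly, then a key within it."""
    return rng.choice(rng.choice(groups))


def grouped_probability(groups, key):
    """Exact probability that sample_key() returns `key`."""
    return sum(Fraction(group.count(key), len(group)) for group in groups) / len(groups)


def flat_probability(groups, key):
    """Probability of `key` when sampling uniformly from the concatenated list."""
    flat = [k for group in groups for k in group]
    return Fraction(flat.count(key), len(flat))
```

With groups `[["UTC"], <100 ordinary zones>, ["Europe/Dublin"]]`, a single weird zone goes from a 1/102 chance under flat sampling to a 1/3 chance under grouped sampling, which is the disproportionate emphasis the proposal is after.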
This is low on my list of priorities for now, not least because it isn't that useful without #69, but it would be nice to introduce it along with a PEP-615 timezones strategy in September 🙂