Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add available timezones #60

Merged
merged 6 commits into from
May 18, 2020
Merged

Add available timezones #60

merged 6 commits into from
May 18, 2020

Conversation

pganssle
Copy link
Owner

@pganssle pganssle commented May 9, 2020

Add a function to retrieve all time zones available on the system.

Copy link

@FFY00 FFY00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we should be ignoring posix and right subdirectories? Right now zoneinfo.available_timezones() returns 3 entries for each timezone, eg. Africa/Abidjan, posix/Africa/Abidjan and right/Africa/Abidjan. But only Africa/Abidjan should be used.

See https://www-master.ufr-info-p6.jussieu.fr/2014/spip.php?page=zman&nom=hwclock#sect15.

To configure a system to use a particular database all of the files located in its
directory must be copied to the root of /usr/share/zoneinfo. Files are never used directly
from the posix or ’right’ subdirectories, e.g., TZ=’right/Europe/Dublin’. This habit was
becoming so common that the upstream zoneinfo project restructured the system’s file tree
by moving the posix and ’right’ subdirectories out of the zoneinfo directory and into
sibling directories:

  • /usr/share/zoneinfo
  • /usr/share/zoneinfo-posix
  • /usr/share/zoneinfo-leaps

zoneinfo.available_timezones() is also returning posixrules as a valid timezone, valid_key() is not enough in this case.

Comment on lines +154 to +166
if not key or key in valid_zones:
continue

if valid_key(fpath):
valid_zones.add(key)
Copy link

@FFY00 FFY00 May 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a very small nitpick, this could be simplified as:

if key and valid_key(fpath):
    valid_zones.update([key])

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not quite the same thing, because valid_key actually opens and reads the file, but it won't actually change the outcome if we've already found the key from another source. If tzdata is installed, the common case is that we read a single file, populate valid_zones with all the standard keys, then start searching TZPATH to populate with any additional zones available to the user, so hopefully very few file accesses are actually happening.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I missed that, sorry.

@pganssle
Copy link
Owner Author

I think maybe we should be ignoring posix and right subdirectories? Right now zoneinfo.available_timezones() returns 3 entries for each timezone, eg. Africa/Abidjan, posix/Africa/Abidjan and right/Africa/Abidjan. But only Africa/Abidjan should be used.

Hm.. I am mildly torn on this, because users do have the option of choosing from the posix/ or right/ directories, and if you're using available_timezones() for something like generating hypothesis tests, you want to hit all of them.

On the other hand, the link you posted is suggestive that these are special directories and should be ignored, and make zonenames does not include the posix/ or right/ directories.

zoneinfo.available_timezones() is also returning posixrules as a valid timezone, valid_key() is not enough in this case.

This is another one I'm a bit torn about, because this actually is a valid zone, and it might be useful to people, since it seems like it's just a link to your local time zone:

>>> import zoneinfo
>>> zoneinfo.ZoneInfo("posixrules")
zoneinfo.ZoneInfo(key='posixrules')

But there should probably be a better way to get that.

I'm not sure how to solve this without hard-coding. posixrules we could detect if we were to deduplicate links, but on my system (and in the output of tzcode's make install) the links are hard links rather than symlinks to a canonical value. We'd be at risk of removing America/New_York in favor of US/Eastern (or even worse, removing both of them in favor of posixrules for people with systems set to Eastern time).

One possible option is to parse tzdata.zi and try and pull the values from there, but not every distro ships tzdata.zi, so we'd need some fallback.

The situation with posix/ and right/ are also complicated by the fact that according to your link, apparently some distros change the structure of the zoneinfo directories, so I'm not sure we can count on hard-coding posix/ and right/ as values to filter out.

If you have any suggestions let me know. I think I will send an e-mail to the tz mailing list to see if they have any ideas.

@FFY00
Copy link

FFY00 commented May 13, 2020

I am also torn by this, it shouldn't be used, but people are using it. I think simply disallowing it might not be the best idea, but I don't think the current behavior is the best either.

From the top of my head, I would suggest adding a strict argument defaulting to True.

def available_timezones(strict=True):
	...

When strict is true we would filter out posix/ and right/. We could actually do posixrules too.

There are several use cases where you might want to use available_timezones(), and this option could be toggled accordingly. In most cases, for eg. when populating a GUI element with the available timezones, you wouldn't want posix/ and right/ to show up. But there are some use cases where we need to be more permissive, eg. when you actually need the list of all possible values, like the hypothesis tests example. Here, we can still switch on the behavior.

strict was the first thing that came to mind, it could be replaced by something better.

I'm not sure how to solve this without hard-coding. posixrules we could detect if we were to deduplicate links, but on my system (and in the output of tzcode's make install) the links are hard links rather than symlinks to a canonical value. We'd be at risk of removing America/New_York in favor of US/Eastern (or even worse, removing both of them in favor of posixrules for people with systems set to Eastern time).

I think hardcoding should be fine, at least in the context I described above.

The situation with posix/ and right/ are also complicated by the fact that according to your link, apparently some distros change the structure of the zoneinfo directories, so I'm not sure we can count on hard-coding posix/ and right/ as values to filter out.

Well, I don't think this is an issue since we only have to ignore posix/ and right/ inside TZPATH.

I think I will send an e-mail to the tz mailing list to see if they have any ideas.

Please CC me if you remember 😄

@pganssle
Copy link
Owner Author

From the top of my head, I would suggest adding a strict argument defaulting to True.

I dunno if this is very clear, it could mean any number of things, including, "Only give me canonical zones".

There are several use cases where you might want to use available_timezones(), and this option could be toggled accordingly. In most cases, for eg. when populating a GUI with the available timezones, you wouldn't want posix/ and right/ to show up.

Well, as I mention in the documentation, "populating a GUI with the available timezones" is not really a valid use case for this anyway. You'd want to filter the list via CLDR / ICU anyway to get human-readable names. If this is going to be a common use case, maybe it's better to avoid adding the feature at all.

I think hardcoding should be fine, at least in the context I described above.

This is destined for the Python standard library (this repo will turn into a backport when I'm done integrating it there), so I'm very reluctant to hard-code things, particularly if they are incidental and not any sort of interface with backwards-compatibility guarantees and cross-platform compatibility. People end up using old versions of Python for years before upgrading, and the release cycle is quite long.

Well, I don't think this is an issue since we only have to ignore posix/ and right/ inside TZPATH.

I was more concerned with the possibility that downstream distributors might rename these directories for whatever reason. Normally I'd say, "Ok, fine, they can just patch their version of Python", but that would mean that people not using the system python (which is my preferred recommendation anyway) would get some weird and different value here because of an unusual zoneinfo configuration.

It's a tough call and I'm tempted to push this feature to Python 3.10.

@pganssle pganssle force-pushed the add_available_timezones branch from bb1e2ac to 40bc5a5 Compare May 13, 2020 17:54
@FFY00
Copy link

FFY00 commented May 13, 2020

I dunno if this is very clear, it could mean any number of things, including, "Only give me canonical zones".

Right. What about this:

def available_timezones(ingore=['posix', 'strict', 'zoneinfo']):
	...

Although I am conflicted by it. It would resolve most of our issues but I am not sure if it's a good api. The problem here is that we want to have a useful default for most people.

Well, as I mention in the documentation, "populating a GUI with the available timezones" is not really a valid use case for this anyway. You'd want to filter the list via CLDR / ICU anyway to get human-readable names. If this is going to be a common use case, maybe it's better to avoid adding the feature at all.

Hum, okay, that makes sense. Although I would like to note here that names like Europe/Lisbon are commonly used for timezones outside this context.

>>> import babel.dates
>>> babel.dates.get_timezone_name(babel.dates.get_timezone('Europe/Lisbon'))
'Western European Time'

If I am prompted the CLDR names, as above, personally, I would have to google my city's timezone to find out the correct name.

So, even though I think it would be great to be able to just use CLDR, I don't think it would be very realistic to expect it from everyone. If I was designing a UI I would probably go for Europe/Lisbon (Western European Time).

I think this is still a valid use-case.

This is destined for the Python standard library (this repo will turn into a backport when I'm done integrating it there), so I'm very reluctant to hard-code things, particularly if they are incidental and not any sort of interface with backwards-compatibility guarantees and cross-platform compatibility. People end up using old versions of Python for years before upgrading, and the release cycle is quite long.

Yeah, I am aware, I got here from bpo. Worst case scenario people could opt out and do their own filtering, but yes, it would be great to avoid any issues.

I was more concerned with the possibility that downstream distributors might rename these directories for whatever reason. Normally I'd say, "Ok, fine, they can just patch their version of Python", but that would mean that people not using the system python (which is my preferred recommendation anyway) would get some weird and different value here because of an unusual zoneinfo configuration.

I don't think anyone is doing this, they may be changing the paths, but renaming the posix and strict folder inside TZPATH is a different thing. Anyway, I have no backing for this.

@pganssle pganssle force-pushed the add_available_timezones branch from 6edc368 to 13d6206 Compare May 17, 2020 16:42
@pganssle pganssle force-pushed the add_available_timezones branch from 41f9ea4 to 4a7806a Compare May 17, 2020 17:53
pganssle added 5 commits May 18, 2020 12:12
The right/ and posix/ directories are not canonical zones (note that
`make zonenames` does not include them), and posixrules is a symlink to
a "local" time zone.

In the reference implementation of the tz database, these are special
names, and it is safe to exclude them by name.
right/ and posix/ should be mirrors of the primary database, with the
same tree of time zones below them as the main directory. Since these
are zones likely to produce edge cases, but available_timezones() now longer
includes them, we can manually add them back in for these purposes.

The posixrules file is deliberately left out, here, because it's
intended to always be a link to another zone.
@pganssle pganssle force-pushed the add_available_timezones branch from 4a7806a to 5b5041f Compare May 18, 2020 16:12
This path is only hit if the test function is used wrong, which
hopefully the tests are not doing.
@pganssle pganssle merged commit eb66f78 into master May 18, 2020
@pganssle pganssle deleted the add_available_timezones branch May 21, 2020 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants