This repository has been archived by the owner on Jul 26, 2023. It is now read-only.

Matplotlib tick units should be localised according to country and language #11

Open
carlhiggs opened this issue Apr 14, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@carlhiggs
Member

Currently, tick labels for the threshold plots are formatted by Matplotlib's engineering formatter, ticker.EngFormatter:

# axis formatting
cax.xaxis.set_major_formatter(ticker.EngFormatter())

For numbers in the thousands, this abbreviates the scale with a 'k' suffix, which is generally desirable, at least in English.

However, in Czech the meaning of 'k' is not natural/intuitive; for example, a better option would be "tisíce" (thousand).

Babel, a Python library for localisation that we already use elsewhere in this project, has a format_unit() function that could potentially be used for this. However, I haven't seen examples of its use with Matplotlib, or with the engineering formatter specifically. Addressing this may be beyond scope, but ideally we would handle internationalisation/localisation of these units the same way we handle translations elsewhere in the project.

@carlhiggs
Member Author

The code for the Matplotlib Engineering ticker is here: https://github.com/matplotlib/matplotlib/blob/v3.5.1/lib/matplotlib/ticker.py#L1311-L1459

It may be feasible to create a modified class with a 'locale' option that produces localised unit labels using Babel...

@carlhiggs
Member Author

Perhaps it could be done by mapping between the units used by the Eng ticker, and those present in the CLDR Unit Validity XML file which Babel uses (according to the format_unit() specification, linked above).
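
A minimal sketch of that mapping idea, written in plain Python so it runs without Matplotlib or Babel. The function name `format_localised_eng`, the `LABELS` table and all of its entries are hypothetical placeholders (the Czech label is simply the word suggested earlier in this issue); a real implementation would presumably take its labels from CLDR and handle plural forms. The exponent is the largest multiple of 3 not exceeding log10 of the value, clamped to ±24 on the assumption that this matches the SI prefix range of the v3.5.1 formatter.

```python
import math

# Hypothetical per-locale labels keyed by power of ten; placeholders only.
LABELS = {
    "en": {3: "thousand", 6: "million"},
    "cs": {3: "tisíce"},  # word suggested in this issue; plural forms not handled
}

# SI prefixes used as a fallback when a locale has no label for an exponent.
SI_PREFIXES = {-3: "m", 0: "", 3: "k", 6: "M", 9: "G"}


def format_localised_eng(value, locale="en"):
    """Scale value to a multiple-of-3 power of ten and append a localised label."""
    if value == 0:
        return "0"
    # Largest multiple of 3 not exceeding log10(|value|), clamped to +/-24.
    exponent = int(math.floor(math.log10(abs(value)) / 3) * 3)
    exponent = max(-24, min(24, exponent))
    mantissa = value / 10.0 ** exponent
    fallback = SI_PREFIXES.get(exponent, f"e{exponent}")
    label = LABELS.get(locale, {}).get(exponent, fallback)
    return f"{mantissa:g} {label}".strip()


print(format_localised_eng(2000, "cs"))
print(format_localised_eng(2500, "en"))
print(format_localised_eng(2000, "de"))  # no table for "de": falls back to "k"
```

If this approach were adopted, the function could be attached to an axis with something like `ticker.FuncFormatter(lambda x, pos: format_localised_eng(x, "cs"))`, since FuncFormatter passes the tick value and position to the callable.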

@carlhiggs
Member Author

carlhiggs commented Apr 14, 2022

The Matplotlib Eng ticker essentially scales values by SI metric prefixes, which represent powers of 10 from 10^-24 to 10^24, while Babel covers a broader set of units but not necessarily all of these powers. Pint, another Python library for dealing with units, has created a plain-language lookup of these for the XML-derived terms, and notably there doesn't seem to be a way of dealing with, for example, 'septillionth/quadrillionth' (the 'y' prefix, or 10^-24). At least, in a naive test, Babel doesn't recognise those words as English units (not surprising, as they aren't in the XML spec):

>>> format_unit(12,'septillionth',locale=locale,length=length)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/env/lib/python3.9/site-packages/babel/units.py", line 119, in format_unit
    raise UnknownUnitError(unit=measurement_unit, locale=locale)
babel.units.UnknownUnitError: septillionth is not a known unit in en
>>> format_unit(12,'quadrillionth',locale=locale,length=length)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/env/lib/python3.9/site-packages/babel/units.py", line 119, in format_unit
    raise UnknownUnitError(unit=measurement_unit, locale=locale)
babel.units.UnknownUnitError: quadrillionth is not a known unit in en

Just entering 'y' doesn't work either: Babel appears to use a kind of search to find the closest matching unit, and for 'y' it picks 'days'. I think this search exists because the formal identifier for km is 'length-kilometer', and it lets the shorter 'kilometer' match correctly (but not the spelling 'kilometre').
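
The 'days' result is consistent with a shortest-suffix match. As far as can be told from babel/units.py, the lookup first tries the exact identifier and otherwise returns the shortest known unit ID that ends with the given string. The sketch below imitates that behaviour over a small hand-picked subset of CLDR-style unit IDs; `find_unit` and `KNOWN_UNITS` are illustrative names, not Babel's API, and the real unit list is much longer.

```python
# A handful of CLDR-style unit identifiers (illustrative subset, not the full list).
KNOWN_UNITS = [
    "length-kilometer",
    "duration-year",
    "duration-day",
    "length-meter",
    "mass-kilogram",
]


def find_unit(unit_id, known_units=KNOWN_UNITS):
    """Return an exact match, else the shortest known ID ending with unit_id."""
    if unit_id in known_units:
        return unit_id
    # Shortest candidates are tried first, so 'y' reaches 'duration-day'
    # before 'duration-year'.
    for candidate in sorted(known_units, key=len):
        if candidate.endswith(unit_id):
            return candidate
    return None


print(find_unit("y"))          # duration-day
print(find_unit("kilometer"))  # length-kilometer
print(find_unit("kilometre"))  # None
```

Under that assumption, a bare SI prefix such as 'y' can never resolve to "yocto"; it just lands on whichever short unit name happens to end with that letter.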

Anyway, the short story is:

  • The Matplotlib Eng ticker works for our current purposes in English, but not necessarily in other locales.
  • Broadly, the Eng ticker is overkill: we are dealing with hundreds and thousands (up to hundreds of thousands), so nowhere near 10 to the power of ±24.
  • Perhaps the answer is to drop the broad-stroke Eng ticker and write a more custom implementation using Babel that deals specifically with thousands for different locales... if it's possible.

But come to think of it, I'm not sure that Babel has support for 'thousands'... I think I missed that in considering the above.
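
On the question of 'thousands': assuming Babel 2.10 or later, `babel.numbers.format_compact_decimal` abbreviates numbers using locale-specific CLDR patterns rather than units, so it may be a closer fit than format_unit(). The sketch below wraps it in a callable with the `(x, pos)` signature that Matplotlib's FuncFormatter expects. `make_tick_formatter` and its `compact` parameter are hypothetical names; the injectable `compact` argument exists only so the wrapper can be exercised without Babel installed.

```python
def make_tick_formatter(locale, compact=None, fraction_digits=1):
    """Build an (x, pos) callable that formats tick values compactly for a locale.

    compact: optional callable with the same keyword arguments as
    babel.numbers.format_compact_decimal; imported from Babel when omitted.
    """
    if compact is None:
        # Assumes Babel >= 2.10, where format_compact_decimal is available.
        from babel.numbers import format_compact_decimal as compact

    def _format(x, pos=None):
        return compact(
            x,
            format_type="short",
            locale=locale,
            fraction_digits=fraction_digits,
        )

    return _format


# Usage with Matplotlib (assumes matplotlib and Babel >= 2.10 are installed):
#   from matplotlib import ticker
#   cax.xaxis.set_major_formatter(ticker.FuncFormatter(make_tick_formatter("cs")))
```

Whether the resulting abbreviations read naturally in each target language would still need checking with native speakers, and `format_type="long"` may be worth comparing for locales where the short form is unfamiliar.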

@carlhiggs carlhiggs added the enhancement New feature or request label Apr 14, 2022