Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(formatValue): fix the ordering of +/- symbols when formatting currency values #1376

Merged
merged 29 commits into from
May 23, 2022

Conversation

ikesau
Copy link
Member

@ikesau ikesau commented Apr 12, 2022

Fixes #1330

I rewrote the formatValue function to make it more linear and easier to reason about. It now relies much more closely on d3's formatting method:

  1. Instantiate a formatter function with a locale and appropriate currency symbol
  2. Create a specifier
  3. Format the value with the specifier
  4. Postprocess the formatted string to add percentage information and abbreviation, if necessary

Unit tests have also been added.

@ikesau ikesau requested a review from danielgavrilov April 12, 2022 00:33
@danielgavrilov
Copy link
Contributor

@ikesau We have a "SVG diff / svgTester" that runs on PRs, and for this one it found 64 differences. You can click on "Details" in the test run above, and then visit the link of the branch containing the SVGs, and then and then open the latest commit there.

One category of differences is that the $ prefix unit seems to be getting dropped in some cases:

Screenshot 2022-04-12 at 16 07 35

And some of the differences are exactly the intended fix – putting the - in front of the currency.

For this type of change I'd maybe review 30 of the charts to see if any unintended changes are introduced. You can select the "rich diff" tab in the GitHub to make it easier to find differences (although it's a bit slow to do this in the GitHub views unfortunately...)

Will do a code review now, but just wanted to give some background how it works.

Copy link
Contributor

@danielgavrilov danielgavrilov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems more tricky to fix than I expected because of the recursive calls. I left some comments, but didn't think of ideas for a potentially good fix...

clientUtils/formatValue.test.ts Outdated Show resolved Hide resolved
clientUtils/formatValue.ts Outdated Show resolved Hide resolved
@ikesau ikesau requested a review from danielgavrilov April 22, 2022 19:01
@danielgavrilov
Copy link
Contributor

danielgavrilov commented Apr 25, 2022

Just trying to document the output differences:

  • $- becomes -$

  • .5 gets rounded up instead of down (chart)

  • 🔶 additional number prefixes

    magnitude long short
    10¹⁵ quadrillion P
    10¹⁸ quintillion E
    10²¹ sextillion Z
    10²⁴ septillion Y
    Screenshot 2022-04-25 at 12 40 09
  • 🛑 Negative values with number prefixes are getting inconsistent decimal places? (chart)

Screenshot 2022-04-25 at 12 27 41

  • 🛑 Additional decimal places shown when number is below the minimum representable value for the given decimal places. e.g. for 2 decimal places, minimum value is 0.01. If a number is lower than that, it used to be formatted as <0.01. Now it seems there are cases where it's formatted with more decimal places than that?

    Screenshot 2022-04-25 at 12 36 41

@danielgavrilov
Copy link
Contributor

danielgavrilov commented Apr 25, 2022

🔶 additional number prefixes

I would say few people know what these mean, and we shouldn't go any higher than trillion? Especially the short number prefixes, which I had never seen before.

There is still the issue that in production currently we have some large values like 10 million trillion, which make little sense formatted that way, but we can keep this issue out of scope for now. I think with the new implementation that will be formatted as 10,000,000 trillion? Which seems better.

(The ideal fix is to switch to scientific notation (#1171) for large magnitudes, but definitely out of scope here.)

🛑 Negative values with number prefixes are getting inconsistent decimal places?

🛑 Additional decimal places shown when number is below the minimum representable value for the given decimal places.

I would say these two are bugs that need fixing? 😬

Copy link
Contributor

@danielgavrilov danielgavrilov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is much neater! I hope this task didn't take a huge hit on motivation. 😬 (especially since I have a few more comments... 👀)

There are two things I'm wondering about:

  • Should we discard short number abbreviations if the unit is %? (chart)

    I think we don't do it for long abbreviations, e.g. we don't output 1 million % but 1,000,000%, so we also probably shouldn't output 1k%.

    Screenshot 2022-04-27 at 11 36 56
  • We are getting an additional decimal place in some cases in the COVID data explorer sidebar:

    Screenshot 2022-04-27 at 11 45 16

    Is it likely related to the unit being % and short abbreviations enabled? Since it's potentially an explorer issue, let me know if you need help debugging!

clientUtils/formatValue.ts Outdated Show resolved Hide resolved
Comment on lines 92 to 111
short: {
k: "k",
M: "M",
G: "B",
T: "T",
P: "P",
E: "E",
Z: "Z",
Y: "Y",
},
long: {
k: "k",
M: " million",
G: " billion",
T: " trillion",
P: " quadrillion",
E: " quintillion",
Z: " sextillion",
Y: " septillion",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove all after trillion (T)? I don't think they're widely understood.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, might this be more complicated since D3 is outputting them?

Could we tell D3 to disable SI prefixes if a value is quadrillion or higher, or are there cases where it could output 0.9P?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit tricky unfortunately. I agree, no one really knows these prefixes (hence remapping G to B) and there was work done in a d3-format PR to introduce this functionality natively, but I don't think there's any out-of-the-box way to set a threshold on abbreviation.

Some options we have:

  • Switch to exponential notation after trillion (also handling cases where 9.5 trillion would get rounded to 1 quadrillion if precision is set to 1)
  • Re-implement the logic to get it to output 1,234T (possibly quite hard to do with this flow)
  • Change the prefixes from P to kT or quad or thousand trillion or whatever.

Thoughts?

Copy link
Contributor

@danielgavrilov danielgavrilov Apr 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I don't think we should be switching to exponential notation in this PR. As we discussed privately, when using exponential notation all values should be in exponential notation, not just the large ones. Also, the default formatting, e.g. 5e3 is problematic, not many people understand that it means 5 × 10^3 (we could rewrite that, but it's another potential can of worms).
  • Yes the ideal fix is probably that it outputs 1,234 trillion as it currently does.
  • thousand trillion is worse than what we have now (which is no double prefixes until we reach million trillion, which in itself is a bug we wanted to fix when eventually there's an important enough use case)

This issue makes me wonder if we want to rely on a 3rd party library to this degree. It's possible there will be another feature in the future that is harder to implement because we have to circumvent d3-format, and maybe it would force us to re-implement a subset of d3-format ourselves? What do you think?

We could also maybe fork d3-format.

Copy link
Member Author

@ikesau ikesau Apr 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts:

Agreed on deferring the larger task of identifying when to use scientific notation across an entire graph. Though it won't be hard to update the exponent notation in postprocessString - output.replace("e", " x 10^") would do the trick.

I don't think 1,234 million trillion is really any better than 1.234 sextillion, though yes, people can at least understand 1,000,000T unlike 1E.

These are such rare cases though, and afaik, we don't use short abbreviation for them.

So, given we deal with numbers of this magnitude so rarely (supercomputer-power-flops, ai-training-computation, biomass-vs-abundance-taxa) I think that leaving the long abbreviations as is, and changing 1E to 1Quint is the best way forward, and we can fix it for good when we handle those numbers properly which is with scientific notation.

Also, without diving into their code, I don't think we should fork d3-format and fundamentally change how it works. My intuition is that that would require far more effort than any other option. Working with the postprocessString function will surely be easier than forking and rewriting a d3 project, even for future formatting features.

clientUtils/formatValue.ts Outdated Show resolved Hide resolved
@ikesau ikesau force-pushed the 1330-format-value branch from 32524e7 to 852a630 Compare April 27, 2022 15:41
@marcelgerber marcelgerber self-requested a review May 17, 2022 09:30
Copy link
Member

@marcelgerber marcelgerber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ike!
This is all great work, and it's interesting how this small issue escalated into a complete rewrite.

However, I am unsure if relying on d3 so much is (still) the right way to go here. We need to do a lot of preprocessing and postprocessing to get out the result we want, and it seems that d3 isn't doing all that much work.
Also, the format specifiers they use are cryptic anyway, and understanding what we need to pass to d3 to get out what is a hard ask.

For these reasons, I feel like we could self-roll our formatValue function instead - how hard can it be? 😆
We got good test coverage now (thank you!), and we can also make use of the well-supported Intl.NumberFormat, which gives us nice things like number grouping, { signDisplay: "exceptZero" } and support for significant digits (if we want to go that way).

Let me know what you think, and feel free to discuss if anything's unclear!

["negative with unit", -1, "-$1", { unit: "$" }],
["trailingZeroes true", 1.10, "1.1", { trailingZeroes: false }],
["trailingZeroes false", 1.10, "1.10", { trailingZeroes: true }],
["$ spaceBeforeUnit false", 1.1, "$1.1", { spaceBeforeUnit: false, unit: "$" }],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test case with { unit: "£" }, because I'm currently unsure if that is handled correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think no chart uses £, I tried searching for one a while ago. We could and maybe should just drop it, it's such an odd one to have (why not have too?)

@ikesau
Copy link
Member Author

ikesau commented May 18, 2022

Oof, oh no! I wasn't aware of the NumberFormat class. I probably would have gone with that if we were going from scratch, but I guess I still wasn't thinking big enough this time 😓

I've definitely come to understand the ins and outs of d3-format, but yeah obviously it's pretty oblique interface. Let me play around with NumberFormat a little bit and see how it handles before making a decision.

@ikesau
Copy link
Member Author

ikesau commented May 18, 2022

Okay, so I made a Intl.NumberFormat demo

It's basically the same thing. An object that you configure with a bunch of settings and then format with.

Some differences:

  • Its prefixes only go up to trillion, and only in short form (100K, 1T, 1000T, etc)
  • Its rounding, decimal place, and SF logic is all a bit more complex, but can still be configured for our use-case.

So, we could rewrite it to ditch d3-format (though it will still be in our bundle because d3-scale requires it) but I don't think the formatValue function will become any more readable. If we use NumberFormat, we'll still have a function that generates a config and postprocesses the output.

@danielgavrilov
Copy link
Contributor

danielgavrilov commented May 23, 2022

I am also worried about increasing reliance on d3-format. It makes us less able to make improvements to the implementation – the fact that we can't find an easy way around not outputting "sextillion" is a clear example of this, and I think there will probably be a few more cases in the future since we are very particular about value formatting.

If fixing the "sextillion" issue means having to re-implement d3-format functionality ourselves, it makes "rolling our own" not such a bad option? Especially since the old 90-line function already did this, relying very little on d3-format, it can't be that complex? 😅 Curious what you think @ikesau!

Copy link
Contributor

@danielgavrilov danielgavrilov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, we have a thread of a discussion here around architecture that I started and isn't too productive. I think it would be best to move that discussion elsewhere.

This PR fixes a bug, improves readability significantly, adds comprehensive tests, and in case we find our requirements change in the future in such a way that our custom implementation is better, we can always bring back some of the old implementation.

For now, I think it would be best to merge and deploy as it is. 🚀

Copy link
Member

@marcelgerber marcelgerber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's merge this monster of a PR. It's a big improvement, and as gavrilov says we can still revisit this in the future.

@ikesau The big green Merge button is all yours :shipit:

@ikesau ikesau merged commit 34756c4 into master May 23, 2022
@ikesau ikesau deleted the 1330-format-value branch May 23, 2022 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

formatValue should include +/- before unit prefix, e.g. +$123 instead of $+123
3 participants