Add the `fmt_units()` method #240

rich-iannone · 2024-03-12T15:13:08Z

This PR adds the fmt_units() method, which converts text written in units notation (e.g., "x10^9 / L" or "x10^9 L^-1") to HTML. Like the other fmt_*() methods, it only transforms text in the table body. Other methods will gain the ability to convert units notation to nicely formatted HTML in later PRs.

Here's an example that uses fmt_units() with the illness dataset. The units column already contains strings in units notation, so we just need to point the method at that column:

from great_tables import GT, style, loc
from great_tables.data import illness

(
    GT(illness, rowname_col="test")
    .fmt_units(columns="units")
    .fmt_number(columns=lambda x: x.startswith("day"), decimals=2, drop_trailing_zeros=True)
    .tab_header(title="Laboratory Findings for the YF Patient")
    .tab_spanner(label="Day", columns=lambda x: x.startswith("day"))
    .tab_spanner(label="Normal Range", columns=lambda x: x.startswith("norm"))
    .cols_label(
        norm_l="Lower",
        norm_u="Upper",
        units="Units"
    )
    .opt_vertical_padding(scale=0.4)
    .opt_align_table_header(align="left")
    .tab_options(heading_padding="10px")
    .tab_style(
        locations=loc.body(columns="norm_l"),
        style=style.borders(sides="left")
    )
    .opt_vertical_padding(scale=0.5)
)

Fixes: #211
Partially addresses the .epic issue: #169

machow

Thanks for putting all this work into the PR, especially into documenting define_units(). The tests are looking really great. I left some comments suggesting ways we might be able to clean up the code a little.

These mostly center on...

grouping the logic for a variable definition together (into tighter if/elif/else blocks)
creating a UnitDefinition.from_token() method, since the class doesn't seem like it can be instantiated directly.

machow · 2024-05-29T14:47:58Z

great_tables/_helpers.py

+    def __getitem__(self, index: int) -> UnitDefinition:
+        return self.units_list[index]
+
+    def to_html(self) -> str:


since this..

loops over each element

converts each element to html

the elementwise conversions are independent

it seems like to_html() should be a method on the elements. (You could still have a to_html() method here that does [x.to_html() for x in self.units_list], etc.)
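A minimal sketch of that arrangement follows. The `Unit` and `UnitList` classes are hypothetical stand-ins, not the `UnitDefinition` and `UnitDefinitionList` classes from this PR, and the rendering rule is deliberately trivial. The point is only that each element renders itself and the container delegates.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical stand-in for a single unit definition."""

    name: str
    exponent: str | None = None

    def to_html(self) -> str:
        # Each element renders itself, independently of its neighbors.
        if self.exponent is None:
            return self.name
        return f"{self.name}<sup>{self.exponent}</sup>"


@dataclass
class UnitList:
    """Hypothetical stand-in for the container of unit definitions."""

    units_list: list[Unit] = field(default_factory=list)

    def __getitem__(self, index: int) -> Unit:
        return self.units_list[index]

    def to_html(self) -> str:
        # The container only delegates to the elements and joins the pieces.
        return " ".join(unit.to_html() for unit in self.units_list)


units = UnitList([Unit("m"), Unit("s", exponent="-2")])
print(units.to_html())
```

With this split, element-level rendering can be tested one unit at a time, and the container stays a thin wrapper.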

machow · 2024-05-29T15:49:44Z

great_tables/_helpers.py

+    if len(tokens_list) == 0:
+        return UnitDefinitionList(units_list=[])
+
+    for i in range(len(tokens_list)):


Because...

this loop creates a UnitDefinition object for each token

UnitDefinition will likely never be instantiated directly (because you have to input token, along with everything else this loop calculates)

It might be good as a constructor on UnitDefinition?

e.g.

@dataclass
class UnitDefinition:
    ...

    @classmethod
    def from_token(cls, token: str) -> UnitDefinition:
        # logic from the loop here ----
        unit_subscript = None
        sub_super_overstrike = False
        chemical_formula = False
        exponent = None
        ...
        return cls(token, ...)

Now implemented.
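For illustration, here is a small runnable sketch of the alternate-constructor pattern discussed above. `SimpleUnitDefinition` and its three fields are hypothetical and much smaller than the class in this PR. It handles only a single `^` exponent, so it should not be read as the merged implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SimpleUnitDefinition:
    # Hypothetical, reduced field set; the class in the PR carries more state.
    token: str
    unit: str
    exponent: str | None = None

    @classmethod
    def from_token(cls, token: str) -> SimpleUnitDefinition:
        # Split "m^2" into unit "m" and exponent "2"; every other rule of the
        # units notation is left out of this sketch.
        unit, caret, exponent = token.partition("^")
        return cls(token=token, unit=unit, exponent=exponent if caret else None)


print(SimpleUnitDefinition.from_token("m^2"))
print(SimpleUnitDefinition.from_token("kg"))
```

Callers then never have to compute the derived fields themselves; they pass the raw token and get a fully populated object back.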

machow

TODO:

need to escape > and <
let's rewrite the specification section of define units to cover the rules of the DSL
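On the first TODO item, one option is the standard library's `html.escape()`. The snippet below is only a sketch of the idea, using an input like the marquee example from this thread; it is not necessarily how the PR ends up escaping its output.

```python
import html

raw = "a<marquee>123</marquee>b"

# quote=False leaves quotation marks alone and escapes only &, < and >.
escaped = html.escape(raw, quote=False)
print(escaped)  # a&lt;marquee&gt;123&lt;/marquee&gt;b
```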

Unit DSL Rules

From pairing w/ @rich-iannone, here is what seem to be the rules of define_units():

# Within unit rules ----
# 1. ^ creates a superscript
# 2. _ creates a subscript
# 3. subscripts and superscripts may be combined
#   - however, _ inside a superscript does not create a subscript

# 4. use [_subscript^superscript] to create an overstrike

# 5. / at the beginning adds the superscript -1
# 6. hyphen is transformed to minus sign
# 7. x at the beginning transformed to ×
# 8. ascii terms from biology/chemistry turned into TERM FORM (TODO: enumerate via code)

# 9. can create italics with * or _, and can create bold with ** or __
#   - can italicize AND bold together
#   - issue: because we use commonmark, a broader set of behaviors occur
#   - e.g. **m^2**, "a<marquee>123</marquee>b"

# ISSUE: < and > are unescaped

# Special notations ----
# 10. special symbol set surrounded by colons (e.g. :angstrom:)
# 11. chemistry notation: %C6%
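To make a few of these rules concrete, here is a toy converter covering rules 1, 2, 5, 6, and 7 for a single space-free token. It is a sketch under stated assumptions, not the logic in `_helpers.py`: the function name `toy_unit_to_html()` is made up, the way a leading `/` combines with an explicit exponent is a guess, and the `x` rule is restricted to an `x` followed by a digit.

```python
import html
import re

TIMES = "\u00d7"  # multiplication sign
MINUS = "\u2212"  # typographic minus sign


def toy_unit_to_html(token: str) -> str:
    """Render one space-free token with rules 1, 2, 5, 6, and 7 only."""
    exponent = ""

    # Rule 5: a leading "/" adds the superscript -1.
    if token.startswith("/"):
        token, exponent = token[1:], "-1"

    # Rule 1: "^" starts a superscript. Assumption: an explicit exponent
    # simply replaces the -1 from rule 5.
    base, caret, explicit = token.partition("^")
    if caret:
        exponent = explicit

    # Rule 2: "_" starts a subscript (only looked for outside the exponent).
    unit, _, subscript = base.partition("_")

    # Rule 7: a leading "x" becomes a multiplication sign. Assumption: only
    # when a digit follows, so that names starting with "x" are left alone.
    unit = re.sub(r"^x(?=\d)", TIMES, unit)

    out = html.escape(unit)
    if subscript:
        out += f"<sub>{html.escape(subscript)}</sub>"
    if exponent:
        # Rule 6: hyphens are shown as minus signs.
        out += f"<sup>{html.escape(exponent).replace('-', MINUS)}</sup>"
    return out


print(" ".join(toy_unit_to_html(t) for t in "x10^9 L^-1".split()))
print(toy_unit_to_html("t_i^2.5"))
```

Overstrikes, Markdown styling, the colon-delimited symbol set, and chemistry notation (rules 4 and 8 through 11) are intentionally left out.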

machow · 2024-06-03T17:38:56Z

great_tables/_helpers.py

+            sub_super_overstrike = True
+
+            # Extract the unit w/o subscript from the string
+            unit = re.sub(r"(.+?)\[_.+?\^.+?\]", r"\1", token)


We can punt this to a future PR, but it seems like unit, unit_subscript, and exponent could be captured using something like this...

import re

m = re.match(r"(.+?)\[_(.+?)(\^.+?)\]", token)

You can name groups using the ?P<some_name> syntax:

m = re.match(r"(?P<unit>.+?)\[_(?P<unit_subscript>.+?)(?P<exponent>\^.+?)\]", token)

People often break these up using parentheses:

m = re.match(
    (
        r"(?P<unit>.+?)"
        r"\["
        r"_(?P<unit_subscript>.+?)"
        r"(?P<exponent>\^.+?)"
        r"\]"
    ),
    token,
)
# m.groups()
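As a quick check of the named-group idea, the pattern can be run against a sample token. The token `m[_0^2]` is a made-up input for illustration, and `OVERSTRIKE` is a hypothetical name; note that the `exponent` group as written keeps the leading caret.

```python
import re

# The pattern is split across adjacent raw strings, one piece per component.
OVERSTRIKE = re.compile(
    r"(?P<unit>.+?)"
    r"\["
    r"_(?P<unit_subscript>.+?)"
    r"(?P<exponent>\^.+?)"
    r"\]"
)

m = OVERSTRIKE.match("m[_0^2]")
print(m.groupdict())
# {'unit': 'm', 'unit_subscript': '0', 'exponent': '^2'}
```

If the caret is not wanted in the captured value, moving `\^` outside the group would drop it.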

machow

This looks great, thanks for taking the time to make all the changes! One quick thing---it might be helpful to add a reference to define_units() in the docstring of fmt_units() (but we can always punt to another PR)

rich-iannone · 2024-06-03T21:30:28Z

@machow I added a reference to define_units() in the See Also section of the fmt_units() docstring.

rich-iannone added 19 commits March 9, 2024 17:19

Add the generate_tokens_list() util fn

364a112

Define a class to store a single unit definition

b75982f

Rename util function (prepend w/ _)

feda9c0

Import dataclass in order to define one

d3427c2

Define constructor as a data class

2fa1c3f

Add the UnitDefinitionList class

1566466

Add util functions to transform (sub|super)scripts

e436aad

Add the _replace_units_symbol() util fn

cfbda80

Add .to_html() method

452aa42

Add the _units_symbol_replacements() util fn

eb2aafb

Add the fmt_units() method

7430812

Add the built var in UnitDefinition

d5819b4

Remove use of pd.na in to_html() method

d59771a

Add docs for the fmt_units() method

286863b

Simplify _units_symbol_replacements()

1c3f90f

Make correction to documentation

60d2540

Add example for the fmt_units() method

d15c584

Make correction to example

2d727f2

Update _utils_units_notation.py

14e2f8b

github-actions bot temporarily deployed to pr-240 March 12, 2024 15:56 Destroyed

github-actions bot temporarily deployed to pr-240 March 12, 2024 15:57 Destroyed

Add tests for fmt_units()

0904ff8

github-actions bot temporarily deployed to pr-240 March 13, 2024 14:39 Destroyed

Merge branch 'main' into fmt-units

1eed161

github-actions bot temporarily deployed to pr-240 March 13, 2024 18:46 Destroyed

Add several tests of util fns

54bda69

github-actions bot temporarily deployed to pr-240 March 13, 2024 21:03 Destroyed

Add more tests of units not'n util fns

02de345

github-actions bot temporarily deployed to pr-240 March 13, 2024 21:37 Destroyed

Update example in fmt_units() docs

b13e43e

github-actions bot temporarily deployed to pr-240 May 24, 2024 16:32 Destroyed

Add explanatory text to define_units()

ff272d6

github-actions bot temporarily deployed to pr-240 May 24, 2024 16:51 Destroyed

Reorganize test_fmt_units()

c32b226

github-actions bot temporarily deployed to pr-240 May 24, 2024 19:15 Destroyed

machow requested changes May 29, 2024

Merge branch 'main' into fmt-units

e7c9022

github-actions bot temporarily deployed to pr-240 May 30, 2024 12:42 Destroyed

Improve comments in UnitDefinitionList cls

f775dc5

github-actions bot temporarily deployed to pr-240 May 30, 2024 14:37 Destroyed

Add the from_token class method

4b83c48

github-actions bot temporarily deployed to pr-240 May 30, 2024 14:48 Destroyed

Refactor to_html() method; add missing line-height attrs

5cf6062

github-actions bot temporarily deployed to pr-240 May 31, 2024 19:23 Destroyed

Update comments based on code review

3d4a1f9

github-actions bot temporarily deployed to pr-240 June 3, 2024 17:54 Destroyed

refactor: wire up UnitDefinition.from_token, .to_html methods

ac0ba7f

machow requested changes Jun 3, 2024

github-actions bot temporarily deployed to pr-240 June 3, 2024 19:19 Destroyed

Use improved definition of rules in example table

30add11

github-actions bot temporarily deployed to pr-240 June 3, 2024 20:04 Destroyed

Ensure < and > inputs are escaped on HTML output

ab0b399

github-actions bot temporarily deployed to pr-240 June 3, 2024 20:26 Destroyed

rich-iannone requested a review from machow June 3, 2024 20:29

machow approved these changes Jun 3, 2024

Add reference to the define_units() fn

55d2fab

github-actions bot deployed to pr-240 June 3, 2024 21:33 View deployment

rich-iannone merged commit 02a04ee into main Jun 4, 2024
13 checks passed

rich-iannone deleted the fmt-units branch June 4, 2024 01:30
