Add the fmt_units() method #240
Thanks for putting all this work into the PR, especially into documenting define_units(). The tests are looking really great. I left some comments suggesting ways we might be able to clean up the code a little.
These mostly center on...
- grouping the logic for a variable definition together (into tighter if/elif/else blocks)
- creating a UnitDefinition.from_token() method, since the class doesn't seem like it can be instantiated directly.
great_tables/_helpers.py
Outdated
    def __getitem__(self, index: int) -> UnitDefinition:
        return self.units_list[index]

    def to_html(self) -> str:
Since this...
- loops over each element
- converts each element to HTML
- the elementwise conversions are independent
it seems like to_html() should be a method on the elements. (You could still have a to_html() method here that calls it on the elements, e.g. [x.to_html() for x in self.units_list].)
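To make the suggestion concrete, here is a minimal sketch of moving the conversion onto the elements and having the list delegate to them. The reduced field set (unit, exponent), the HTML produced, and joining with a single space are all assumptions for illustration, not the code in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnitDefinition:
    # Hypothetical, reduced set of fields; the real class carries more state.
    unit: str
    exponent: Optional[str] = None

    def to_html(self) -> str:
        # Each element knows how to render itself, independently of the others.
        html = self.unit
        if self.exponent is not None:
            html += f"<sup>{self.exponent}</sup>"
        return html


@dataclass
class UnitDefinitionList:
    units_list: list[UnitDefinition] = field(default_factory=list)

    def __getitem__(self, index: int) -> UnitDefinition:
        return self.units_list[index]

    def to_html(self) -> str:
        # The list only delegates and joins (the separator is an assumption).
        return " ".join(x.to_html() for x in self.units_list)


units = UnitDefinitionList([UnitDefinition("m"), UnitDefinition("s", "-2")])
print(units.to_html())
```

With this split, a single element can be rendered or tested on its own, and the list method stays a one-liner.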
great_tables/_helpers.py
Outdated
    if len(tokens_list) == 0:
        return UnitDefinitionList(units_list=[])

    for i in range(len(tokens_list)):
Because...
- this loop creates a UnitDefinition object for each token
- UnitDefinition will likely never be instantiated directly (because you have to input token, along with everything else this loop calculates)
it might be good as a constructor on UnitDefinition?
e.g.
@dataclass
class UnitDefinition:
    ...

    @classmethod
    def from_token(cls, token: str) -> UnitDefinition:
        # logic from the loop here ----
        unit_subscript = None
        sub_super_overstrike = False
        chemical_formula = False
        exponent = None
        ...
        return cls(token, ...)
Now implemented.
TODO:
- need to escape > and <
- let's rewrite the specification section of define_units() to cover the rules of the DSL
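For the escaping item, one option is the standard library's html.escape(). This is only a sketch: the helper name is hypothetical, and where the escaping should happen relative to the markdown rendering is left open.

```python
import html


def escape_angle_brackets(text: str) -> str:
    # html.escape() replaces &, < and >; quote=False leaves quote characters alone.
    return html.escape(text, quote=False)


escaped = escape_angle_brackets("a<marquee>123</marquee>b")
print(escaped)
```

Note that html.escape() also escapes &, so any HTML entities that are inserted deliberately would need to be added after this step rather than before it.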
Unit DSL Rules
From pairing w/ @rich-iannone, here is what seem to be the rules of define_units:
# Within unit rules ----
# 1. ^ creates a superscript
# 2. _ creates a subscript
# 3. subscripts and superscripts may be combined
# - however, _ inside a superscript does not create a subscript
# 4. use [_subscript^superscript] to create an overstrike
# 5. / at the beginning adds the superscript -1
# 6. hyphen is transformed to minus sign
# 7. x at the beginning transformed to ×
# 8. ascii terms from biology/chemistry turned into TERM FORM (TODO: enumerate via code)
# 9. can create italics with * or _, and can create bold with ** or __
# - can italicize AND bold together
# - issue: because we use commonmark, a broader set of behaviors occur
# - e.g. **m^2**, "a<marquee>123</marquee>b"
# ISSUE: < and > are unescaped
# Special notations ----
# 10. special symbol set surrounded by colons (e.g. :angstrom:)
# 11. chemistry notation: %C6%
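As a rough illustration of a few of the within-unit rules (1, 2, 5, 6 and 7), here is a small sketch for a single token. It is not the define_units() implementation: the function name is hypothetical, the overstrike, markdown, symbol and chemistry rules are ignored, nothing is escaped, and three behaviors are assumptions made for the sketch (a leading x is only converted when followed by a digit, a leading / negates an explicit exponent, and hyphens are only converted inside the exponent).

```python
import re

MINUS = "\u2212"  # minus sign
TIMES = "\u00d7"  # multiplication sign


def unit_token_to_html(token: str) -> str:
    """Hypothetical helper covering rules 1, 2, 5, 6 and 7 for one token."""
    exponent = None
    subscript = None

    # Rule 5: a leading "/" means the unit is inverted.
    inverted = token.startswith("/") and len(token) > 1
    if inverted:
        token = token[1:]

    # Rule 1: everything after the first "^" is the superscript.
    if "^" in token:
        token, exponent = token.split("^", 1)

    if inverted:
        # Assumption: "/s" gives -1, and "/s^2" gives -2.
        exponent = "-1" if exponent is None else f"-{exponent}"

    # Rule 2: "_" in the remaining base starts a subscript. An "_" that
    # came after "^" is already part of the superscript and is left alone.
    if "_" in token:
        token, subscript = token.split("_", 1)

    # Rule 7 (assumption: only when a digit follows, as in "x10").
    if re.match(r"x\d", token):
        token = TIMES + token[1:]

    # Rule 6 (assumption: applied to the exponent only).
    if exponent is not None:
        exponent = exponent.replace("-", MINUS)

    html = token
    if subscript is not None:
        html += f"<sub>{subscript}</sub>"
    if exponent is not None:
        html += f"<sup>{exponent}</sup>"
    return html


for example in ["m^2", "t_0", "/s", "x10^9", "s^-1"]:
    print(example, "->", unit_token_to_html(example))
```

Writing the rules down as small, ordered steps like this also makes it easier to see where they interact, for example rule 5 with an explicit exponent.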
        sub_super_overstrike = True

        # Extract the unit w/o subscript from the string
        unit = re.sub(r"(.+?)\[_.+?\^.+?\]", r"\1", token)
We can punt this to a future PR, but it seems like unit, unit_subscript, and exponent could be captured using something like this...
import re
m = re.match(r"(.+?)\[_(.+?)(\^.+?)\]", token)
You can name groups using the ?P<some_name> syntax:
m = re.match(r"(?P<unit>.+?)\[_(?P<unit_subscript>.+?)(?P<exponent>\^.+?)\]", token)
People often break these up using parentheses:
m = re.match(
    (
        r"(?P<unit>.+?)"
        r"\["
        r"_(?P<unit_subscript>.+?)"
        r"(?P<exponent>\^.+?)"
        r"\]"
    ),
    token,
)
# m.groups()
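As a quick check of the named-group approach, here is a runnable sketch using a made-up token, m[_0^3]. The constant name and the token are assumptions for illustration; note that with this pattern the exponent group keeps its leading caret.

```python
import re

# The same pattern as a compiled constant, one piece per line.
UNIT_PATTERN = re.compile(
    r"(?P<unit>.+?)"
    r"\["
    r"_(?P<unit_subscript>.+?)"
    r"(?P<exponent>\^.+?)"
    r"\]"
)

m = UNIT_PATTERN.match("m[_0^3]")

# groupdict() returns the named groups as a dict; guard against no match.
parts = m.groupdict() if m is not None else {}
print(parts)

# Strip the caret if only the exponent text is needed.
exponent = parts["exponent"].lstrip("^") if parts else None
print(exponent)
```

Tokens without the [_subscript^superscript] form simply do not match, so the caller can fall through to the other cases.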
This looks great, thanks for taking the time to make all the changes! One quick thing: it might be helpful to add a reference to define_units() in the docstring of fmt_units() (but we can always punt to another PR).
@machow I added a reference to |
This PR adds the fmt_units() method. This performs a conversion of units in the units notation syntax (e.g., "x10^9 / L" or "x10^9 L^-1", etc.) to HTML. The method, like the other fmt_*() methods, only concerns itself with text transformations in the table body. Other methods will also gain the ability to convert text in units notation to nicely formatted HTML in later PRs.
Here's an example that uses fmt_units() with the illness dataset. It so happens that the units column has strings in units notation, so we just need to point this method to that column:
Fixes: #211
Partially addresses the .epic issue: #169