Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert old string substitutions to f-strings in term.py #2864

Merged
merged 4 commits into from
Jul 30, 2024

Conversation

ashleysommer
Copy link
Contributor

@ashleysommer ashleysommer commented Jul 30, 2024

I thought this work had already been done around 18 months ago.
Or maybe there was and old PR that does it? I'm not sure.

As-of Python 3.8, f-strings are feature-comparable with str.format(), and have even eventually surpassed the raw performance of old-style "%-substitution" strings from the Python2 days.

This PR does the follow:

  • Replace outdated % substitution strings in term.py
  • Uses concatenated pairs where appropriate to join strings
  • Improves the performance when stringifying URIRef, BNode, and Literal nodes
  • replace other outdated patterns in term.py with some modern equivalents for better performance and compatibility.

Some micro-benchmarking to show the kind of differences:

import timeit
# joining two string variables
t0 =timeit.timeit('a+b', setup='a,b = "<test>", "abcd13"*1000')
print(t0)
t1 =timeit.timeit('"%s%s" % (a, b)', setup='a,b = "<test>", "abcd13"*1000')
print(t1)
t2 = timeit.timeit('f"{a}{b}"', setup='a,b = "<test>", "abcd13"*1000')
print(t2)
t3 = timeit.timeit('f"<test>{b}"', setup='b = "abcd13"*1000')
print(t3)
# joining three strings
t0 =timeit.timeit('a+b+c', setup='a,b,c = "<test>", "abcd13"*1000, "</test>"')
print(t0)
t1 =timeit.timeit('"%s%s%s" % (a, b, c)', setup='a,b,c = "<test>", "abcd13"*1000, "<test>"')
print(t1)
t2 = timeit.timeit('f"{a}{b}{c}"', setup='a,b,c = "<test>", "abcd13"*1000, "</test>"')
print(t2)
# Wrapping a variable with prefix and suffix
t3 = timeit.timeit('f"<test>{b}</test>"', setup='b = "abcd13"*1000')
print(t3)

results

# two string variables
0.08020864403806627 # Concat with + is fastest for 2 (and only 2) values
0.08790699497330934
0.08184163994155824
0.08165007410570979
# joining three strings
0.15257811499759555
0.09559847600758076
0.08691176993306726 # f-string concat is fastest for unknown 3 variables.
0.08586340001784265 # Inline f-string substitution is fastest for wrapped variables.

…mance when stringifying URIRef, BNode, and Literal nodes, also replace other outdated patterns in term.py with some modern equivelents for better performance and compatibility.
@ashleysommer
Copy link
Contributor Author

Additionally, one private internal function is removed from term.py, that is _serial_number_generator() for the BNode constructor that was used as a fallback on for when the user doesn't pass their own Generator. Now we simply call uuid4().hex instead of using this custom generator function.

Removing this speeds up creation of BNodes (especially when many BNodes need to be created) because it removes one layer of indirection and one additional python method call.

The ability to pass your own generator function if you need a different sn_gen on BNode is still available.

@ashleysommer
Copy link
Contributor Author

Notice there is an odd behaviour in the tests. The previous version had the assertion that Literal Substitution of Literal(Decimal(1.2121214312312)) - Literal(Decimal(1.0)) equals Literal(Decimal(0.212121)), that seems wrong because it loses 8 decimal points of precision.
After these string formatting changes, in term.py the tests needed updating so the result is Literal(Decimal(0.2121214312312)). That must be a difference in the default precision of float-formatting between %-substitution and f-string.

…dn't work as a BNode prefix generator, even though it should have. This was revieled by a newly failing test after the change to better string concatenation in the BNode constructor.
@ashleysommer
Copy link
Contributor Author

One last change.
Fixed a very old bug where a Generator Function (a yield function) didn't work as a BNode prefix generator, even though it is documented to work and there are tests for it. This was relieved by a newly failing test after the change to better string concatenation in the BNode constructor. The old code was concealing this by serialzing the generator fn itself, instead of the output of the generator.

@coveralls
Copy link

Coverage Status

coverage: 90.627% (+0.001%) from 90.626%
when pulling 56ccb7a on fstrings_term
into cb2c8d1 on main.

Copy link
Contributor

@edmondchuc edmondchuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one. Speed improvements, bug fixes and readability!

@ashleysommer ashleysommer merged commit 5c6d942 into main Jul 30, 2024
22 checks passed
@ashleysommer ashleysommer deleted the fstrings_term branch July 30, 2024 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants