Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-117431: Improve performance of startswith and endswith #117782

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Apr 11, 2024

Improve performance of startswith and endswith by eliminating double work in tailmatch.

Benchmark results:

single-character match: x.startswith('a'): Mean +- std dev: [main_startswith] 80.6 ns +- 0.7 ns -> [v2_startswith] 69.8 ns +- 0.8 ns: 1.15x faster
single-character fail: x.startswith('q'): Mean +- std dev: [main_startswith] 67.8 ns +- 0.6 ns -> [v2_startswith] 68.4 ns +- 0.7 ns: 1.01x slower
two-character match: x.startswith('ab'): Mean +- std dev: [main_startswith] 81.3 ns +- 2.1 ns -> [v2_startswith] 75.9 ns +- 0.6 ns: 1.07x faster
two-character fail head: x.startswith('qb'): Mean +- std dev: [main_startswith] 68.0 ns +- 1.1 ns -> [v2_startswith] 76.0 ns +- 0.9 ns: 1.12x slower
two-character fail tail: x.startswith('aq'): Mean +- std dev: [main_startswith] 73.0 ns +- 3.3 ns -> [v2_startswith] 68.5 ns +- 0.9 ns: 1.07x faster
multi-character match: x.startswith('abcdefghijkl'): Mean +- std dev: [main_startswith] 84.9 ns +- 1.1 ns -> [v2_startswith] 78.7 ns +- 2.3 ns: 1.08x faster
multi-character fail midle: x.startswith('abcdef_hijkl'): Mean +- std dev: [main_startswith] 85.3 ns +- 2.0 ns -> [v2_startswith] 78.6 ns +- 0.8 ns: 1.09x faster

Benchmark hidden because not significant (3): empty: x.startswith(''), multi-character different kind match: xu.startswith('abcdefghijkl'), multi-character different kind fail: xu.startswith('abcdef_hijkl')

Geometric mean: 1.03x faster
Benchmark script
import pyperf
runner = pyperf.Runner()

setup="""
x = 'abcdefghijklmnop'
y = 'abcdefghijklmnop_bbbbbbbbbbbbbbbbbb'

xu = x + '\u1234'
yu = y + '\u1234'
l = 'a' * 1000 + 'b'
x_startswith = x.startswith
"""

# Tested with ./python sw.py --rigorous -o main_startswith.json

if 1:
    runner.timeit(name="empty: x.startswith('')", stmt="x.startswith(''); y.startswith('')", setup=setup)
    runner.timeit(name="single-character match: x.startswith('a')", stmt="x.startswith('a'); y.startswith('a')", setup=setup)
    runner.timeit(name="single-character fail: x.startswith('q')", stmt="x.startswith('q'); y.startswith('q')", setup=setup)
    
    runner.timeit(name="two-character match: x.startswith('ab')", stmt="x.startswith('ab'); y.startswith('ab')", setup=setup)
    runner.timeit(name="two-character fail head: x.startswith('qb')", stmt="x.startswith('qb'); y.startswith('qb')", setup=setup)
    runner.timeit(name="two-character fail tail: x.startswith('aq')", stmt="x.startswith('aq'); y.startswith('aq')", setup=setup)
    runner.timeit(name="multi-character match: x.startswith('abcdefghijkl')", stmt="x.startswith('abcdefghijkl'); y.startswith('abcdefghijkl')", setup=setup)
    runner.timeit(name="multi-character fail midle: x.startswith('abcdef_hijkl')", stmt="x.startswith('abcdef_hijkl'); y.startswith('abcdef_hijkl')", setup=setup)

    runner.timeit(name="multi-character different kind match: xu.startswith('abcdefghijkl')", stmt="xu.startswith('abcdefghijkl'); yu.startswith('abcdefghijkl')", setup=setup)
    runner.timeit(name="multi-character different kind fail: xu.startswith('abcdef_hijkl')", stmt="xu.startswith('abcdefghijkl'); yu.startswith('abcdefghijkl')", setup=setup)
  • By first checking the tail of the substring and then the start, we can combine a call to PyUnicode_READ and memcmp
  • For the single character case we prevent the check with PyUnicode_READ from happening twice.
  • With this PR the performance of many cases improves, in particular the case where substrings match . The only case where performance is less, is for substrings that fail to match on the start of the substring (but the performance for substrings that fail on the end of the substring improves).
  • Also see gh-117431: Optimize str.startswith #117480

@eendebakpt eendebakpt changed the title gh-117480: Improve performance of startswith and endswith (version 2) gh-117431: Improve performance of startswith and endswith (version 2) Apr 11, 2024
@erlend-aasland erlend-aasland added the performance Performance or resource usage label Apr 11, 2024
Copy link
Contributor

@erlend-aasland erlend-aasland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick-and-dirty initial review.

Objects/unicodeobject.c Outdated Show resolved Hide resolved
Objects/unicodeobject.c Show resolved Hide resolved
Objects/unicodeobject.c Outdated Show resolved Hide resolved
@erlend-aasland erlend-aasland changed the title gh-117431: Improve performance of startswith and endswith (version 2) gh-117431: Improve performance of startswith and endswith May 21, 2024
@erlend-aasland
Copy link
Contributor

Do you still see the "two-character fail head" slowdown?

@eendebakpt
Copy link
Contributor Author

Do you still see the "two-character fail head" slowdown?

Yes, and this is to be expected. For the two-character case we can either

i) Check the first character and then the last (e.g. the second character). This is the current implementation in main
ii) Check the last character and then the first. This happens in this PR.

The result is that main is faster for "abc".startswith("xb"), but this PR is faster for "abc".startswith("ax"). From an application point of view I am not sure whether one case is more important than the other. In the PR we trade performance for these two cases, but gain performance for several other (e.g. matching single character, matching multi character).

I have put some though into whether we could keep this PR more conservative (e.g. no cases with performance loss, but several cases with performance gain). The best I can create so far is main...eendebakpt:tailmatch_v3. There are some more branches and the memcmp does a bit more work than required, but just like this PR it should improve matching the single-character and two-character case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants