-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-122288: Improve performances of fnmatch.translate
#122289
Conversation
@barneygale Friendly ping in case you forgot you were interested in this PR. |
Could you add some timings to the PR description please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice improvement :)
Misc/NEWS.d/next/Library/2024-07-25-18-06-51.gh-issue-122288.-_xxOR.rst
Outdated
Show resolved
Hide resolved
Small optimization possibility - 1% for 1 pattern. Not sure about the others. Maybe slightly more readable too. REP1 = repeat('\\')
REP2 = repeat(r'\\')
REP3 = repeat('-')
REP4 = repeat(r'\-')
_replace = str.replace
...
stuff = map(_replace, chunks, REP1, REP2)
stuff = map(_replace, stuff, REP3, REP4)
stuff = '-'.join(stuff) And cache size. Not sure what it should be, are any rules for such? Apart from this, LGTM. |
I kept the same cache size. For |
My thinking is as follows. With 32K size, the max size of this application (with assumption that average string size is 20 characters) is: In [39]: sys.getsizeof('a' * 20) * 2 * 32000 / 1024**2
Out[39]: 3.7 # MB It is not much for standard machine, but is Python used on some micro platforms where that might be big? I have little experience with such and 4MB might not be an issue at all, but I would say it is worth looking into it. I think the easiest would be to find other use cases of Otherwise, if you can't find anything maybe someone else has any insights? As a last resort, could make a conservative choice. My python starts with 5MB initial memory consumption. Take 5% of it, then, 2048 size would be that. |
For microcontrollers, there is MicroPython and only a subset of Python is actually available. 4MB can be quite big but I don't think they would implement LRU cache that size. |
In this case, 32K seems fine to me. Just follow the standard and if there are any issues in the future, these can be handled as a pair. |
@barneygale This one is the (only) remaining fnmatch PR that I decided to keep. I'd appreciate your opinion on the size of the cache for |
a205e6b
to
0518912
Compare
The rationale for this change is as follows: re.escape() is only used to cache single Unicode characters in shell patterns; we may heuristically assume that they are ISO-8859-1 encodable, thereby requiring a cache of size 256. To allow non-traditional glyphs (or alphabets with a small number of common glyphs), we double the cache size.
0518912
to
bb6c3ee
Compare
@barneygale friendly ping in case you forgot about this PR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work :)
Co-authored-by: Barney Gale <[email protected]>
I'm committing from my phone so hopefully it'll be fine. Otherwise I'll have a look at my return next week! |
@barneygale I've also updated the tests just to remember what the indices were so it should be ready to merge now (you can take a last look since I've stared waaaay to much at this PR). |
Friendly ping for this PR @barneygale Maybe you want to merge it soon? |
Oh I see! no worries (now I know why you didn't automerge it). |
Sorry again for taking ages, thanks for the improvement :) |
…122289) Improve performance of this function by a factor of 1.7x. Co-authored-by: Barney Gale <[email protected]>
This is a smaller PR compared to the one for the C implementation and is probably easier to review. Below are the benchmarks reported on the issue:
fnmatch.translate
#122288