gh-103793: Defer formatting task name #103767
The default task name is "Task-<counter>" (if no name is passed in during Task creation). This is initialized in `Task.__init__` (C impl) using string formatting, which can be quite slow. Actually using the task name in real world code is not very common, so this is wasted init. Let's defer this string formatting to the first time the name is read (in `get_name` impl), so we don't need to pay the string formatting cost if the task name is never read.
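A minimal Python sketch of the idea described above, not the actual C implementation: the default name is only formatted the first time it is read. `LazyNamedTask` and `_next_task_number` are hypothetical names used for illustration, and the choice to take the counter value at construction time is an assumption of this sketch.

```python
import itertools

# Hypothetical stand-in for a module-level task counter (assumption).
_next_task_number = itertools.count(1).__next__


class LazyNamedTask:
    """Toy object, not asyncio.Task: formats its default name on first read."""

    def __init__(self, name=None):
        if name is None:
            # Taking the number is cheap; the string formatting is deferred.
            self._number = _next_task_number()
            self._name = None
        else:
            self._number = None
            self._name = str(name)

    def get_name(self):
        if self._name is None:
            # The formatting cost is paid here, and only once.
            self._name = f"Task-{self._number}"
        return self._name

    def set_name(self, value):
        self._name = str(value)


first = LazyNamedTask()
second = LazyNamedTask(name="worker")
```

If `get_name()` is never called on `first`, the `f"Task-{...}"` formatting never runs, which is the saving the description is after.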
Strictly speaking, it's probably not necessary to store the task number in Not storing the name on construction saves memory and will likely make everything a tiny bit faster.
…ng the next counter at lazy name generation
…on into defer-task-name-formatting
That's clever! I like it :) Made this change to the PR. CI is still running, but at least on my local test run it didn't break any tests.
Some thoughts.
Lib/asyncio/tasks.py
self._name = f'Task-{_task_name_counter()}'
# optimization: defer task name formatting to first get_name
self._name = None
Why do we need to optimize the .py version (which is normally not run since there is a C accelerator version)?
We definitely don't need to optimize the Python version. I assumed we wanted to maintain parity between the versions, but maybe it doesn't matter to this extent (or my assumption was wrong and parity is not a goal).
I will revert the changes to the Python version.
No, task numbers are assigned as the tasks are allocated. It's a little debugging aid and should not be changed.
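To illustrate the objection, here is a small sketch contrasting the two numbering strategies. Both classes are hypothetical toys, not asyncio code. When the number is taken at first read, names reflect the order in which names were requested rather than the order in which tasks were created, which is what makes it less useful for debugging.

```python
import itertools


class NumberedAtCreation:
    """Hypothetical: the number is taken when the object is allocated."""
    _counter = itertools.count(1)

    def __init__(self):
        self._number = next(self._counter)

    def get_name(self):
        return f"Task-{self._number}"


class NumberedAtFirstRead:
    """Hypothetical: the number is taken when the name is first read."""
    _counter = itertools.count(1)

    def __init__(self):
        self._name = None

    def get_name(self):
        if self._name is None:
            self._name = f"Task-{next(self._counter)}"
        return self._name


# Create two objects of each kind, then read the second one's name first.
a, b = NumberedAtCreation(), NumberedAtCreation()
creation_order = (b.get_name(), a.get_name())

c, d = NumberedAtFirstRead(), NumberedAtFirstRead()
read_order = (d.get_name(), c.get_name())
```

With creation-time numbering the second object is always `Task-2`; with read-time numbering it becomes `Task-1` simply because its name was read first.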
See my suggestion on the issue.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase
I have made the requested changes; please review again
Apologies about the review request spam - I don't know why GitHub decided to do that 🤔 (and I don't have permissions to clean up the reviewers list)
Thanks for making the requested changes! @kumaraditya303: please review the changes made to this pull request.
Looks good, nice speed win
LGTM, but did you re-run the benchmark, now that we have a PyLong object creation?
And did you run the buildbots with the latest version to check for leaks?
We implemented something even simpler than a tagged pointer.
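One way to read this comment, together with the mention of a PyLong object creation above, is that the name slot itself holds the task number as an integer until the name is first read. The thread does not spell this out, so treat the following as an assumption-laden Python sketch rather than the merged C code. `SingleSlotTask` and `_next_task_number` are hypothetical names.

```python
import itertools

# Hypothetical stand-in for a module-level task counter (assumption).
_next_task_number = itertools.count(1).__next__


class SingleSlotTask:
    """Toy object: one slot holds either the task number or the final name."""
    __slots__ = ("_name",)

    def __init__(self, name=None):
        # int -> default name not formatted yet; str -> name is final.
        self._name = _next_task_number() if name is None else str(name)

    def get_name(self):
        if isinstance(self._name, int):
            # First read of a default name: format once and cache the string.
            self._name = f"Task-{self._name}"
        return self._name


t = SingleSlotTask()
raw_before = t._name      # still the integer task number
name = t.get_name()       # formatted on demand
named = SingleSlotTask("worker")
```

The type of the stored object acts as the flag, so no extra field or pointer tagging is needed, and the number is still assigned at allocation time.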
🤖 New build scheduled with the buildbot fleet by @gvanrossum for commit 592d44b 🤖 If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.
(Dismissing Kumar's review for him, he told me he's busy for the next few weeks. Setting the refleaks-buildbots label, those are sufficient for me.)
I did rerun the benchmarks; it was 5-7% faster on the async benchmarks. I'll rerun again on main vs the parent commit and share the full report.
Thanks! (And just because I work on weekends doesn’t mean I expect you to.)
pyperformance async benchmarks 4-7% faster
microbenchmark ~10% faster
with this PR:
on parent commit:
Excellent, thanks!
The default task name is `Task-<counter>` (if no name is passed in during Task creation). This is initialized in `Task.__init__` (C impl) using string formatting, which can be quite slow. Actually using the task name in real world code is not very common, so this is wasted init.
Let's defer this string formatting to the first time the name is read (in `get_name` impl), so we don't need to pay the string formatting cost if the task name is never read.
Fixes gh-103793
performance analysis
in a full run of pyperformance, this is 1.00x faster overall, and up to 5% faster on async benchmarks.
full report in this gist.
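For readers who want a rough feel for the cost being avoided, here is a hedged `timeit` sketch. It is not the benchmark used in this PR and does not measure `asyncio.Task` creation; it only compares taking a counter value with and without formatting it into a string. The function names are hypothetical.

```python
import itertools
import timeit

# Hypothetical counter, mirroring the shape of a module-level task counter.
counter = itertools.count(1).__next__


def eager():
    # Format the default name immediately.
    return f"Task-{counter()}"


def deferred():
    # Only take the number; formatting would happen later, if ever.
    return counter()


eager_s = timeit.timeit(eager, number=200_000)
deferred_s = timeit.timeit(deferred, number=200_000)
print(f"eager: {eager_s:.4f}s  deferred: {deferred_s:.4f}s")
```

Timings from a pure-Python loop like this will not match the C-level numbers reported in the thread; the pyperformance results there remain the reference.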