-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defer string formatting in asyncio Task creation #103793
Comments
I am surprised that string formatting is so slow to make a difference here. If we are implementing this optimization then I suggest to use a tagged pointer and a union for task name and counter combined into a single word with LSB set for tagging. |
This is the way it was originally implemented in Cinder :) |
so far we have 3 different implementation approaches that provide similar perf benefits:
@kumaraditya303 iiuc you prefer option 2, for the benefit of not changing the order of task number allocation. |
Option 2 seems needlessly complicated for something as relatively unimportant as task names. Is there an expectation that the numbers in task names increase over time? Option 4 is to assign the number at task creation and compute the name whenever it is requested. |
How does that work for user provided task names? |
I would love to go with option 1 :) I think option 4 can ~work if we do something like this:
if we go with this, we can go ahead and cache the formatted name back to the name field, which makes it similar to option 2 but using type checks instead of tagged pointers. |
I like @markshannon's option 4 with the your suggestions @itamaro. I think Option 1's Con makes it less attractive to me. |
Looking again, I now agree that the order in which task numbers are assigned may feel meaningful to the user, and only assigning the number when the task is being printed could cause some problems. E.g. the user has several tasks, and no matter in which order they print them, the first one printed is always named "Task-1" -- this could be very confusing. So this is another vote for 3 or 4. (I would be fine with 3. From the words Mark used when he proposed 4, I assume he missed that sometimes the user passes in a name which should override this mechanism.) |
Thanks for the feedback! |
The default task name is "Task-<counter>" (if no name is passed in during Task creation). This is initialized in `Task.__init__` (C impl) using string formatting, which can be quite slow. Actually using the task name in real world code is not very common, so this is wasted init. Let's defer this string formatting to the first time the name is read (in `get_name` impl), so we don't need to pay the string formatting cost if the task name is never read. We don't change the order in which tasks are assigned numbers (if they are) -- the number is set on task creation, as a PyLong instead of a formatted string. Co-authored-by: Łukasz Langa <[email protected]>
Thanks for this improvement! (Would still like to see the perf numbers now that we allocate a PyLong during init.) |
* main: (26 commits) pythongh-104028: Reduce object creation while calling callback function from gc (pythongh-104030) pythongh-104036: Fix direct invocation of test_typing (python#104037) pythongh-102213: Optimize the performance of `__getattr__` (pythonGH-103761) pythongh-103895: Improve how invalid `Exception.__notes__` are displayed (python#103897) Adjust expression from `==` to `!=` in alignment with the meaning of the paragraph. (pythonGH-104021) pythongh-88496: Fix IDLE test hang on macOS (python#104025) Improve int test coverage (python#104024) pythongh-88773: Added teleport method to Turtle library (python#103974) pythongh-104015: Fix direct invocation of `test_dataclasses` (python#104017) pythongh-104012: Ensure test_calendar.CalendarTestCase.test_deprecation_warning consistently passes (python#104014) pythongh-103977: compile re expressions in platform.py only if required (python#103981) pythongh-98003: Inline call frames for CALL_FUNCTION_EX (pythonGH-98004) Replace Netlify with Read the Docs build previews (python#103843) Update name in acknowledgements and add mailmap (python#103696) pythongh-82054: allow test runner to split test_asyncio to execute in parallel by sharding. (python#103927) Remove non-existing tools from Sundry skiplist (python#103991) pythongh-103793: Defer formatting task name (python#103767) pythongh-87092: change assembler to use instruction sequence instead of CFG (python#103933) pythongh-103636: issue warning for deprecated calendar constants (python#103833) Various small fixes to dis docs (python#103923) ...
Feature or enhancement
When an asyncio Task is created without passing in a
name
(a common case), the init method uses a global counter to generate a task name of the form"Task-<counter>"
.It is very common in applications that the task name is never read or used, and string formatting has non-negligible runtime cost, making task creation slower. It would be beneficial to defer the string formatting operation and avoid incurring that overhead during task creation.
This can be done by storing the counter in the task struct, and using it to lazily populate the name in the
get_name
method.Linked PRs
The text was updated successfully, but these errors were encountered: