sys.settrace suffers quadratic behavior for large dictionary literals on 3.12+ #127953
Comments
I bisected CPython and it is perhaps no surprise that the problem started with:
/cc @markshannon
This is a known issue, which I tried to alleviate in #118127. The problem is getting the line number - with the new mechanism, if a function (or module) is super long, calculating the line number becomes expensive. We can solve this for good, but we need extra run-time memory (plenty of it) to remember the line numbers. Considering how rare this case is, the trade-off might not be worth it.
Do you have a suggestion for the person who cannot move up from 3.11?
Is ruling out that specific file a possible solution? If coverage.py does not access I know this is not a perfect solution; maybe there is a way to solve this better. I did not manage to do it without sacrificing too much memory, but that might just be because of my limited knowledge of the topic. Anyone should feel free to try it out - I think I explained the reason for this slowdown in the PR mentioned above.
Dup of #107674 (which I also wrote, oops!)
(Continuing here for a moment...) If I change the trace_it.py code to return None, then 11.out and 12.out both are very short:
and
but 3.12 still takes a long time to complete. Are we calculating the line number even when we don't need to call the trace function? This simulates omitting the large file from coverage.py, and it doesn't solve the problem.
BTW, the same is true of tracing very long functions: even when returning None from the trace function, 3.12 takes much longer than 3.11. The effect isn't as pronounced as with a dict literal.
That's because the optimization is not backported (unfortunately, it's not a bug fix). It should be different on 3.13 - and we can't really do anything about 3.12.
And yes, as I mentioned in the comment above, it's not about the literal, it's about the line number. Any function/module-level code that's really long will trigger the issue. Calculating the line number of the current bytecode is an O(N) operation after 3.12.
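To make the O(N) cost concrete, here is a toy model of a delta-encoded line table (the encoding and names are simplified illustrations, not CPython's actual `co_linetable` format): because entries store run lengths and line deltas rather than absolute positions, finding the line for one instruction means walking the table from the start.

```python
# Toy delta-encoded line table: a list of (run_length, line_delta) entries,
# where each entry covers a run of instructions on the same line.
# This is a simplified illustration, NOT CPython's real co_linetable format.

def build_table(lines):
    """Compress a per-instruction line-number list into delta entries."""
    table = []
    prev_line = 0
    i = 0
    while i < len(lines):
        j = i
        while j < len(lines) and lines[j] == lines[i]:
            j += 1
        table.append((j - i, lines[i] - prev_line))
        prev_line = lines[i]
        i = j
    return table

def line_for_instruction(table, index):
    """O(N) lookup: accumulate deltas until the run containing `index`."""
    line = 0
    pos = 0
    for run, delta in table:
        line += delta
        if pos + run > index:
            return line
        pos += run
    raise IndexError(index)
```

With one line event per line and one O(N) lookup per event, tracing an N-line module costs O(N^2) overall, which matches the quadratic behavior reported.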
Thanks, 3.13 does go quickly when not tracing. Can we talk more about ways to prevent the problem when tracing? Can you point me to the code so I can try to understand the underlying issue? Would increased memory use be a problem? Wouldn't it be a small fraction of the code's memory footprint? |
The fundamental thing about this issue is that you need a mapping from an instruction to a line number. If you take a look at

The original optimization is to calculate it once (we need to do it anyway for instrumentation) and store the data somewhere. First of all, we don't want to impact memory use when there's no instrumentation, so it does not live with the CodeObject directly. For each instrumented line of code, we have a

However, this estimation would be off when there are too many lines in a code object, where we would fall back to the

One way to solve this is to increase the

In the best case, it would be 1 byte per line with instrumentation, which probably results in KB-level run-time memory cost. I don't think that's unacceptable, but the benefit is also insignificant - it only helps with tracing speed when the function is more than a few hundred lines, which is rare (and not recommended). Probably the only real-life scenario is a coverage tool run over a constant mapping. Notice that a debugger will not trigger the issue after the optimization I made (because it does not access the line number for every line event).

So, like I mentioned above, I just don't know if this is worth the effort. And of course, if we decide to fix it, the fix would be trivial. It's more about whether we should do it. I can add @markshannon to the discussion about the issue.
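The "compute once, pay memory" trade-off described above can be sketched as follows (all names are hypothetical; this is not the actual CPython code): expand the compressed table into a flat per-instruction array when instrumentation is enabled, after which every lookup is O(1).

```python
import array

# Hypothetical sketch of the trade-off: build a flat per-instruction side
# table once at instrumentation time, stored compactly as raw unsigned ints
# rather than Python objects. Not CPython's actual implementation.

def expand_line_table(table):
    """table: (run_length, line_delta) pairs -> flat array of line numbers."""
    flat = array.array('I')
    line = 0
    for run, delta in table:
        line += delta
        flat.extend([line] * run)
    return flat

def line_for_instruction(flat, index):
    """O(1) lookup once the side table exists."""
    return flat[index]

table = [(2, 1), (3, 1), (1, 1), (1, 1)]
flat = expand_line_table(table)
```

The expansion is O(N), but it runs once per code object instead of once per line event, which is exactly what removes the quadratic behavior.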
Playing with ways to map instructions to line numbers, I figured that it seemed cheaper (and faster to build) to store a tuple of size N-instructions where the value at each index is the line number. So retrieval should be:

# Instruction -> line number:
instr_to_line = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 4}
# (line_number, line_number, ...):
as_tuple = (1, 1, 2, 2, 2, 3, 4)

Edit: I found a gist where I recorded this idea: https://gist.github.com/devdanzin/591394b798b38a332ec3ecdef0f6f082
This actually costs more than what I mentioned above (by "mapping" I did not mean a hashmap, simply a math-level mapping, a one-to-one relation). The method I mentioned above costs 1 byte per line; the tuple solution takes a Python integer per bytecode instruction, which is probably 100x larger on average :)
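A rough back-of-the-envelope comparison of the two shapes (illustrative only; exact sizes vary by Python version and platform): a tuple stores one pointer per entry plus a full Python int object per distinct value, while a compact array stores a few raw bytes per entry.

```python
import array
import sys

# Compare memory for a per-instruction line-number side table, assuming a
# worst case of one distinct line per instruction. Illustrative numbers only.
n = 10_000
lines = list(range(1, n + 1))

as_tuple = tuple(lines)              # pointer per entry + int object per value
as_array = array.array('I', lines)   # 4 raw bytes per entry

tuple_cost = sys.getsizeof(as_tuple) + sum(sys.getsizeof(v) for v in lines)
array_cost = sys.getsizeof(as_array)
```

On CPython the tuple-based table comes out several times larger than the raw array, even before accounting for allocator overhead on the int objects.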
This is still an issue. |
Possible solutions:
Of the above, 1 is the simplest but still has the same fundamental flaw. 2 is possibly the most complex, but will speed up all line number lookups, not just those for

I think the deciding factor in favor of 3 is that solution 2 cannot be backported.
…of the code object (pythonGH-128350)
Bug report
Bug description:
Using a trace function with sys.settrace, performance suffers when importing a large dictionary literal: import time is quadratic in the size of the literal on 3.12 and up. It is fine on 3.11.
This originated with a coverage.py bug report: nedbat/coveragepy#1906
To reproduce, create make_mapping.py:
and trace_it.py:
Then, create the large mapping, and run trace_it.py on multiple versions, and look at the timings:
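The two scripts referenced above were not captured in this copy of the issue. A minimal self-contained reconstruction of their likely shape (file names, sizes, and structure are guesses; the dict source is exec'd in-process here rather than written to disk and imported, to keep it in one script):

```python
import sys
import time

# Hypothetical repro sketch: generate a module-sized dict literal and run it
# under sys.settrace. N is small here; scale it up (tens of thousands) to
# see the quadratic slowdown clearly on 3.12+.
N = 500

source = "MAPPING = {\n" + "".join(f"    {i}: {i},\n" for i in range(N)) + "}\n"
code = compile(source, "mapping.py", "exec")

def trace(frame, event, arg):
    return trace  # return None here instead to decline per-line tracing

namespace = {}
sys.settrace(trace)
start = time.perf_counter()
exec(code, namespace)
sys.settrace(None)
print(f"exec took {time.perf_counter() - start:.3f}s, "
      f"{len(namespace['MAPPING'])} entries")
```

Timing this under 3.11 versus 3.12+ at increasing N should show roughly linear growth on 3.11 and quadratic growth on 3.12, matching the results below.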
These are the results I see. The parenthetical numbers are nanoseconds since the last output line:
CPython versions tested on:
3.11, 3.12, 3.13, 3.14
Operating systems tested on:
macOS
Linked PRs