display summary and breakdown of timing tests #41
Conversation
```
Starting speed test: test_speed_few_locals
Starting speed test: test_speed_few_locals_with_call
Starting speed test: test_speed_many_constants
Starting speed test: test_speed_many_globals
Starting speed test: test_speed_many_locals
Starting speed test: test_speed_many_locals_with_call
```
Lib/test/test_new_pyc.py
```diff
@@ -93,7 +93,7 @@ def test_consts(self):
     def do_test_speed(self, body, call=False):
         NUM_FUNCS = 100
         functions = [
-            f"def f{num}(a, b):\n{body}"
+            f"def f{num}(a, b):\n{body}".replace("a", f"a{num}_").replace("b", f"b{num}_")
```
The replace() call bothers me a bit, since it's so context-free. Maybe we should invest a little in a better way of generating N functions with M lines each following a pattern that's a function of N and M?
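A minimal sketch of what such a generator could look like (the helper name `make_functions` and its parameters are hypothetical, not part of the actual test code):

```python
def make_functions(num_funcs, num_lines, unique_names=True):
    """Generate num_funcs function sources with num_lines body lines each,
    optionally giving every function its own unique variable names."""
    sources = []
    for n in range(num_funcs):
        # With unique_names, each function gets distinct identifiers, so
        # marshal cannot share (intern) the name strings across functions.
        a, b = (f"a{n}_", f"b{n}_") if unique_names else ("a", "b")
        body = "\n".join(f"    x{i} = {a} + {b} + {i}" for i in range(num_lines))
        sources.append(f"def f{n}({a}, {b}):\n{body}")
    return sources
```

This avoids the context-free textual replace() entirely: the pattern is generated directly as a function of N and M.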
Agree. We probably want both tests, unique and not unique, to measure with and without the interning benefit.
Are those test results significantly better than before? I find the speed tests pretty noisy (two runs may vary rather wildly).
PS. Maybe the 4 consecutive execs don't actually help much, and we should just do the first exec? (Only the first one hydrates anything.)
The second exec is interesting because we want it to be the same in both cases once hydrated.
Oh, right. NM then.
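For context, the exec-repetition pattern under discussion looks roughly like this (a sketch, not the actual test harness; `load_code` stands in for whichever loader, classic unmarshal or the new lazy format, is being timed):

```python
import time

def time_load_and_execs(load_code, data, repeats=4):
    """Time load plus first exec (which triggers hydration in the new
    format), then time each subsequent exec separately."""
    t0 = time.perf_counter()
    code = load_code(data)        # deserialize the code object
    ns = {}
    exec(code, ns)                # first exec: hydration happens here
    load_exec = time.perf_counter() - t0
    steady = []
    for _ in range(repeats - 1):  # later execs should behave identically
        t1 = time.perf_counter()
        exec(code, ns)
        steady.append(time.perf_counter() - t1)
    return load_exec, steady
```

Per the exchange above, the second exec is still worth keeping as the steady-state measurement; execs three and four add little.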
The results I get seem fairly stable, but it doesn't make sense to me that test_speed_many_locals is slower than test_speed_many_locals_with_call for Classic.
To your question: with the new test, Classic is significantly slower, so New looks better in comparison. The load+1 exec time for test_speed_many_locals_with_call is 2x slower rather than 5x slower.
I just got this for the third time; it's weird: test_speed_many_locals_with_call
Ah, of course: test_speed_many_locals_with_call has
So if I increase it to 300 in test_speed_many_locals_with_call, then for the old test I get:
and for the new test:
Hm, of course this is the problem with micro-benchmarks -- you can use them to prove anything. :-) We need to think more about this.

Maybe we need to produce a little mathematical model to predict the approximate cost of loading a code object using the classic and new approaches: something like a constant K, plus some factor times the number of instructions, plus another factor times the number of unique constants, plus yet another factor times the number of unique names. (Did I forget anything?) Microbenchmarking can give us K and the other factors, assuming the model is not too far off. (We need two models, one for classic marshal and another for the new approach.)

We can then attempt a static analysis of a known code corpus (e.g. the top 100 or top 4000 PyPI packages) to estimate the average number of instructions, unique constants, names, etc. This can then give us an approximate cut-off point for the one thing that's harder to estimate: the fraction of functions that are never called. For example, if 50% of the functions in the corpus (selected randomly) are called, we'll obtain an X speedup. We can also try to measure X (or, more directly, the classic and new times) for the modules that are imported during a typical run before the first line of user code executes -- this tells us whether we're doing anything about the "startup is slow" problem.
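To make the proposed model concrete: load cost would be approximated as K + f_i·(instructions) + f_c·(unique constants) + f_n·(unique names), and the coefficients recovered from microbenchmark timings by least squares. A sketch with placeholder observations (one such fit would be run per approach, classic and new):

```python
import numpy as np

# Each row: (instructions, unique constants, unique names) for one
# benchmarked code object; `times` holds its measured load time.
# These numbers are placeholders, not real measurements.
features = np.array([
    [120, 10, 8],
    [500, 40, 8],
    [120, 10, 80],
    [900, 5, 30],
], dtype=float)
times = np.array([1.9e-6, 5.6e-6, 4.1e-6, 6.3e-6])

# Prepend a column of ones so the fit also recovers the constant K.
X = np.hstack([np.ones((len(features), 1)), features])
coeffs, *_ = np.linalg.lstsq(X, times, rcond=None)
K, f_instr, f_const, f_name = coeffs
print(f"K={K:.3g} instr={f_instr:.3g} const={f_const:.3g} name={f_name:.3g}")
```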
The micro-benchmarks at this point are useful for seeing exactly what is going on. For our worst result (many locals with call, with repeated variable names), we now know that the difference is probably the interning that marshal does. So if we implement interning as you suggested here, then the story will improve for that case. Once we're done with this and all the other optimizations, then I agree a model can help understand the tradeoffs.
It's simpler than interning, even. There was an important missed case: a string found via a redirect would be loaded even if the redirect target was already loaded. I have pushed a fix.
Cool, I'm trying it now. (Rebased this and pushed some stuff I did on the tests.)
This prints a breakdown like this for each test:
And then a summary at the end:
Ah, sorry about the typo in ceval.c. Fixed in HEAD. These summaries are very useful, but I keep getting confused about whether larger ratios mean faster or slower. I propose to invert the ratios (tn/tc instead of tc/tn) so that we can stick to "smaller numbers are better" (which seems to be the mantra of pyperformance and pyperf). I still see a lot of noise; e.g., two consecutive runs (with inverted ratios):
But the trend is still clear: steady state is sometimes faster, sometimes slower; three tests are clearly faster (many constants, many globals, many locals); and the tests with calls are still dramatically slower. If I comment out the hydration of local variable names (as shown before), the many_locals_with_call load+exec time goes down to 2.5, and few_locals_with_call to 1.6. I don't think interning will help (since in the faster version the names aren't loaded at all).
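A sketch of the inverted-ratio summary line, so that "smaller numbers are better" holds throughout (the names and numbers here are illustrative, not from the actual test output):

```python
def summary_line(test_name, t_classic, t_new):
    """Report tn/tc, so that smaller always means New is doing better."""
    ratio = t_new / t_classic
    verdict = "faster" if ratio < 1.0 else "slower"
    pct = abs(ratio - 1.0) * 100
    return f"{test_name}: tn/tc = {ratio:.2f} (New is {pct:.0f}% {verdict})"

print(summary_line("test_speed_many_constants", 2.1e-3, 1.6e-3))
# test_speed_many_constants: tn/tc = 0.76 (New is 24% faster)
```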
Looks good!
Almost... we need to flip the direction of the prose.
It's either what I just committed, or it needs to be "New is X% slower" (which sounds very negative) or "Classic is X% faster" (which sounds like the wrong focus).
No, that was wrong; let me revert the last commit.
Thanks again!
This change makes the classic version slower. There was something unfair about the previous test when names were repeated between the functions: I think marshal was able to benefit from caching or interning, which the new method was not.
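As an illustrative check of that theory (an assumption on my part, not something established in this PR): marshal writes back-references for objects it has already serialized, including interned strings, so names repeated across functions cost almost nothing, while unique names must each be written and read in full. The size difference alone is easy to observe:

```python
import marshal

def module_source(unique_names):
    lines = []
    for n in range(100):
        # Same shape as the test's generated functions: shared vs unique names.
        a, b = (f"a{n}_", f"b{n}_") if unique_names else ("a", "b")
        lines.append(f"def f{n}({a}, {b}):\n    return {a} + {b}")
    return "\n".join(lines)

shared = marshal.dumps(compile(module_source(False), "<shared>", "exec"))
unique = marshal.dumps(compile(module_source(True), "<unique>", "exec"))
print(f"shared names: {len(shared)} bytes, unique names: {len(unique)} bytes")
# The shared variant is much smaller because repeated name strings are
# emitted once and then referenced, rather than re-serialized.
```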