
display summary and breakdown of timing tests #41

Merged
merged 4 commits into gvanrossum:pyco on Jul 12, 2021

Conversation

iritkatriel

This change makes the classic version slower. There was something unfair about the previous test when names were repeated between the functions. I think marshal was able to benefit from caching or interning or something which the new method was not.
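The suspected interning effect can be illustrated directly: marshal emits back-references for repeated (interned) strings, so many functions sharing the names `a`/`b` serialize smaller than the same functions with unique names. This is a rough illustration, not the PR's test code:

```python
import marshal

# 50 functions sharing the argument names a/b ...
shared = compile(
    "\n".join(f"def f{i}(a, b): return a + b" for i in range(50)),
    "<bench>", "exec",
)
# ... versus 50 functions with per-function-unique names.
unique = compile(
    "\n".join(f"def f{i}(a{i}_, b{i}_): return a{i}_ + b{i}_" for i in range(50)),
    "<bench>", "exec",
)
# marshal back-references repeated interned strings, so the shared
# version serializes to fewer bytes than the unique one.
print(len(marshal.dumps(shared)), len(marshal.dumps(unique)))
```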

@iritkatriel iritkatriel mentioned this pull request Jul 10, 2021
@iritkatriel
Author

Starting speed test: test_speed_few_locals
Classic load: 0.076
Classic exec #1: 0.024
Classic exec #2: 0.028
Classic exec #3: 0.025
Classic exec #4: 0.025
Classic total: 0.178
PYCO: 0.128
New PYC load: 0.000
New PYC exec #1: 0.080
New PYC exec #2: 0.044
New PYC exec #3: 0.027
New PYC exec #4: 0.027
New PYC total: 0.182
Classic-to-new ratio: 0.98 (new is -2% faster)

Starting speed test: test_speed_few_locals_with_call
Classic load: 0.079
Classic exec #1: 0.201
Classic exec #2: 0.190
Classic exec #3: 0.187
Classic exec #4: 0.200
Classic total: 0.857
PYCO: 0.127
New PYC load: 0.001
New PYC exec #1: 0.418
New PYC exec #2: 0.201
New PYC exec #3: 0.199
New PYC exec #4: 0.202
New PYC total: 1.025
Classic-to-new ratio: 0.84 (new is -16% faster)

Starting speed test: test_speed_many_constants
Classic load: 8.591
Classic exec #1: 0.033
Classic exec #2: 0.026
Classic exec #3: 0.026
Classic exec #4: 0.026
Classic total: 8.703
PYCO: 2.679
New PYC load: 0.001
New PYC exec #1: 0.797
New PYC exec #2: 0.040
New PYC exec #3: 0.030
New PYC exec #4: 0.027
New PYC total: 0.897
Classic-to-new ratio: 9.70 (new is 870% faster)

Starting speed test: test_speed_many_globals
Classic load: 1.814
Classic exec #1: 0.645
Classic exec #2: 0.644
Classic exec #3: 0.645
Classic exec #4: 0.645
Classic total: 4.393
PYCO: 0.362
New PYC load: 0.001
New PYC exec #1: 2.011
New PYC exec #2: 0.793
New PYC exec #3: 0.749
New PYC exec #4: 0.750
New PYC total: 4.307
Classic-to-new ratio: 1.02 (new is 2% faster)

Starting speed test: test_speed_many_locals
Classic load: 10.680
Classic exec #1: 0.027
Classic exec #2: 0.577
Classic exec #3: 0.026
Classic exec #4: 0.026
Classic total: 11.335
PYCO: 1.466
New PYC load: 0.000
New PYC exec #1: 0.897
New PYC exec #2: 0.032
New PYC exec #3: 0.028
New PYC exec #4: 0.028
New PYC total: 0.989
Classic-to-new ratio: 11.47 (new is 1047% faster)

Starting speed test: test_speed_many_locals_with_call
Classic load: 2.495
Classic exec #1: 0.240
Classic exec #2: 0.241
Classic exec #3: 0.240
Classic exec #4: 0.243
Classic total: 3.460
PYCO: 0.357
New PYC load: 0.000
New PYC exec #1: 5.504
New PYC exec #2: 0.322
New PYC exec #3: 0.272
New PYC exec #4: 0.275
New PYC total: 7.680
Classic-to-new ratio: 0.45 (new is -55% faster)
test_new_pyc passed in 1 min 4 sec

@@ -93,7 +93,7 @@ def test_consts(self):
     def do_test_speed(self, body, call=False):
         NUM_FUNCS = 100
         functions = [
-            f"def f{num}(a, b):\n{body}"
+            f"def f{num}(a, b):\n{body}".replace("a", f"a{num}_").replace("b", f"b{num}_")
Owner

The replace() call bothers me a bit, since it's so context-free. Maybe we should invest a little in a better way of generating N functions with M lines each following a pattern that's a function of N and M?

Author

Agree. We probably want both tests, unique and not unique, to measure with and without the interning benefit.
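One possible shape for the generator discussed above, covering both the unique and shared-name variants without a context-free str.replace(). This is a hypothetical sketch, not the PR's code; all names in it are illustrative:

```python
def make_functions(num_funcs, num_lines, unique=True):
    """Generate num_funcs function sources with num_lines body lines each.

    With unique=True every variable name gets a per-function suffix,
    removing the interning advantage; with unique=False all functions
    share the names a/b/xN.
    """
    sources = []
    for n in range(num_funcs):
        sfx = f"{n}_" if unique else ""      # per-function name suffix
        args = f"a{sfx}, b{sfx}"
        body = "\n".join(
            f"    x{sfx}{i} = a{sfx} + b{sfx} + {i}" for i in range(num_lines)
        )
        sources.append(f"def f{n}({args}):\n{body}\n    return x{sfx}0")
    return sources

# Sanity check: the generated functions compile and run.
ns = {}
exec("\n".join(make_functions(3, 2, unique=True)), ns)
print(ns["f1"](1, 2))  # x1_0 = 1 + 2 + 0
```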

@gvanrossum
Owner

Are those test results significantly better than before? I find the speed tests pretty noisy (two runs may vary rather wildly).

@gvanrossum
Owner

PS. Maybe the 4 consecutive execs don't actually help much, and we should just do the first exec? (Only the first one hydrates anything.)

@iritkatriel
Author

> PS. Maybe the 4 consecutive execs don't actually help much, and we should just do the first exec? (Only the first one hydrates anything.)

The second exec is interesting because we want it to be the same in both cases once hydrated.

@gvanrossum
Owner

> The second exec is interesting because we want it to be the same in both cases once hydrated.

Oh, right. NM then.

@iritkatriel
Author

The results I get seem fairly stable, but it doesn't make sense to me that test_speed_many_locals is slower than test_speed_many_locals_with_call for Classic.

@iritkatriel
Author

To your question: with the new test, Classic is significantly slower, so New looks better in comparison. The load+1 exec time for test_speed_many_locals_with_call is 2x slower rather than 5x slower.

@iritkatriel
Author

I just got this for the third time, it's weird:
test_speed_many_locals
Classic load: 11.324

test_speed_many_locals_with_call
Classic load: 2.541

@iritkatriel
Author

Ah, of course: test_speed_many_locals_with_call has for i in range(100): and test_speed_many_locals has for i in range(300):

@iritkatriel
Author

iritkatriel commented Jul 11, 2021

So if I increase it to 300 in test_speed_many_locals_with_call then I get for the old test:

Starting speed test: test_speed_many_locals_with_call
Classic load: 0.388
Classic exec #1: 1.018
Classic exec #2: 1.011
Classic exec #3: 1.014
Classic exec #4: 1.009
Classic total: 4.441
PYCO: 0.824
New PYC load: 0.000
New PYC exec #1: 5.552
New PYC exec #2: 1.130
New PYC exec #3: 1.130
New PYC exec #4: 1.199
New PYC total: 9.039
Classic-to-new ratio: 0.49 (new is -51% faster)

and for the new test:

Starting speed test: test_speed_many_locals_with_call
Classic load: 8.153
Classic exec #1: 1.018
Classic exec #2: 1.022
Classic exec #3: 1.029
Classic exec #4: 1.026
Classic total: 12.248
PYCO: 1.123
New PYC load: 0.000
New PYC exec #1: 8.639
New PYC exec #2: 1.178
New PYC exec #3: 1.119
New PYC exec #4: 1.052
New PYC total: 12.057
Classic-to-new ratio: 1.02 (new is 2% faster)

@iritkatriel iritkatriel marked this pull request as draft July 11, 2021 00:05
@gvanrossum
Owner

Hm, of course this is the problem with micro-benchmarks -- you can use them to prove anything. :-)

We need to think more about this. Maybe we need to produce a little mathematical model to predict the approximate cost of loading a code object using the classic and new approach. Something like a constant K plus some factor times the number of instructions plus another factor times the number of unique constants plus yet another factor times the number of unique names. (Did I forget anything?) Microbenchmarking can give us K and the other factors, assuming the model is not too far off. (We need two models, one for classic marshal and another for the new approach.)

We can then attempt to do a static analysis of a known code corpus (e.g. the top 100 or top 4000 PyPI packages) to estimate the average number of instructions, unique constants, names, etc. This can then give us an approximate cut-off point for the one thing that's harder to estimate: the fraction of functions that are never called. For example if 50% of the functions in the corpus (selected randomly) are called, we'll obtain an X speedup.

We can also try to measure X (or more directly the classic and new times) for the modules that are imported during a typical standard run before the first line of user code is run -- this tells us whether we're doing anything about the "startup is slow" problem.
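The inputs to the linear model sketched above (instruction count, unique constants, unique names per code object) are cheap to extract for a corpus scan. A minimal sketch, assuming the linear-cost model described; the coefficients K, c1, c2, c3 would come from microbenchmark fits:

```python
import dis

def load_cost_features(code):
    """Features for a hypothetical linear model of load cost:
    cost ~ K + c1*instructions + c2*unique_consts + c3*unique_names."""
    n_instr = len(list(dis.get_instructions(code)))
    # repr() makes unhashable constants (e.g. nested code objects) countable
    n_consts = len(set(map(repr, code.co_consts)))
    n_names = len(set(code.co_names) | set(code.co_varnames))
    return (n_instr, n_consts, n_names)

code = compile("def f(a, b):\n    x = a + b\n    return x * 2", "<est>", "exec")
print(load_cost_features(code))
```

Running this over the code objects of a package corpus would give the distributions needed to estimate the cut-off point.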

@iritkatriel
Author

The micro benchmarks at this point are useful to see what exactly is going on. For our worst result (many locals with call, with repeated variable names) we now know that the difference is probably the interning that marshal does. So if we implement interning as you suggested here, then the story will improve for that case.

Once we're done with this and all the other optimizations, then I agree a model can help understand the tradeoffs.

@gvanrossum
Owner

It's simpler than interning even. There was an important missed case, where a string found via a redirect would be loaded even if the redirect target was already loaded. I have pushed a fix.
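The missed case might look roughly like this. This is a hypothetical Python reconstruction of the logic described, not the actual pyco code; Redirect, deserialize, and the table/cache layout are all illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Redirect:
    target: int  # index of the real entry this one points at

def deserialize(entry):
    # stand-in for the real string-decoding work
    return str(entry)

def load_string(index, table, cache):
    entry = table[index]
    if isinstance(entry, Redirect):
        index = entry.target        # follow the redirect to its target...
        entry = table[index]
    if cache[index] is None:        # ...and only deserialize on a cache miss
        cache[index] = deserialize(entry)
    return cache[index]

table = ["hello", Redirect(0)]
cache = [None, None]
load_string(0, table, cache)
# The bug was loading via the redirect even though index 0 is cached;
# with the fix, the redirect reuses the cached value.
print(load_string(1, table, cache) is cache[0])
```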

@iritkatriel
Author

Cool, I'm trying it now.

(rebased this and pushed some stuff I did on the tests)

@iritkatriel
Author

iritkatriel commented Jul 11, 2021

This prints for each test a breakdown like this:

Classic-to-new comparison:
>> load+exec ratio: 0.41 (new is -59% faster)
>> steady state ratio: 0.95 (new is -5% faster)

And then a summary in the end:

                               load+exec   steady state
many_locals                        5.823          0.881
many_constants                     1.487          1.301
many_globals                       1.319          0.840
few_locals                         0.946          0.884
few_locals_with_call               0.737          0.959
many_locals_with_call              0.413          0.951

@gvanrossum
Owner

Ah, sorry about the typo in ceval.c. Fixed in HEAD.

These summaries are very useful, but I keep getting confused about whether larger ratios mean faster or slower. I propose to invert the ratios (so tn/tc instead of tc/tn) so that we can stick to "smaller numbers are better" (which seems to be the mantra of PyPerformance and pyperf).
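The inverted report might read like this (hypothetical helper, names illustrative; not the harness code):

```python
def format_ratio(t_new, t_classic):
    """Report t_new/t_classic so smaller numbers are better,
    matching the pyperf/PyPerformance convention."""
    ratio = t_new / t_classic
    pct = round((ratio - 1) * 100)
    word = "slower" if pct > 0 else "faster"
    return f"{ratio:.2f} (new is {abs(pct)}% {word})"

# Using the few_locals totals from the first run above:
print(format_ratio(0.182, 0.178))  # -> 1.02 (new is 2% slower)
```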

I still see a lot of noise. E.g. two consecutive runs (with inverted ratios):

                               load+exec   steady state
many_locals_with_call              3.106          1.128
few_locals_with_call               1.935          0.804
few_locals                         1.232          0.898
many_globals                       0.885          0.900
many_constants                     0.726          0.887
many_locals                        0.181          1.239

                               load+exec   steady state
many_locals_with_call              3.179          1.133
few_locals_with_call               1.985          0.883
few_locals                         1.099          1.067
many_constants                     0.761          0.839
many_globals                       0.751          0.987
many_locals                        0.159          1.085

But the trend is still clear: steady state is sometimes faster, sometimes slower, three tests are clearly faster (the many constants, globals, locals) and the tests with calls are still dramatically slower.

If I comment out the hydration of local variable names (as shown before), many_locals_with_call load+exec time goes down to 2.5, and few_locals_with_call to 1.6. I don't think interning will help (since in the faster version names aren't loaded at all).

@iritkatriel iritkatriel marked this pull request as ready for review July 11, 2021 22:05
@iritkatriel changed the title from "make test functions use unique variable names" to "display summary and breakdown of timing tests" on Jul 11, 2021
@gvanrossum left a comment (Owner)

Looks good!

@iritkatriel
Author

Almost... need to flip the direction of the prose.

@iritkatriel
Author

It's either what I just committed, or it needs to be "new is X% slower" (which sounds very negative) or "classic is X% faster" (which sounds like the wrong focus).

@iritkatriel
Author

No, that was wrong, let me revert the last commit.

@gvanrossum gvanrossum mentioned this pull request Jul 12, 2021
@gvanrossum gvanrossum merged commit 89e264d into gvanrossum:pyco Jul 12, 2021
@gvanrossum
Owner

Thanks again!

@iritkatriel iritkatriel deleted the fix-test branch August 2, 2021 14:13