Performance tracker: Backtracking Parser #2762

isidentical · 2022-01-11T17:17:56Z

The previous PRs successfully optimized the backtracking in the factor of 4X. It might make some sense to profile and see whether there are still other potential bottlenecks.

tanvimoharir · 2022-01-16T14:16:48Z

Hi,
I would like to work on this issue.
I have committed to this repository in the past but I would need some guidance to get started.
Thanks

isidentical · 2022-01-16T15:53:39Z

You can benchmark the backtracking parser (with passing -t py310) and run the profiler to see the hot spots, if there is anything too obvious that might be the target of the optimization. Here is an example
#2763 (comment)

tanvimoharir · 2022-01-17T13:00:53Z

Hi @isidentical,
Perhaps I'm missing some context, I was assuming that I need to run this with cprofile which is what I did:

bash-3.2$ python3 -m cProfile -s tottime -m src.blib2to3.pgen2.parse
         26469 function calls (25215 primitive calls) in 0.132 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       23    0.046    0.002    0.046    0.002 {method 'read' of '_io.BufferedReader' objects}
       10    0.034    0.003    0.034    0.003 {built-in method _imp.create_dynamic}
       23    0.005    0.000    0.005    0.000 {built-in method marshal.loads}
      157    0.005    0.000    0.005    0.000 {built-in method posix.stat}
    100/6    0.003    0.000    0.008    0.001 sre_parse.py:493(_parse)
       23    0.002    0.000    0.002    0.000 {built-in method io.open_code}
       53    0.002    0.000    0.005    0.000 {built-in method builtins.__build_class__}
       79    0.002    0.000    0.008    0.000 <frozen importlib._bootstrap_external>:1505(find_spec)
    194/6    0.001    0.000    0.003    0.000 sre_compile.py:71(_compile)
     1320    0.001    0.000    0.001    0.000 sre_parse.py:164(__getitem__)

However looking at your comment I'm guessing this isn't what is expected.
I looked at the related comment from #2763 (comment)
Are you using pyperf module? (I have started going through pyperf docs)
I don't understand how to pass -t py310 or to be precise, is py310 a benchmark that should be created?
Please let me know what I' missing.
Thanks again for helping me, appreciate it.

JelleZijlstra · 2022-01-17T15:23:58Z

@tanvimoharir You're on the right track with using cProfile, but that benchmark merely imports the module, which isn't very interesting. What you'd want to benchmark instead is a run where Black formats some code that is heavy in use of match.

-t py310 is an option to Black that runs Black in Python 3.10 mode.

tanvimoharir · 2022-01-20T12:55:03Z

@tanvimoharir You're on the right track with using cProfile, but that benchmark merely imports the module, which isn't very interesting. What you'd want to benchmark instead is a run where Black formats some code that is heavy in use of match.

-t py310 is an option to Black that runs Black in Python 3.10 mode.

I tried running with -t py310 on a sample file with some match scenarios:

bash-3.2$ python3 -m cProfile -s tottime -m src.black -t py310 --diff --color cat.py
--- cat.py	2022-01-20 12:49:44.841140 +0000
+++ cat.py	2022-01-20 12:49:49.045038 +0000
would reformat cat.py

All done! ✨ 🍰 ✨
1 file would be reformatted.
         211084 function calls (194385 primitive calls) in 0.942 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      142    0.351    0.002    0.351    0.002 {method 'read' of '_io.BufferedReader' objects}
       25    0.124    0.005    0.131    0.005 {built-in method _imp.create_dynamic}
       14    0.035    0.002    0.035    0.002 {built-in method posix.listdir}
      140    0.034    0.000    0.034    0.000 {built-in method marshal.loads}
      783    0.028    0.000    0.029    0.000 {built-in method posix.stat}
      140    0.017    0.000    0.017    0.000 {built-in method io.open_code}
  398/397    0.014    0.000    0.046    0.000 {built-in method builtins.__build_class__}
   353/67    0.013    0.000    0.035    0.001 sre_parse.py:493(_parse)
    183/1    0.012    0.000    0.943    0.943 {built-in method builtins.exec}
      337    0.008    0.000    0.083    0.000 <frozen importlib._bootstrap_external>:1527(find_spec)
      488    0.007    0.000    0.017    0.000 parse.py:287(_addtoken)
23222/20938    0.007    0.000    0.008    0.000 {built-in method builtins.isinstance}
   620/63    0.006    0.000    0.015    0.000 sre_compile.py:71(_compile)

I will try to add more code and try formatting it.

tanvimoharir · 2022-01-20T15:25:59Z

I inflated the sample file with more match scenarios (got some code from the test data) with some valid and some invalid scenarios:

bash-3.2$ python3 -m cProfile -s tottime -m src.black -t py310 --diff --color cat.py
error: cannot format cat.py: cannot use --safe with this file; failed to parse source file: expected ':' (<unknown>, line 1888)

Oh no! 💥 💔 💥
1 file would fail to reformat.
         2200056 function calls (2013593 primitive calls) in 6.717 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    16446    3.770    0.000    3.912    0.000 driver.py:92(__next__)
      142    0.357    0.003    0.357    0.003 {method 'read' of '_io.BufferedReader' objects}
    18943    0.282    0.000    0.696    0.000 parse.py:287(_addtoken)
       25    0.114    0.005    0.121    0.005 {built-in method _imp.create_dynamic}
    78767    0.100    0.000    0.205    0.000 parse.py:380(pop)
    88795    0.091    0.000    0.103    0.000 parse.py:368(push)
        1    0.091    0.091    4.931    4.931 driver.py:126(parse_tokens)
    16447    0.083    0.000    0.133    0.000 tokenize.py:409(generate_tokens)
    83079    0.077    0.000    0.154    0.000 pytree.py:477(convert)
25502/2127    0.076    0.000    0.661    0.000 linegen.py:72(visit_default)
    13565    0.063    0.000    0.294    0.000 lines.py:48(append)
    13138    0.050    0.000    0.099    0.000 brackets.py:68(mark)
32154/2127    0.047    0.000    0.662    0.000 nodes.py:159(visit)
16311/15057    0.044    0.000    0.895    0.000 parse.py:239(addtoken)
    14271    0.041    0.000    0.041    0.000 {method 'match' of 're.Pattern' objects}
       14    0.035    0.002    0.035    0.002 {built-in method posix.listdir}
      140    0.034    0.000    0.034    0.000 {built-in method marshal.loads}

isidentical added the T: enhancement New feature or request label Jan 11, 2022

ichard26 added the C: performance Black is too slow. Or too fast. label Jan 11, 2022

isidentical mentioned this issue Jan 11, 2022

Don't make redundant copies of the DFA #2763

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance tracker: Backtracking Parser #2762

Performance tracker: Backtracking Parser #2762

isidentical commented Jan 11, 2022

tanvimoharir commented Jan 16, 2022

isidentical commented Jan 16, 2022

tanvimoharir commented Jan 17, 2022

JelleZijlstra commented Jan 17, 2022

tanvimoharir commented Jan 20, 2022 •

edited

Loading

tanvimoharir commented Jan 20, 2022

Performance tracker: Backtracking Parser #2762

Performance tracker: Backtracking Parser #2762

Comments

isidentical commented Jan 11, 2022

tanvimoharir commented Jan 16, 2022

isidentical commented Jan 16, 2022

tanvimoharir commented Jan 17, 2022

JelleZijlstra commented Jan 17, 2022

tanvimoharir commented Jan 20, 2022 • edited Loading

tanvimoharir commented Jan 20, 2022

tanvimoharir commented Jan 20, 2022 •

edited

Loading