Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance tracker: Backtracking Parser #2762

Open
isidentical opened this issue Jan 11, 2022 · 6 comments
Open

Performance tracker: Backtracking Parser #2762

isidentical opened this issue Jan 11, 2022 · 6 comments
Labels
C: performance Black is too slow. Or too fast. T: enhancement New feature or request

Comments

@isidentical
Copy link
Collaborator

The previous PRs successfully optimized the backtracking in the factor of 4X. It might make some sense to profile and see whether there are still other potential bottlenecks.

@isidentical isidentical added the T: enhancement New feature or request label Jan 11, 2022
@ichard26 ichard26 added the C: performance Black is too slow. Or too fast. label Jan 11, 2022
@tanvimoharir
Copy link
Contributor

Hi,
I would like to work on this issue.
I have committed to this repository in the past but I would need some guidance to get started.
Thanks

@isidentical
Copy link
Collaborator Author

You can benchmark the backtracking parser (with passing -t py310) and run the profiler to see the hot spots, if there is anything too obvious that might be the target of the optimization. Here is an example
#2763 (comment)

@tanvimoharir
Copy link
Contributor

Hi @isidentical,
Perhaps I'm missing some context, I was assuming that I need to run this with cprofile which is what I did:

bash-3.2$ python3 -m cProfile -s tottime -m src.blib2to3.pgen2.parse
         26469 function calls (25215 primitive calls) in 0.132 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       23    0.046    0.002    0.046    0.002 {method 'read' of '_io.BufferedReader' objects}
       10    0.034    0.003    0.034    0.003 {built-in method _imp.create_dynamic}
       23    0.005    0.000    0.005    0.000 {built-in method marshal.loads}
      157    0.005    0.000    0.005    0.000 {built-in method posix.stat}
    100/6    0.003    0.000    0.008    0.001 sre_parse.py:493(_parse)
       23    0.002    0.000    0.002    0.000 {built-in method io.open_code}
       53    0.002    0.000    0.005    0.000 {built-in method builtins.__build_class__}
       79    0.002    0.000    0.008    0.000 <frozen importlib._bootstrap_external>:1505(find_spec)
    194/6    0.001    0.000    0.003    0.000 sre_compile.py:71(_compile)
     1320    0.001    0.000    0.001    0.000 sre_parse.py:164(__getitem__)

However looking at your comment I'm guessing this isn't what is expected.
I looked at the related comment from #2763 (comment)
Are you using pyperf module? (I have started going through pyperf docs)
I don't understand how to pass -t py310 or to be precise, is py310 a benchmark that should be created?
Please let me know what I' missing.
Thanks again for helping me, appreciate it.

@JelleZijlstra
Copy link
Collaborator

@tanvimoharir You're on the right track with using cProfile, but that benchmark merely imports the module, which isn't very interesting. What you'd want to benchmark instead is a run where Black formats some code that is heavy in use of match.

-t py310 is an option to Black that runs Black in Python 3.10 mode.

@tanvimoharir
Copy link
Contributor

tanvimoharir commented Jan 20, 2022

@tanvimoharir You're on the right track with using cProfile, but that benchmark merely imports the module, which isn't very interesting. What you'd want to benchmark instead is a run where Black formats some code that is heavy in use of match.

-t py310 is an option to Black that runs Black in Python 3.10 mode.

I tried running with -t py310 on a sample file with some match scenarios:

bash-3.2$ python3 -m cProfile -s tottime -m src.black -t py310 --diff --color cat.py
--- cat.py	2022-01-20 12:49:44.841140 +0000
+++ cat.py	2022-01-20 12:49:49.045038 +0000
would reformat cat.py

All done! ✨ 🍰 ✨
1 file would be reformatted.
         211084 function calls (194385 primitive calls) in 0.942 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      142    0.351    0.002    0.351    0.002 {method 'read' of '_io.BufferedReader' objects}
       25    0.124    0.005    0.131    0.005 {built-in method _imp.create_dynamic}
       14    0.035    0.002    0.035    0.002 {built-in method posix.listdir}
      140    0.034    0.000    0.034    0.000 {built-in method marshal.loads}
      783    0.028    0.000    0.029    0.000 {built-in method posix.stat}
      140    0.017    0.000    0.017    0.000 {built-in method io.open_code}
  398/397    0.014    0.000    0.046    0.000 {built-in method builtins.__build_class__}
   353/67    0.013    0.000    0.035    0.001 sre_parse.py:493(_parse)
    183/1    0.012    0.000    0.943    0.943 {built-in method builtins.exec}
      337    0.008    0.000    0.083    0.000 <frozen importlib._bootstrap_external>:1527(find_spec)
      488    0.007    0.000    0.017    0.000 parse.py:287(_addtoken)
23222/20938    0.007    0.000    0.008    0.000 {built-in method builtins.isinstance}
   620/63    0.006    0.000    0.015    0.000 sre_compile.py:71(_compile)

I will try to add more code and try formatting it.

@tanvimoharir
Copy link
Contributor

I inflated the sample file with more match scenarios (got some code from the test data) with some valid and some invalid scenarios:

bash-3.2$ python3 -m cProfile -s tottime -m src.black -t py310 --diff --color cat.py
error: cannot format cat.py: cannot use --safe with this file; failed to parse source file: expected ':' (<unknown>, line 1888)

Oh no! 💥 💔 💥
1 file would fail to reformat.
         2200056 function calls (2013593 primitive calls) in 6.717 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    16446    3.770    0.000    3.912    0.000 driver.py:92(__next__)
      142    0.357    0.003    0.357    0.003 {method 'read' of '_io.BufferedReader' objects}
    18943    0.282    0.000    0.696    0.000 parse.py:287(_addtoken)
       25    0.114    0.005    0.121    0.005 {built-in method _imp.create_dynamic}
    78767    0.100    0.000    0.205    0.000 parse.py:380(pop)
    88795    0.091    0.000    0.103    0.000 parse.py:368(push)
        1    0.091    0.091    4.931    4.931 driver.py:126(parse_tokens)
    16447    0.083    0.000    0.133    0.000 tokenize.py:409(generate_tokens)
    83079    0.077    0.000    0.154    0.000 pytree.py:477(convert)
25502/2127    0.076    0.000    0.661    0.000 linegen.py:72(visit_default)
    13565    0.063    0.000    0.294    0.000 lines.py:48(append)
    13138    0.050    0.000    0.099    0.000 brackets.py:68(mark)
32154/2127    0.047    0.000    0.662    0.000 nodes.py:159(visit)
16311/15057    0.044    0.000    0.895    0.000 parse.py:239(addtoken)
    14271    0.041    0.000    0.041    0.000 {method 'match' of 're.Pattern' objects}
       14    0.035    0.002    0.035    0.002 {built-in method posix.listdir}
      140    0.034    0.000    0.034    0.000 {built-in method marshal.loads}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: performance Black is too slow. Or too fast. T: enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants