
Performance Benchmarks / Integration tests #1954

Open
brycepg opened this issue Mar 22, 2018 · 7 comments
Labels
Enhancement ✨ · Needs PR · performance · primer

Comments

@brycepg
Contributor

brycepg commented Mar 22, 2018

Pylint needs a collection of performance integration tests (see pylint-dev/astroid#519) to help further optimization efforts.

They don't need to run as part of the standard CI or on a plain pytest invocation, since they will be slow, but we should have them to make sure optimization efforts aren't over-optimizing for specific projects, and so we know which pull requests will slow down astroid/pylint.

I'm thinking we should pick a specific tag of a specific project, then set up pytest-benchmark so that its setup step downloads and extracts the corresponding zip, runs pylint against those files on a developer's machine, and measures how long pylint takes to execute and whether it raises any internal errors.

These tests might additionally be used to check for consistency in output and changes in pylint's output messages.
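
For illustration, a slow benchmark along these lines could cover the download/extract/run/time steps described above. This is only a sketch: the pinned archive URL, the `slow` marker, and the exit-code check are assumptions, not settled choices.

```python
# Sketch only: a slow integration benchmark, assuming pytest-benchmark is
# installed. The pinned archive URL is illustrative, not a final choice.
import io
import subprocess
import sys
import urllib.request
import zipfile

import pytest


@pytest.fixture(scope="session")
def corpus_dir(tmp_path_factory):
    """Download and extract a pinned source archive once per test session."""
    url = "https://github.com/google/yapf/archive/refs/tags/v0.30.0.zip"  # hypothetical pin
    target = tmp_path_factory.mktemp("corpus")
    with urllib.request.urlopen(url) as response:
        zipfile.ZipFile(io.BytesIO(response.read())).extractall(target)
    return target


@pytest.mark.slow  # custom marker so these stay out of the default CI run
def test_pylint_runtime(benchmark, corpus_dir):
    def run_pylint():
        result = subprocess.run(
            [sys.executable, "-m", "pylint", str(corpus_dir)],
            capture_output=True,
            text=True,
        )
        # pylint's exit status is a bit mask; bit 1 means a fatal message
        # (e.g. an internal/astroid error) was emitted.
        assert result.returncode & 1 == 0, result.stderr
        return result.stdout

    benchmark(run_pylint)
```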

@brycepg self-assigned this Mar 22, 2018
@brycepg
Contributor Author

brycepg commented Mar 23, 2018

  • yapf
  • balloob/home-assistant
  • dbcli/pgcli
  • requests
  • pandas

I did a Google BigQuery search for GitHub repos that have a pylintrc file. I'll probably use these as a starting point for integration tests. Here's the top 25 (an illustrative query sketch follows the table):

stars  repo_name
11301 balloob/home-assistant
7059 google/yapf
6694 dbcli/pgcli
6380 wifiphisher/wifiphisher
3635 NervanaSystems/neon
3499 google/seq2seq
3225 numenta/nupic
2796 lektor/lektor
2549 SirVer/ultisnips
2179 edx/edx-platform
2073 JonnyWong16/plexpy
1982 UnivaqTelegramBot/UnivaqInformaticaBot
1982 giacomocerquone/UnivaqBot
1942 ray-project/ray
1737 boramalper/magnetico
1561 rembo10/headphones
1490 lorien/grab
1452 marcgibbons/django-rest-swagger
1356 tdryer/hangups
1267 EntilZha/PyFunctional
1267 EntilZha/ScalaFunctional
1106 PyCQA/pylint
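
For anyone wanting to reproduce a list like this, the lookup can be done with the BigQuery Python client against the public GitHub dataset. The exact query used above was not recorded, so the snippet below is only an illustrative sketch and does not include the star ranking.

```python
# Illustrative sketch: list repos in the public GitHub dataset that ship a
# pylintrc. The star counts above came from a separate, unrecorded query.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT DISTINCT repo_name
    FROM `bigquery-public-data.github_repos.files`
    WHERE path IN ('pylintrc', '.pylintrc')
"""
for row in client.query(query).result():
    print(row.repo_name)
```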

@PCManticore
Contributor

Thanks for tackling this, Bryce! Would it be possible to have a script that reproduces the benchmark dataset? That way we and contributors would have a de facto way to retrieve the dataset, run the integration tests, and get the performance results out of it.

@PCManticore added the Enhancement ✨ label Mar 26, 2018
@brycepg
Contributor Author

brycepg commented Mar 27, 2018

Yeah, I'll be adding some docs to https://github.com/brycepg/pylint-corpus

I've decided to just download the source of each Python repository into the corpus repository, so I don't have to worry about performance changing across new versions of the same repository.
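
A small script along these lines would make the corpus reproducible in the way asked for above; the project pins below are placeholders.

```python
# Illustrative sketch: vendor a pinned snapshot of each project into the corpus
# repository so results do not drift as upstream projects change.
import io
import pathlib
import urllib.request
import zipfile

# Hypothetical pins; the real corpus would record the exact tags used.
PROJECTS = {
    "yapf": "https://github.com/google/yapf/archive/refs/tags/v0.30.0.zip",
    "pgcli": "https://github.com/dbcli/pgcli/archive/refs/tags/v3.0.0.zip",
}

corpus = pathlib.Path("corpus")
corpus.mkdir(exist_ok=True)
for name, url in PROJECTS.items():
    dest = corpus / name
    if dest.exists():
        continue  # already vendored
    with urllib.request.urlopen(url) as response:
        zipfile.ZipFile(io.BytesIO(response.read())).extractall(dest)
    print(f"vendored {name} into {dest}")
```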

@brycepg
Contributor Author

brycepg commented Mar 27, 2018

Some other notes: I tried to make pylint work on some really big repositories like pandas and numpy, but pylint never completed, even overnight.

@PCManticore
Contributor

Wow, that is crazy! I wonder what graphs we could extract from pandas and numpy; maybe there are other pathological cases that we need to look into.

@doublethefish
Contributor

The following is edited content taken from PR #3473.

There are three key parts to achieving this properly:

  1. Generate benchmark data that can be used for comparisons against benchmark data generated under the same conditions for a different commit/branch.
  2. Upload the data from (1) somewhere useful.
    • This requires some extra config + work in PyCQA's systems.
    • For the fastest implementation, we can use an Amazon S3 bucket.
  3. Finally, and possibly most importantly, attempt to establish a baseline of basic performance.
    • This requires some thought for pylint. The rest of this comment makes some suggestions.

Comparable Benchmarks

In PR #3473 we use pytest-benchmark to benchmark the code, and we suggest a use-case for it.

This creates .json files that can be compared to .json files created from other runs, which is very useful. However, for those of us familiar with performance regression testing, without ensuring that the two runs being compared were run under highly similar conditions we are very likely to get false-positive/negative results. So some thought needs to be given to how to make use of the .json files generated by pytest-benchmark. Ideas on a postcard.
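
For concreteness, the save/compare workflow looks roughly like this; the test body is a placeholder, and the caveat above still applies (a comparison is only meaningful when both runs came from the same machine and load).

```python
# Placeholder benchmark; a real test would invoke pylint on a corpus project.
def test_placeholder(benchmark):
    benchmark(sum, range(10_000))

# Saving and comparing runs with pytest-benchmark's documented options:
#   pytest --benchmark-autosave                # write a .json run under .benchmarks/
#   pytest --benchmark-compare=0001            # compare against a saved run
#   pytest --benchmark-compare-fail=mean:10%   # fail if the mean regresses by >10%
```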

Side-by-side comparisons

For example, to ensure a fair comparison, we might want to benchmark the target merge branch and the WIP branch within the same tox run, so we know we got comparable profiling information from two runs on the exact same machine/instance/load. There may be a better way to do this.
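
As a rough sketch of what such a side-by-side run could look like outside of tox, the snippet below times two interpreters back-to-back; the virtualenv paths, corpus path and repetition count are arbitrary assumptions.

```python
# Rough sketch: time pylint from two checkouts/venvs back-to-back on the same
# machine so both measurements see the same hardware and background load.
import statistics
import subprocess
import time

VENVS = {
    "target-branch": ".tox/bench-target/bin/python",  # hypothetical interpreter paths
    "wip-branch": ".tox/bench-wip/bin/python",
}
CORPUS = "corpus/yapf"  # hypothetical corpus checkout


def time_once(python):
    start = time.perf_counter()
    subprocess.run([python, "-m", "pylint", CORPUS], capture_output=True)
    return time.perf_counter() - start


for label, python in VENVS.items():
    runs = [time_once(python) for _ in range(3)]
    print(f"{label}: median {statistics.median(runs):.2f}s over {len(runs)} runs")
```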

Prev runs via Cloud storage

Whilst side-by-side comparisons would likely be the most reliable solution, it would also be useful to record a history of performance so we can spot trends, share performance heatmaps, logs and so on.

One very quick solution here is to use S3, because Travis supports AWS S3 storage out of the box. I already have code to upload benchmark data and heatmaps to an S3 bucket.

This means that PyCQA would have to do the same to support this PR, but there would be considerations around costs, monitoring and so on.
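
For reference, the upload side is small; the bucket name and key layout below are placeholders, and the real work is in credentials, cost monitoring and retention.

```python
# Placeholder sketch: push a pytest-benchmark .json result to S3.
# Bucket/key names are illustrative; credentials come from the CI environment.
import datetime

import boto3

bucket = "pylint-benchmark-results"  # hypothetical bucket
key = f"benchmarks/{datetime.date.today():%Y-%m-%d}/results.json"

s3 = boto3.client("s3")
s3.upload_file("results.json", bucket, key)
print(f"uploaded to s3://{bucket}/{key}")
```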

Baselines (validation of thinking needed)

To make this useful for any performance work we need to establish robust benchmark baselines. There's one attached to #3473, but we will need more, one for each of the expensive parts of the system.

To help explain why we need to think about this carefully: whilst creating PR #3458 (which really enables full support of multiprocessing) and dependent work, discovering some hidden problems along the way, and reading the various related issues around it (e.g. #1954), it became clear that improving performance in pylint is not obvious to someone who isn't intimately familiar with pylint, checkers, pylint configuration and pylint's dependencies (astroid).

@brycepg removed their assignment Jun 16, 2020
@brandon-leapyear

brandon-leapyear commented Feb 25, 2022

This is an old work account. Please reference @brandonchinn178 for all future communication.


> Some other notes: I tried to make pylint work on some really big repositories like pandas and numpy, but pylint never completed, even overnight.

This also affects repos importing pandas + numpy. #5835 is one minimal example of this, but more generally, running pylint --disable=all on all the files in our repo takes 10s, while it takes 3s after commenting out all pandas + numpy + tqdm imports.

With prospector, it's currently ~40s; after commenting out all pandas + numpy + tqdm imports it's down to 30s, and commenting out scipy + matplotlib + seaborn as well brings it down to 15s.
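
A rough way to reproduce this kind of measurement is sketched below; the package paths are placeholders for the two variants being compared.

```python
# Rough reproduction sketch: wall-clock `pylint --disable=all` on the same code
# with and without the heavy third-party imports commented out.
import subprocess
import sys
import time


def time_pylint(paths):
    start = time.perf_counter()
    subprocess.run(
        [sys.executable, "-m", "pylint", "--disable=all", *paths],
        capture_output=True,
    )
    return time.perf_counter() - start


print("with heavy imports:   ", time_pylint(["pkg_with_imports/"]))
print("without heavy imports:", time_pylint(["pkg_without_imports/"]))
```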

@Pierre-Sassoulas added the Needs PR label Jul 6, 2022