
Performance Benchmarks / Integration tests #1954

Open
brycepg opened this issue Mar 22, 2018 · 7 comments
Labels
Enhancement ✨ · Needs PR · performance · primer

Comments

@brycepg
Contributor

brycepg commented Mar 22, 2018

Pylint needs a collection of performance integration tests (see pylint-dev/astroid#519) to help further optimization efforts.

They don't need to run as part of the standard CI or on a plain pytest invocation, since they will be slow, but we should have them to make sure optimization efforts aren't over-optimizing for specific projects, and so we know which pull requests will slow down astroid/pylint.

I'm thinking we should pick a specific tag of a specific project, then set up pytest-benchmark so that its setup step downloads and extracts the corresponding zip, runs pylint against those files on a developer's machine, and measures how long pylint takes to execute and whether it raises any internal errors.

These tests might additionally be used to check for consistency in output and changes in pylint's output messages.
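
For illustration, a slow benchmark along these lines could cover the download/extract/run/time steps described above. This is only a sketch: the pinned archive URL, the `slow` marker, and the exit-code check are assumptions, not settled choices.

```python
# Sketch only: a slow integration benchmark, assuming pytest-benchmark is
# installed. The pinned archive URL is illustrative, not a final choice.
import io
import subprocess
import sys
import urllib.request
import zipfile

import pytest


@pytest.fixture(scope="session")
def corpus_dir(tmp_path_factory):
    """Download and extract a pinned source archive once per test session."""
    url = "https://github.com/google/yapf/archive/refs/tags/v0.30.0.zip"  # hypothetical pin
    target = tmp_path_factory.mktemp("corpus")
    with urllib.request.urlopen(url) as response:
        zipfile.ZipFile(io.BytesIO(response.read())).extractall(target)
    return target


@pytest.mark.slow  # custom marker so these stay out of the default CI run
def test_pylint_runtime(benchmark, corpus_dir):
    def run_pylint():
        result = subprocess.run(
            [sys.executable, "-m", "pylint", str(corpus_dir)],
            capture_output=True,
            text=True,
        )
        # pylint's exit status is a bit mask; bit 1 means a fatal message
        # (e.g. an internal/astroid error) was emitted.
        assert result.returncode & 1 == 0, result.stderr
        return result.stdout

    benchmark(run_pylint)
```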

@brycepg self-assigned this Mar 22, 2018
@brycepg
Contributor Author

brycepg commented Mar 23, 2018

  • yapf
  • balloob/home-assistant
  • dbcli/pgcli
  • requests
  • pandas

I did a Google BigQuery search for GitHub repos that have a pylintrc file. I'll probably use these as a starting point for integration tests. Here's the top 25 (an illustrative query sketch follows the table):

stars  repo_name
11301 balloob/home-assistant
7059 google/yapf
6694 dbcli/pgcli
6380 wifiphisher/wifiphisher
3635 NervanaSystems/neon
3499 google/seq2seq
3225 numenta/nupic
2796 lektor/lektor
2549 SirVer/ultisnips
2179 edx/edx-platform
2073 JonnyWong16/plexpy
1982 UnivaqTelegramBot/UnivaqInformaticaBot
1982 giacomocerquone/UnivaqBot
1942 ray-project/ray
1737 boramalper/magnetico
1561 rembo10/headphones
1490 lorien/grab
1452 marcgibbons/django-rest-swagger
1356 tdryer/hangups
1267 EntilZha/PyFunctional
1267 EntilZha/ScalaFunctional
1106 PyCQA/pylint
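
For anyone wanting to reproduce a list like this, the lookup can be done with the BigQuery Python client against the public GitHub dataset. The exact query used above was not recorded, so the snippet below is only an illustrative sketch and does not include the star ranking.

```python
# Illustrative sketch: list repos in the public GitHub dataset that ship a
# pylintrc. The star counts above came from a separate, unrecorded query.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT DISTINCT repo_name
    FROM `bigquery-public-data.github_repos.files`
    WHERE path IN ('pylintrc', '.pylintrc')
"""
for row in client.query(query).result():
    print(row.repo_name)
```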

@PCManticore
Contributor

Thanks for tackling this, Bryce! Would it be possible to have a script that reproduces the benchmark dataset? That way we and contributors would have a de facto way to retrieve the dataset, run the integration tests, and get the performance results out of it.

@PCManticore added the Enhancement ✨ label Mar 26, 2018
@brycepg
Contributor Author

brycepg commented Mar 27, 2018

Yeah, I'll be adding some docs to https://github.com/brycepg/pylint-corpus

I've decided to just download the source of each Python repository into the corpus repository, so I don't have to worry about performance changing across new versions of the same repository.
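
A small script along these lines would make the corpus reproducible in the way asked for above; the project pins below are placeholders.

```python
# Illustrative sketch: vendor a pinned snapshot of each project into the corpus
# repository so results do not drift as upstream projects change.
import io
import pathlib
import urllib.request
import zipfile

# Hypothetical pins; the real corpus would record the exact tags used.
PROJECTS = {
    "yapf": "https://github.com/google/yapf/archive/refs/tags/v0.30.0.zip",
    "pgcli": "https://github.com/dbcli/pgcli/archive/refs/tags/v3.0.0.zip",
}

corpus = pathlib.Path("corpus")
corpus.mkdir(exist_ok=True)
for name, url in PROJECTS.items():
    dest = corpus / name
    if dest.exists():
        continue  # already vendored
    with urllib.request.urlopen(url) as response:
        zipfile.ZipFile(io.BytesIO(response.read())).extractall(dest)
    print(f"vendored {name} into {dest}")
```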

@brycepg
Contributor Author

brycepg commented Mar 27, 2018

Some other notes: I tried to make pylint work on some really big repositories like pandas and numpy, but pylint never completed, even overnight.

@PCManticore
Contributor

Wow, that is crazy! I wonder what graphs we could extract from pandas and numpy; maybe there are other pathological cases that we need to look into.

@doublethefish
Contributor

The following is edited content taken from PR #3473.

There are three key parts to achieving this properly:

  1. Generate benchmark data that can be used for comparisons against benchmark data generated under the same conditions for a different commit/branch.
  2. Upload the data from (1) somewhere useful.
    • This requires some extra config + work in PyCQA's systems.
    • For the fastest implementation, we can use an Amazon S3 bucket.
  3. Finally, and possibly most importantly, attempt to establish a baseline of basic performance.
    • This requires some thought for pylint. The rest of this comment makes some suggestions.

Comparable Benchmarks

In PR #3473 we use pytest-benchmark to benchmark the code, and we suggest a use-case for it.

This creates .json files that can be compared to .json files created from other runs, which is very useful. However, for those of us familiar with performance regression testing, without ensuring that the two runs being compared were run under highly similar conditions we are very likely to get false-positive/negative results. So some thought needs to be given to how to make use of the .json files generated by pytest-benchmark. Ideas on a postcard.
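
For concreteness, the save/compare workflow looks roughly like this; the test body is a placeholder, and the caveat above still applies (a comparison is only meaningful when both runs came from the same machine and load).

```python
# Placeholder benchmark; a real test would invoke pylint on a corpus project.
def test_placeholder(benchmark):
    benchmark(sum, range(10_000))

# Saving and comparing runs with pytest-benchmark's documented options:
#   pytest --benchmark-autosave                # write a .json run under .benchmarks/
#   pytest --benchmark-compare=0001            # compare against a saved run
#   pytest --benchmark-compare-fail=mean:10%   # fail if the mean regresses by >10%
```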

Side-by-side comparisons

For example, to ensure a fair comparison, we might want to benchmark the target merge branch and the WIP branch within the same tox run, so we know we got comparable profiling information from two runs on the exact same machine/instance/load. There may be a better way to do this.
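
As a rough sketch of what such a side-by-side run could look like outside of tox, the snippet below times two interpreters back-to-back; the virtualenv paths, corpus path and repetition count are arbitrary assumptions.

```python
# Rough sketch: time pylint from two checkouts/venvs back-to-back on the same
# machine so both measurements see the same hardware and background load.
import statistics
import subprocess
import time

VENVS = {
    "target-branch": ".tox/bench-target/bin/python",  # hypothetical interpreter paths
    "wip-branch": ".tox/bench-wip/bin/python",
}
CORPUS = "corpus/yapf"  # hypothetical corpus checkout


def time_once(python):
    start = time.perf_counter()
    subprocess.run([python, "-m", "pylint", CORPUS], capture_output=True)
    return time.perf_counter() - start


for label, python in VENVS.items():
    runs = [time_once(python) for _ in range(3)]
    print(f"{label}: median {statistics.median(runs):.2f}s over {len(runs)} runs")
```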

Prev runs via Cloud storage

Whilst side-by-side comparisons would likely be the most reliable solution, it would also be useful to record a history of performance so we can spot trends, share performance heatmaps, logs and so on.

One very quick solution here is to use S3, because Travis supports AWS S3 storage out of the box. I already have code to upload benchmark data and heatmaps to an S3 bucket.

This means that PyCQA would have to do the same to support this PR, but there would be considerations around costs, monitoring and so on.
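
For reference, the upload side is small; the bucket name and key layout below are placeholders, and the real work is in credentials, cost monitoring and retention.

```python
# Placeholder sketch: push a pytest-benchmark .json result to S3.
# Bucket/key names are illustrative; credentials come from the CI environment.
import datetime

import boto3

bucket = "pylint-benchmark-results"  # hypothetical bucket
key = f"benchmarks/{datetime.date.today():%Y-%m-%d}/results.json"

s3 = boto3.client("s3")
s3.upload_file("results.json", bucket, key)
print(f"uploaded to s3://{bucket}/{key}")
```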

Baselines (validation of thinking needed)

To make this useful for any performance work we need to establish robust benchmark baselines. There's one attached to #3473, but we will need more, one for each of the expensive parts of the system.

To help explain why we need to think about this carefully: whilst creating PR #3458 (which really enables full support of multiprocessing) and dependent work, discovering some hidden problems along the way, and reading the various related issues around it (e.g. #1954), it became clear that improving performance in pylint is not obvious to someone who isn't intimately familiar with pylint, checkers, pylint configuration and pylint's dependencies (astroid).

@brycepg removed their assignment Jun 16, 2020
@brandon-leapyear

brandon-leapyear commented Feb 25, 2022

This is an old work account. Please reference @brandonchinn178 for all future communication.


> Some other notes: I tried to make pylint work on some really big repositories like pandas and numpy, but pylint never completed, even overnight.

This also affects repos importing pandas + numpy. #5835 is one minimal example of this, but more generally, running pylint --disable=all on all the files in our repo takes 10s, while it takes 3s after commenting out all pandas + numpy + tqdm imports.

With prospector, it's currently ~40s; after commenting out all pandas + numpy + tqdm imports it's down to 30s, and commenting out scipy + matplotlib + seaborn as well brings it down to 15s.
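
A rough way to reproduce this kind of measurement is sketched below; the package paths are placeholders for the two variants being compared.

```python
# Rough reproduction sketch: wall-clock `pylint --disable=all` on the same code
# with and without the heavy third-party imports commented out.
import subprocess
import sys
import time


def time_pylint(paths):
    start = time.perf_counter()
    subprocess.run(
        [sys.executable, "-m", "pylint", "--disable=all", *paths],
        capture_output=True,
    )
    return time.perf_counter() - start


print("with heavy imports:   ", time_pylint(["pkg_with_imports/"]))
print("without heavy imports:", time_pylint(["pkg_without_imports/"]))
```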

@Pierre-Sassoulas added the Needs PR label Jul 6, 2022