Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: add benchmark test suite #8538

Merged
merged 6 commits into from
Oct 19, 2023
Merged

CI: add benchmark test suite #8538

merged 6 commits into from
Oct 19, 2023

Conversation

rouault
Copy link
Member

@rouault rouault commented Oct 10, 2023

  • Uses pytest-benchmark
  • Add a autotest/benchmark module (now with only a few tests for the GTiff and GPKG driver)
  • Add a new CI benchmark configuration, which builds the current tree and a reference tree, and run the benchmark test suite in both

@rouault rouault force-pushed the benchmark branch 2 times, most recently from 86c21dd to ddb00d5 Compare October 10, 2023 16:52
@rouault
Copy link
Member Author

rouault commented Oct 10, 2023

CC @dbaston

@rouault rouault force-pushed the benchmark branch 4 times, most recently from 1b8bd41 to f6466d9 Compare October 10, 2023 17:50
return filename


def test_ogr_gpkg_spatial_index(source_file):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think bringing in the benchmark fixture explicitly would let you more clearly exclude the setup cost. It's not clear to me if the runtime of source_file is part of the timing or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me if the runtime of source_file is part of the timing or not.

it is not, comparing timings with the somewhat more verbose below version. source_file is run once as expected

def test_ogr_gpkg_spatial_index(source_file, benchmark):
    @benchmark
    def do():
        ds = ogr.Open(source_file)
        lyr = ds.GetLayer(0)
        lyr.SetSpatialFilterRect(1000, 1000, 10000, 10000)
        count = 0
        for f in lyr:
            count += 1
        assert count == 10000 - 1000 + 1

autotest/conftest.py Outdated Show resolved Hide resolved
@@ -269,6 +276,13 @@ jobs:
TEST_CMD="ctest -V -j $(nproc)"
fi

if test "${{ matrix.id }}" = "benchmarks"; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move into test.sh ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this can be done under Docker. Anyway it doesn't look like this is even hit (perhaps because we already run in a container/VM that doesn't expose turboboost setting)

@rouault
Copy link
Member Author

rouault commented Oct 10, 2023

d35198a fails as expected (https://github.com/OSGeo/gdal/actions/runs/6473107275/job/17575133870?pr=8538):

 Performance has regressed:
	test_gtiff_byte (0001_ref) - Field 'mean' has failed PercentageRegressionCheck: 100.708833475 > 5.000000000

@rouault
Copy link
Member Author

rouault commented Oct 10, 2023

changing to use min statistics as suggest by @DFEvans on the mailing list

@coveralls
Copy link
Collaborator

coveralls commented Oct 10, 2023

Coverage Status

coverage: 67.761% (+0.001%) from 67.76% when pulling 974175b on rouault:benchmark into f3bc13d on OSGeo:master.

@rouault
Copy link
Member Author

rouault commented Oct 11, 2023

@dbaston I've written more tests, and it seems we now hit unreliability of timings. Initial failed run in https://github.com/OSGeo/gdal/actions/runs/6482985166/job/17603597567 using the naive procedure (running the reference test suite, and then the one of the PR)

I've attempted to modify the test procedure (cf updated benchmark/test.sh) to do more runs to attempt mitigating this, but given See https://github.com/OSGeo/gdal/actions/runs/6485533623/job/17611807035?pr=8538
this obviously doesn't work: the first comparison run has just one test that is slightly over my 5% arbitrary criterion, but the retry has 2 tests 11% and 21% slower.

Any ideas (except increasing significantly the tolerance threshold to let's say 30% to be hopefully robust, but we won't catch up small perf regressions) or are we hitting a dead end?
Hum looking at https://pytest-benchmark.readthedocs.io/en/stable/faq.html, maybe use an alternate timer : "You could use something like time.process_time (Python 3.3+ only) as the timer. Process time doesn’t include sleeping or waiting for I/O.", assuming the difference in timings result in the VM being randomly scheduled out by the VM hypervisor.

@rouault
Copy link
Member Author

rouault commented Oct 11, 2023

more stable timings with --benchmark-timer=time.process_time in https://github.com/OSGeo/gdal/actions/runs/6486353872/job/17614377148?pr=8538.

@rouault
Copy link
Member Author

rouault commented Oct 11, 2023

A retry failed : https://github.com/OSGeo/gdal/actions/runs/6486353872/job/17617138599?pr=8538 . Up to 16% of slowdown on one test. Bumping tolerance to 20% ...

@rouault
Copy link
Member Author

rouault commented Oct 19, 2023

merging this

@rouault rouault merged commit 32ec2a9 into OSGeo:master Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants