PERF: Add benchmarking? #1257

Closed
max-sixty opened this issue Feb 9, 2017 · 9 comments

Comments

@max-sixty
Collaborator

Because xarray is all Python and generally doesn't do much compute itself (it marshals other libraries to do that), benchmarking hasn't been that important so far.

IIRC most of the performance issues have arisen where xarray builds on (arguably) shaky foundations, like PeriodIndex.

Though as we mature, is it worth adding some benchmarks?

If so, what's a good way to do this? Pandas uses asv successfully. I don't have experience with https://github.com/ionelmc/pytest-benchmark but that could be a lower cost way of getting started. Any others?
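
For a sense of what the lower-cost route could look like, here's roughly what a minimal pytest-benchmark test might be (the dataset and test name are made up for illustration, not taken from our test suite):

```python
import numpy as np
import xarray as xr


def test_mean_over_time(benchmark):
    # `benchmark` is the fixture pytest-benchmark provides; it runs the
    # callable repeatedly and collects timing statistics for the report.
    ds = xr.Dataset({"temp": (("time", "x"), np.random.rand(1000, 100))})
    result = benchmark(lambda: ds["temp"].mean(dim="time"))
    assert result.shape == (100,)
```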

@shoyer
Member

shoyer commented Feb 9, 2017

Yes, some sort of automated benchmarking could be valuable, especially for noticing and fixing regressions. I've done occasional benchmarks before to optimize bottlenecks (e.g., class constructors) but it's all been ad-hoc stuff with %timeit in IPython.

ASV seems like a pretty sane way to do this. pytest-benchmark can trigger test failures if performance falls below some set level, but I suspect timings are too stochastic across runs and machines for fixed thresholds to be reliable.

@max-sixty
Collaborator Author

Yes, ASV is good. I'm surprised there isn't something you can point at an existing test suite and ask to just "robustly time these tests", so it could bolt on without writing new code.
Although maybe the overlap between test code and benchmark code isn't as great as I imagine.

@shoyer
Member

shoyer commented Feb 10, 2017

One issue is that unit tests are often not good benchmarks. Ideal unit tests are as fast as possible, whereas ideal benchmarks should be run on more typical inputs, which may be much slower.

@rabernat
Contributor

Another 👍 for benchmarking. Especially as we start to get deep into integrating dask.distributed, having robust performance benchmarks will be very useful. One challenge is where to deploy the benchmarks. TravisCI might not be ideal, since performance can vary depending on competition from other virtual machines on the same system.

@pwolfram
Contributor

We would also benefit from this specifically for #1198 👍

@jhamman
Member

jhamman commented Jun 12, 2017

Is anyone interested in working on this with me over the next few months? Given the number of issues we've been seeing, I'd like to see this come together this summer. I think ASV is the natural starting point.

@rabernat
Contributor

I am very interested. I have been doing a lot of benchmarking already wrt dask.distributed on my local cluster, focusing on performance with multi-terabyte datasets. At this scale, certain operations emerge as performance bottlenecks (e.g. index alignment of multi-file netcdf datasets, #1385).

I think this should probably be done in AWS or Google Cloud. That way we can establish a consistent test environment for benchmarking. I might be able to pay for that (especially if our proposal gets funded)!

@jhamman
Member

jhamman commented Jun 13, 2017

@rabernat - great. I've set up an ASV project and am in the process of teaching myself how it all works. I'm just playing with some simple arithmetic benchmarks for now (sketched below), but of course most of our interest will be in the I/O and dask arenas.
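
For the curious, a sketch of what one of those arithmetic benchmarks might look like under asv (class and method names are just illustrative, not what's in my branch):

```python
import numpy as np
import xarray as xr


class Arithmetic:
    # asv calls setup() before timing; only the time_* methods are measured.
    def setup(self):
        self.ds = xr.Dataset(
            {"var": (("time", "x"), np.random.rand(10000, 100))}
        )

    def time_scalar_add(self):
        self.ds + 1

    def time_reduce_mean(self):
        self.ds["var"].mean(dim="time")
```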

I'm wondering if @mrocklin has seen ASV used with any dask projects. We'll just need to make sure we choose the appropriate timer when profiling dask functions.
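
Something like this is what I have in mind for the dask case. It assumes asv honors a per-benchmark timer attribute (worth double-checking in the docs); the idea is to force a wall-clock timer so time spent in dask worker threads isn't dropped, and to call .compute() so the graph actually executes inside the timed region:

```python
import timeit

import numpy as np
import xarray as xr


class DaskReduce:
    # Assumption: asv respects a per-benchmark `timer` attribute. Wall-clock
    # timing counts work done in dask worker threads, which a CPU-time timer
    # for the main process would miss.
    timer = timeit.default_timer

    def setup(self):
        self.ds = xr.Dataset(
            {"var": (("time", "x"), np.random.rand(10000, 1000))}
        ).chunk({"time": 1000})

    def time_mean(self):
        # .compute() forces evaluation so the dask graph runs in the timed region.
        self.ds["var"].mean(dim="time").compute()
```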

@mrocklin
Contributor

@TomAugspurger has done some ASV work with Dask itself.
