Arrow Rust + Conbench Integration #1289

dianaclarke · 2022-02-09T02:03:07Z

Here's a minimal Arrow Rust + Conbench[1] proof of concept.

[1] https://github.com/conbench/conbench

A few notes (areas for improvement, caveats, etc):

Criterion results are in nanoseconds, but the smallest unit
Conbench currently speaks is seconds (because Conbench was initially
for macro not micro benchmarking). I suspect most places in Conbench
would work just fine if nanoseconds were passed in, but I need to
audit the code for any places that assume seconds if it isn't a
throughput benchmark.
If the Criterion benchmarks were named better, I could tag them
better in Conbench. For example, I suspect sqrt_20_12, sqrt_20_9,
sqrt_22_12, and sqrt_22_14 are parameterized variations of the same
benchmark, and if they were named something like "sqrt, foo=20,
bar=12", I could batch them together & tag their parameters so that
Conbench would automatically graph them in relation to each other. I
was sort of able to do this with the following benchmarks (because
there was a machine readable pattern). Anyhoo, that's easy enough to
do down the road as a last integration step, and it does appear from
the Criterion docs that they have their own recommendations for how to
do this.
- window partition by, u64_narrow, aggregate functions
- window partition by, u64_narrow, built-in functions
- window partition by, u64_wide, aggregate functions
- window partition by, u64_wide, built-in functions
While Criterion benchmarks can also measure throughput in some
cases, all the arrow-datafusion benchmarks were in elapsed time (not
sure about the arrow-rs benchmarks), so I didn't bother writing code
to support potential throughput results from
arrow-datafusion/arrow-rs, but we may need to revisit that.
We probably want to add some additional context, like the
arrow-rs/arrow-datafusion version, rust version, any compiler flags,
etc.

codecov-commenter · 2022-02-09T02:30:32Z

Codecov Report

Merging #1289 (6c3e7b8) into master (35e16be) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1289   +/-   ##
=======================================
  Coverage   83.04%   83.04%           
=======================================
  Files         180      180           
  Lines       52424    52424           
=======================================
  Hits        43537    43537           
  Misses       8887     8887

Impacted Files	Coverage Δ
arrow/src/datatypes/datatype.rs	`66.40% <0.00%> (-0.40%)`	⬇️
arrow/src/datatypes/field.rs	`53.79% <0.00%> (-0.31%)`	⬇️
parquet_derive/src/parquet_field.rs	`66.21% <0.00%> (ø)`
arrow/src/array/transform/mod.rs	`84.77% <0.00%> (+0.25%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35e16be...6c3e7b8. Read the comment docs.

dianaclarke · 2022-02-09T03:30:36Z

conbench/.flake8

@@ -0,0 +1,19 @@
+# Licensed to the Apache Software Foundation (ASF) under one


From a Python perspective, it's probably a mistake to name this directory conbench. Thoughts on what you would like arrow-rs/conbench/ to be named?

arrow-rs/_conbench/?

arrow-rs/conbench-benchmarks/?

arrow-rs/conbench-integration/?

conbench/README.md

andygrove · 2022-02-09T05:12:32Z

I pulled the branch locally and followed the instructions in the README and benchmarks are currently running. It looks like it will take a while so I will check in on results in the morning and continue reviewing the PR.

dianaclarke · 2022-02-09T05:29:02Z

I pulled the branch locally and followed the instructions in the README and benchmarks are currently running. It looks like it will take a while so I will check in on results in the morning and continue reviewing the PR.

Nice!

99% of the time is spent executing cargo bench.

Note that this is only the first step – you should see json formatted benchmark results spew to your terminal, but they won't yet publish to the Arrow Conbench server[1].

You'll want to work with @ElenaHenderson on getting the results to publish to the Arrow Conbench server via buildkite etc.

[1] https://conbench.ursa.dev/

dianaclarke · 2022-02-09T06:08:46Z

conbench/.gitignore

@@ -0,0 +1,130 @@
+# Byte-compiled / optimized / DLL files


This is just the GitHub generated .gitignore for Python packages. I can trim in down, but I usually don't bother.

andygrove · 2022-02-09T15:32:00Z

One of the tests failed because it makes an assumption that a target directory exists and it did not. My assumption is that somehow the current working directory was not correct. This is how I initiated the benchmark run.

~/git/arrow-rs/conbench$ conbench arrow-rs --src-dir=`pwd`/..

I see that the Python code should have changed the directory so I am not sure what went wrong. I will start debugging this tonight. We may also want to have our tests write output to a temporary directory rather than making any assumptions about current working directory.

dianaclarke · 2022-02-09T15:55:10Z

One of the tests failed because it makes an assumption that a target directory exists and it did not. My assumption is that somehow the current working directory was not correct. This is how I initiated the benchmark run.
~/git/arrow-rs/conbench$ conbench arrow-rs --src-dir=`pwd`/..
I see that the Python code should have changed the directory so I am not sure what went wrong. I will start debugging this tonight. We may also want to have our tests write output to a temporary directory rather than making any assumptions about current working directory.

If you pull the latest, I just removed the need for --src-dir. Hopefully that works for you.

$ cd ~/arrow-rs/conbench/
$ conda create -y -n conbench python=3.9
$ conda activate conbench
(conbench) $ pip install -r requirements.txt
(conbench) $ conbench arrow-rs

PS. I don't work on arrow at all anymore, so I better go back to my day job now – just didn't want to leave you hanging. Cheers!

andygrove · 2022-02-09T16:05:12Z

Thanks @dianaclarke I can take it from here for sure .. and it turns out this benchmark was failing even before this PR. I should have checked that first.

dianaclarke · 2022-02-09T16:15:53Z

Thanks @dianaclarke I can take it from here for sure .. and it turns out this benchmark was failing even before this PR. I should have checked that first.

Ah, I was getting an error with cargo bench in arrow-rs, but I assumed I just didn't have my env set up correctly (I don't know Rust aside from what I learned doing this).

But I can cargo bench in arrow-datafusion, so perhaps it's easier to start with this pull request:

DataFusion + Conbench Integration datafusion#1791

andygrove

LGTM. After commenting out the failing csv_writer bench, conbench runs to completion and produces reports. A fix to csv_writer has already been merged to master.

andygrove · 2022-02-19T18:24:18Z

If there are no objections, I plan to merge this PR next week so that I can proceed to the next step of integrating this with CI. The changes in this PR are self-contained so I think it is low risk to merge.

houqp

Thank you @dianaclarke ! Great to see conbench finally lands in arrow-rs :)

dianaclarke force-pushed the conbench branch from ffb3226 to 8720507 Compare February 9, 2022 02:08

dianaclarke mentioned this pull request Feb 9, 2022

What is the process for running benchmarks on repos other than apache/arrow? voltrondata-labs/arrow-benchmarks-ci#45

Closed

dianaclarke force-pushed the conbench branch from 8720507 to ca33a0a Compare February 9, 2022 02:16

dianaclarke force-pushed the conbench branch 3 times, most recently from 359e459 to dc59492 Compare February 9, 2022 03:22

dianaclarke commented Feb 9, 2022

View reviewed changes

dianaclarke force-pushed the conbench branch from dc59492 to 4d7e6bb Compare February 9, 2022 03:40

andygrove reviewed Feb 9, 2022

View reviewed changes

conbench/README.md Outdated Show resolved Hide resolved

dianaclarke mentioned this pull request Feb 9, 2022

Spike adding arrow-datafusion & arrow-rust voltrondata-labs/benchmarks#79

Closed

dianaclarke commented Feb 9, 2022

View reviewed changes

Arrow Rust + Conbench Integration

ce8b4d0

dianaclarke force-pushed the conbench branch from 6c3e7b8 to ce8b4d0 Compare February 9, 2022 15:26

remove --src-dir

19a56c1

dianaclarke force-pushed the conbench branch from b22e96f to 19a56c1 Compare February 9, 2022 16:00

andygrove approved these changes Feb 10, 2022

View reviewed changes

nevi-me approved these changes Feb 20, 2022

View reviewed changes

houqp approved these changes Feb 20, 2022

View reviewed changes

nevi-me merged commit 4b89f7e into apache:master Feb 22, 2022

alamb mentioned this pull request Aug 31, 2024

Remove vestigal conbench integration #6339

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arrow Rust + Conbench Integration #1289

Arrow Rust + Conbench Integration #1289

dianaclarke commented Feb 9, 2022 •

edited

Loading

codecov-commenter commented Feb 9, 2022 •

edited

Loading

dianaclarke Feb 9, 2022 •

edited

Loading

andygrove commented Feb 9, 2022

dianaclarke commented Feb 9, 2022

dianaclarke Feb 9, 2022

andygrove commented Feb 9, 2022

dianaclarke commented Feb 9, 2022

andygrove commented Feb 9, 2022

dianaclarke commented Feb 9, 2022

andygrove left a comment

andygrove commented Feb 19, 2022

houqp left a comment

		@@ -0,0 +1,19 @@
		# Licensed to the Apache Software Foundation (ASF) under one

Arrow Rust + Conbench Integration #1289

Arrow Rust + Conbench Integration #1289

Conversation

dianaclarke commented Feb 9, 2022 • edited Loading

codecov-commenter commented Feb 9, 2022 • edited Loading

Codecov Report

dianaclarke Feb 9, 2022 • edited Loading

Choose a reason for hiding this comment

andygrove commented Feb 9, 2022

dianaclarke commented Feb 9, 2022

dianaclarke Feb 9, 2022

Choose a reason for hiding this comment

andygrove commented Feb 9, 2022

dianaclarke commented Feb 9, 2022

andygrove commented Feb 9, 2022

dianaclarke commented Feb 9, 2022

andygrove left a comment

Choose a reason for hiding this comment

andygrove commented Feb 19, 2022

houqp left a comment

Choose a reason for hiding this comment

dianaclarke commented Feb 9, 2022 •

edited

Loading

codecov-commenter commented Feb 9, 2022 •

edited

Loading

dianaclarke Feb 9, 2022 •

edited

Loading