Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10792: [Rust] [CI] Modularize builds for faster build and smaller caches #8821

Closed
wants to merge 5 commits into from
Closed

Conversation

jorgecarleitao
Copy link
Member

@jorgecarleitao jorgecarleitao commented Dec 2, 2020

This PR refactors the CI with the purpose of making it faster, easier to extend, and more robust. The main consequences of this PR are:

  1. build time of non-integration jobs reduced by half (10m vs 20m)
  2. coverage is now part of all builds
  3. integration with the c data interface is now part of all builds
  4. significantly reduced the risk of cache misses, and the size of the caches
  5. reduced complexity by removing the need to build a docker image
  6. macos now have caching in place, like other builds
  7. significantly easier to extend to other architectures, os, etc.

The overall design of this PR is that most steps of the CI process should be independent of all others as much as possible, so that

  • they can run in parallel
  • it is clearer where the failure is
  • it is easier to extend the build to other variations (arch, os, flags, etc.)

Intrinsically, the CI pipeline is just a DAG where certain jobs benefit from shared compute and thus depend on common nodes, while others run independently.

This PR re-writes most of our CI as simple github jobs that share caches whenever makes sense, thereby describing the aforementioned DAG. Consequently, many of the sequential steps are now executed in parallel, causing the items describe above.

There are two tasks depending on rust that are not in the rust.yaml:

  • lint, that happens via archery
  • integration tests, that happens via archery

The bottleneck of a green pipeline are the integration tests, that currently re-compile a significant number of components (as I already flagged on the mailing list), but at least we get faster failures for compilation (2.5m) and clippy (4.5m).

@jorgecarleitao
Copy link
Member Author

@nevi-me , a follow-up of the issue that we were discussing in another PR. IMO this design would enable us to much more easily add new architectures to the CI (we just need to add them to the matrix and would run in parallel).

cc @kszucs

@github-actions
Copy link

github-actions bot commented Dec 2, 2020

@codecov-io
Copy link

codecov-io commented Dec 2, 2020

Codecov Report

Merging #8821 (774e704) into master (1d2b4a5) will decrease coverage by 7.08%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8821      +/-   ##
==========================================
- Coverage   84.48%   77.39%   -7.09%     
==========================================
  Files         190      170      -20     
  Lines       46854    39528    -7326     
==========================================
- Hits        39584    30593    -8991     
- Misses       7270     8935    +1665     
Impacted Files Coverage Δ
rust/datafusion/src/test/variable.rs 0.00% <0.00%> (-100.00%) ⬇️
rust/datafusion/src/optimizer/filter_push_down.rs 0.00% <0.00%> (-99.33%) ⬇️
rust/datafusion/src/datasource/parquet.rs 0.00% <0.00%> (-97.00%) ⬇️
rust/datafusion/src/physical_plan/hash_join.rs 0.00% <0.00%> (-96.12%) ⬇️
...tafusion/src/physical_plan/datetime_expressions.rs 0.00% <0.00%> (-95.09%) ⬇️
rust/datafusion/src/logical_plan/display.rs 0.00% <0.00%> (-94.81%) ⬇️
...tafusion/src/physical_plan/distinct_expressions.rs 0.00% <0.00%> (-91.31%) ⬇️
rust/datafusion/src/execution/dataframe_impl.rs 0.00% <0.00%> (-91.25%) ⬇️
rust/datafusion/src/physical_plan/aggregates.rs 0.00% <0.00%> (-91.14%) ⬇️
rust/datafusion/src/physical_plan/sort.rs 0.00% <0.00%> (-91.13%) ⬇️
... and 78 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d2b4a5...774e704. Read the comment docs.

restore-keys: ${{ runner.os }}-${{ matrix.arch }}-target-cache-${{ matrix.rust }}-
- name: Run tests
run: |
cd rust/arrow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GregBowyer we'd be able to add parquet to stable tests here, so we can leave out adding it to CI in your PR

@nevi-me nevi-me requested a review from kszucs December 3, 2020 04:19
@jorgecarleitao jorgecarleitao deleted the build_tests branch December 4, 2020 07:41
@jorgecarleitao jorgecarleitao restored the build_tests branch December 4, 2020 07:41
@jorgecarleitao jorgecarleitao reopened this Dec 4, 2020
.github/workflows/rust.yml Outdated Show resolved Hide resolved
container:
image: ${{ matrix.arch }}/rust
env:
ARROW_TEST_DATA: /__w/arrow/arrow/testing/data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the __w prefix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the absolute path to where the container action places the checkout code on. I tried using relative paths but I was unable to make it work.

@@ -903,35 +902,6 @@ services:
/bin/bash -c "
/arrow/ci/scripts/r_sanitize.sh /arrow"

################################ Rust #######################################

debian-rust:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we plan to use plain GHA jobs without docker-compose it'd be nice to keep it around to let contributors execute rust builds without installing the toolchain locally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The arguments I used to remove it:

  • this compose and associated dockerfile are no longer tested under the CI, and their primary purpose was to be used in the CI
  • the dockerfile is not needed, as the default rust image is sufficient
  • the compose contains non-trivial aspects, such as:
    • volume mounting with delegate
    • using the no-longer needed rust_test.sh
    • non-default CARGO_* env variables
  • docker-compose run debian-rust is IMO less transparent to a user about what is happening, and the person needs to go to a +1k LOC file to understand.

In opposition, this whole project can now be built using

docker run --rm -v $(pwd)/rust:/rust -it rust /bin/bash -c "cd /rust && cargo build"

which IMO is really simple to understand and reproduce. Wouldn't this command on the README be sufficient for contributors?

In the end, I evaluated that having these around would be more confusing than just having the user to follow standard rust practices of building a rust project (including using the official docker image), and therefore there was no need for the maintenance burden of having these around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sounds good to me - lucky us that we have cargo around.

Please document how to reproduce the builds locally with docker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a section on the README on how to compile, which includes both pure cargo and docker variants.

@kszucs
Copy link
Member

kszucs commented Dec 4, 2020

Thanks Jorge for working on this!

In general it looks good to me, but this PR makes harder to reproduce the exact same environment locally. Considering that cargo makes the building process straightforward and if it indeed speeds up the build execution by a factor of 2 I can accept that tradeoff.

@github-actions github-actions bot added Component: Rust needs-rebase A PR that needs to be rebased by the author labels Dec 4, 2020
@github-actions github-actions bot removed the needs-rebase A PR that needs to be rebased by the author label Dec 5, 2020
@jorgecarleitao
Copy link
Member Author

I rebased it against latest master and placed the c interface as part of the tests, instead of the cronjob, which will allow us to verify that part on every pr, as it should be.

@jorgecarleitao
Copy link
Member Author

cc @alamb and @andygrove , as this will have a (hopefully positive) impact on all our builds.

@alamb
Copy link
Contributor

alamb commented Dec 7, 2020

I will try and read this more carefully later this evening -- it sounds pretty awesome

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a really nice improvement @jorgecarleitao

I checked that the commands in ci/scripts made it into the rust.yaml and I poked around in this PR's output -- things look good to me. 👍

I don't fully follow the concern about the lack of a way to build inside of docker-compose but I have never used that feature and if I wanted to build arrow in docker I would indeed use the image that @jorgecarleitao has used

The only failure on the PR at this time is some C++ / windows job which looked unrelated to this change: https://github.com/apache/arrow/pull/8821/checks?check_run_id=1504026175

 unity_1_c.c
C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\ucrt\stdio.h(378,9): error C2220: the following warning is treated as an error [D:\a\arrow\arrow\build\cpp\src\arrow\arrow_shared.vcxproj]
C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\ucrt\stdio.h(378,9): warning C5105: macro expansion producing 'defined' has undefined behavior [D:\a\arrow\arrow\build\cpp\src\arrow\arrow_shared.vcxproj]
C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\ucrt\stdio.h(378,9): message : to simplify migration, consider the temporary use of /Wv:18 flag with the version of the compiler with which you used to build without warnings [D:\a\arrow\arrow\build\cpp\src\arrow\arrow_shared.vcxproj]

@alamb
Copy link
Contributor

alamb commented Dec 8, 2020

I retriggered the C++ CI tests on github in hopes it passes on a subsequent run.

@jorgecarleitao I propose we merge this PR in and then send a note to the dev@arrow mailing list with a heads up ("major refactor of rust CI configuration, please let us know if you see issues") -- I am happy to do so if you would prefer

@jorgecarleitao
Copy link
Member Author

Let's do that. I can write the note, as I can also describe the changes in more depth, as IMO they are relevant to other parts of the project also.

@alamb
Copy link
Contributor

alamb commented Dec 8, 2020

@jorgecarleitao -- sounds good (with you writing the note)

The windows C++ test is still failing in a strange way -- I can't seem to find AMD64 Windows 2019 C++ running on any other (recent) PRs. So it looks to me like it may not run regularly anymore?

Screen Shot 2020-12-08 at 8 43 33 AM

https://github.com/apache/arrow/pull/8831/files#r537152984

@jorgecarleitao
Copy link
Member Author

@alamb , these are unrelated and I (unfortunately) see them often whenever I need to change dockerfiles and other CI files. I have been ignoring them, as I expect / hope that people from the respective implementations to be handling them already.

I have been also trying to reduce the number of builds that we trigger when we change rust-specific code, e.g. #8824

@alamb
Copy link
Contributor

alamb commented Dec 8, 2020

Ok, 🚀 it is then!

cc @nevi-me @andygrove @kszucs

@alamb alamb closed this in 7d509dc Dec 8, 2020
@jorgecarleitao jorgecarleitao deleted the build_tests branch December 8, 2020 16:52
alamb pushed a commit that referenced this pull request Dec 18, 2020
This is a requirement fielded to merge #7767, and so I though about adding it to our CI.

I am not sure whether this is exactly what we need to test #7767, and so I would like to request help from @paddyhoran and @rj-atw here.

NOTE: this is built on top of #8821 . Only the last commit is relevant. Basically, this runs

```
rustup add target wasm32-unknown-unknown
cd rust/arrow
cargo build --target wasm32-unknown-unknown
```

Note that this does not actually run the tests yet. I think that that is #7767 's goal.

Closes #8864 from jorgecarleitao/wasm32

Authored-by: Jorge C. Leitao <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…er caches

This PR refactors the CI with the purpose of making it faster, easier to extend, and more robust. The main consequences of this PR are:

1. build time of non-integration jobs reduced by half (10m vs 20m)
2. coverage is now part of all builds
3. integration with the c data interface is now part of all builds
4. significantly reduced the risk of cache misses, and the size of the caches
5. reduced complexity by removing the need to build a docker image
6. macos now have caching in place, like other builds
7. significantly easier to extend to other architectures, os, etc.

The overall design of this PR is that most steps of the CI process should be independent of all others as much as possible, so that

* they can run in parallel
* it is clearer where the failure is
* it is easier to extend the build to other variations (arch, os, flags, etc.)

Intrinsically, the CI pipeline is just a DAG where certain jobs benefit from shared compute and thus depend on common nodes, while others run independently.

This PR re-writes most of our CI as simple github jobs that share caches whenever makes sense, thereby describing the aforementioned DAG. Consequently, many of the sequential steps are now executed in parallel, causing the items describe above.

There are two tasks depending on rust that are not in the `rust.yaml`:

* lint, that happens via archery
* integration tests, that happens via archery

The bottleneck of a green pipeline are the integration tests, that currently re-compile a significant number of components (as I already flagged on the mailing list), but at least we get faster failures for compilation (2.5m) and clippy (4.5m).

Closes apache#8821 from jorgecarleitao/build_tests

Authored-by: Jorge C. Leitao <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
This is a requirement fielded to merge apache#7767, and so I though about adding it to our CI.

I am not sure whether this is exactly what we need to test apache#7767, and so I would like to request help from @paddyhoran and @rj-atw here.

NOTE: this is built on top of apache#8821 . Only the last commit is relevant. Basically, this runs

```
rustup add target wasm32-unknown-unknown
cd rust/arrow
cargo build --target wasm32-unknown-unknown
```

Note that this does not actually run the tests yet. I think that that is apache#7767 's goal.

Closes apache#8864 from jorgecarleitao/wasm32

Authored-by: Jorge C. Leitao <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants