ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles #8824

jorgecarleitao · 2020-12-03T17:17:48Z

This PR restricts some CI workflows from getting triggered when any dockerfile changes, thereby reducing the workload on our queue when someone modifies a dockerfile.

This is particularly important when testing the CI itself, as devs will trigger multiple runs of the CI for single change of a single dockerfile, which triggers the execution of a large number of workflows.

github-actions · 2020-12-03T17:37:06Z

https://issues.apache.org/jira/browse/ARROW-10798

jorgecarleitao · 2020-12-03T17:54:46Z

ccing @pitrou , since I have now added cpp to the mix.

pitrou · 2020-12-03T17:55:40Z

@kszucs Can you take a look here?

nealrichardson

Thanks for tackling this. It's a little tricky, which is why I punted on doing this before, but here's one note to watch out for.

nealrichardson · 2020-12-03T19:10:55Z

.github/workflows/cpp.yml

@@ -21,7 +21,7 @@ on:
  push:
    paths:
      - '.github/workflows/cpp.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/*cpp*'


I think you need to add this one (or similar) to the R, Python, and Ruby jobs too. Their dockerfiles generally build on the cpp images (you can trace through the base parameter in docker-compose.yml, e.g. https://github.com/apache/arrow/blob/master/docker-compose.yml#L430)

Sorry it took me so long to go through this comment. I actually think that this PR is correct, but it surfaces a bigger problem present in master.

Let's say that we have an image X (typically build via archery build X), that depends on a base image (typically build via archery build base).

Let's analyze what happens in the following 4 cases:

push to PR with no change to base image's dockerfile

push to master with no change to base image's dockerfile

push to PR with a change to base image's dockerfile

push to master with a change to base image's dockerfile

Let's also call the hash of the current base image in the registry as base_v1.

push to PR with no change to base image's dockerfile

When a PR is open, image X is built using the dockerfile on the PR for that image, and it uses base_v1 from the registry to base its build from.

push to master with no change to base image's dockerfile

Same as before.

push to PR with a change to base image's dockerfile

In this situation, the PR triggers two CI jobs: the build of image X and the build image base. Since these two jobs are independent workflows, image X will pick base_v1, while the base image job will build base_v2.

This happens because docker-compose run X does not know that X depends on the compose job base to create the image base, and instead has that image available from the remote docker registry ${REPO}.

Because PRs do not push images to the registry, a new push to the PR will not alter this behavior: the image X will continue to be built based on base_v1. Thus, any error derived from building image X from the new image base_v2 will not surface.

push to master with a change to base image's dockerfile (e.g. PR is merged)

In this situation, there is a race condition (undefined behavior): image X will be built using base_v1 if it starts running before the flow build image base finishes (the likely scenario), or it picks base_v2 if the other flow has finished.

Thus, atm, whenever a PR changes both dockerfiles, X and base, the image X on the registry is based of base_v1, but base:latest points to base_v2.

Thus, even if the new dockerfile is incompatible with base_v2, the CI will actually pass in master because it picks base_v1. On the next push to master that triggers X to be built, it will pick base_v2, and the error surfaces 🤯

Summary

In both cases where the dockerfile of base changes, that change is not propagated to the build of X. Therefore, IMO this PR is correct: changes to the dockerfile of image base do not require to trigger a change to the build of the image X, because there are no constraints in the jobs to build X to wait for the build of image base.

This issue seems to indicate that our builds based of docker images that depend on other docker images that are build using docker-compose build may be too complex as there is hidden state through the registry.

I don't think that's correct but maybe @kszucs can weigh in definitively. My understanding of how docker works (which may be wrong) is that if image Y is based dockerfile y, and y is changed, then Y gets rebuilt locally in docker-compose build Y. If y is based on image X (and therefore dockerfile x), and x is changed, building Y will first build X and then proceed using y to build Y.

I see a few options:

Empirically determine which is correct

Defensively, make the change I suggested (add cpp dockerfile triggers to ruby, python, and r workflows) and we're safe either way

Make no additional change, as you propose, and risk merging changes to C++ dockerfiles that break downstream jobs that we'll only see after merging.

The risks entailed by #3 are probably low--these files change infrequently, and unless we're doing a wholesale refactor of docker images, they're probably low risk changes. At the same time, the cost of doing #2 is also very low and prevents the surprise failures that #3 could later cause. Given the low costs (and my hesitance to make definitive statements about how docker/docker-compose handles dependent dockerfiles), I still advocate for option #2.

Sorry for the confusion: I agree with you and I do think that we should use 2 regardless, I side-tracked into a potential issue.

I just did a quick check to verify this, and AFAI can tell my hypothesis holds:

services: s1: image: t1 build: context: . dockerfile: docker1.dockerfile s2: image: t2 build: context: . dockerfile: docker2.dockerfile

# docker1.dockerfile FROM python:3.7.4 CMD ls

# docker2.dockerfile FROM t1 CMD ls

docker-compose build s2 Building s2 Step 1/3 : FROM t1 ERROR: Service 's2' failed to build : pull access denied for t1, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

My understanding is that docker has no idea about docker-compose and how images get built; it only knows about where images are located, via registries. To declare such a dependency of image builds, Docker uses Multi-stage builds. Furthermore, docker-compose also does not declare image dependencies, only service dependencies (i.e. containers, not images).

Indeed, neither does docker nor docker-compose have an idea about the image hierarchy but archery does.

archery build and archery run commands are idempotent, meaning that the produced docker images are going to be the same for the same input configuration regardless of the state in the docker registry.

See the commands called underneath the archery docker run command:

❯ archery docker --dry-run run ubuntu-docs COMPOSE_DOCKER_CLI_BUILD=1 docker-compose pull --ignore-pull-failures ubuntu-cpp COMPOSE_DOCKER_CLI_BUILD=1 docker-compose pull --ignore-pull-failures ubuntu-python COMPOSE_DOCKER_CLI_BUILD=1 docker-compose pull --ignore-pull-failures ubuntu-docs COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build --build-arg BUILDKIT_INLINE_CACHE=1 ubuntu-cpp COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build --build-arg BUILDKIT_INLINE_CACHE=1 ubuntu-python COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build --build-arg BUILDKIT_INLINE_CACHE=1 ubuntu-docs COMPOSE_DOCKER_CLI_BUILD=1 docker-compose run --rm ubuntu-docs

kszucs · 2021-06-30T12:03:27Z

.github/workflows/ruby.yml

@@ -21,7 +21,7 @@ on:
  push:
    paths:
      - '.github/workflows/ruby.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/linux-apt-ruby.dockerfile'


We would need to list all dockerfiles used in the parent images, in this case:

ci/docker/ubuntu-18.04-cpp.dockerfile ci/docker/linux-apt-c-glib.dockerfile ci/docker/linux-apt-ruby.dockerfile

though if we wan't to run UBUNTU=20.04 archery docker run ubuntu-ruby then:

ci/docker/ubuntu-20.04-cpp.dockerfile ci/docker/linux-apt-c-glib.dockerfile ci/docker/linux-apt-ruby.dockerfile

This hierarchy is encoded in the docker-compose.yml depending on the env variables as well.

kszucs · 2021-06-30T12:07:45Z

I think the safest to keep the current configuration with docker/** so we won't forget to update the workflow files if we change the image dependencies. A better approach could be to generate the workflow files from the docker-compose.yml but currently it may not worth the effort.

I'm closing this PR for now, feel free to reopen if there is anything I missed.

Made workflows be less triggered.

e14c809

jorgecarleitao requested a review from nealrichardson December 3, 2020 17:17

jorgecarleitao changed the title ~~ARROW-10798: [CI] Made workflows be less triggered.~~ ARROW-10798: [CI] Made certain workflows be less triggered. Dec 3, 2020

Added CPP to the mix.

b2ecf6e

jorgecarleitao changed the title ~~ARROW-10798: [CI] Made certain workflows be less triggered.~~ ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles Dec 3, 2020

pitrou requested a review from kszucs December 3, 2020 17:55

Reduce risk of false negative.

86a2fb0

nealrichardson requested changes Dec 3, 2020

View reviewed changes

jorgecarleitao closed this Dec 4, 2020

jorgecarleitao deleted the improve_dev branch December 4, 2020 07:41

jorgecarleitao restored the improve_dev branch December 4, 2020 07:41

jorgecarleitao reopened this Dec 4, 2020

jorgecarleitao mentioned this pull request Dec 8, 2020

ARROW-10792: [Rust] [CI] Modularize builds for faster build and smaller caches #8821

Closed

jorgecarleitao force-pushed the master branch from d4608a9 to 356c300 Compare February 14, 2021 12:09

kszucs reviewed Jun 30, 2021

View reviewed changes

kszucs closed this Jun 30, 2021

asfimport mentioned this pull request Jul 12, 2022

[CI] Do not run R, Go and Ruby on any change to any dockerfile. #26737

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles #8824

ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles #8824

jorgecarleitao commented Dec 3, 2020 •

edited

Loading

github-actions bot commented Dec 3, 2020

jorgecarleitao commented Dec 3, 2020

pitrou commented Dec 3, 2020

nealrichardson left a comment

nealrichardson Dec 3, 2020

jorgecarleitao Dec 8, 2020 •

edited

Loading

nealrichardson Dec 8, 2020

jorgecarleitao Dec 8, 2020

jorgecarleitao Dec 8, 2020

kszucs Jun 30, 2021

kszucs Jun 30, 2021

kszucs commented Jun 30, 2021

ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles #8824

ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles #8824

Conversation

jorgecarleitao commented Dec 3, 2020 • edited Loading

github-actions bot commented Dec 3, 2020

jorgecarleitao commented Dec 3, 2020

pitrou commented Dec 3, 2020

nealrichardson left a comment

Choose a reason for hiding this comment

nealrichardson Dec 3, 2020

Choose a reason for hiding this comment

jorgecarleitao Dec 8, 2020 • edited Loading

Choose a reason for hiding this comment

push to PR with no change to base image's dockerfile

push to master with no change to base image's dockerfile

push to PR with a change to base image's dockerfile

push to master with a change to base image's dockerfile (e.g. PR is merged)

Summary

nealrichardson Dec 8, 2020

Choose a reason for hiding this comment

jorgecarleitao Dec 8, 2020

Choose a reason for hiding this comment

jorgecarleitao Dec 8, 2020

Choose a reason for hiding this comment

kszucs Jun 30, 2021

Choose a reason for hiding this comment

kszucs Jun 30, 2021

Choose a reason for hiding this comment

kszucs commented Jun 30, 2021

jorgecarleitao commented Dec 3, 2020 •

edited

Loading

jorgecarleitao Dec 8, 2020 •

edited

Loading