-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles #8824
Conversation
ccing @pitrou , since I have now added cpp to the mix. |
@kszucs Can you take a look here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this. It's a little tricky, which is why I punted on doing this before, but here's one note to watch out for.
@@ -21,7 +21,7 @@ on: | |||
push: | |||
paths: | |||
- '.github/workflows/cpp.yml' | |||
- 'ci/docker/**' | |||
- 'ci/docker/*cpp*' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to add this one (or similar) to the R, Python, and Ruby jobs too. Their dockerfiles generally build on the cpp images (you can trace through the base
parameter in docker-compose.yml, e.g. https://github.com/apache/arrow/blob/master/docker-compose.yml#L430)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it took me so long to go through this comment. I actually think that this PR is correct, but it surfaces a bigger problem present in master.
Let's say that we have an image X (typically build via archery build X
), that depends on a base
image (typically build via archery build base
).
Let's analyze what happens in the following 4 cases:
- push to PR with no change to base image's dockerfile
- push to master with no change to base image's dockerfile
- push to PR with a change to base image's dockerfile
- push to master with a change to base image's dockerfile
Let's also call the hash of the current base image in the registry as base_v1
.
push to PR with no change to base image's dockerfile
When a PR is open, image X is built using the dockerfile on the PR for that image, and it uses base_v1
from the registry to base its build from.
push to master with no change to base image's dockerfile
Same as before.
push to PR with a change to base image's dockerfile
In this situation, the PR triggers two CI jobs: the build of image X
and the build image base
. Since these two jobs are independent workflows, image X
will pick base_v1
, while the base image job will build base_v2
.
This happens because docker-compose run X
does not know that X
depends on the compose job base
to create the image base
, and instead has that image available from the remote docker registry ${REPO}
.
Because PRs do not push images to the registry, a new push to the PR will not alter this behavior: the image X
will continue to be built based on base_v1
. Thus, any error derived from building image X
from the new image base_v2
will not surface.
push to master with a change to base image's dockerfile (e.g. PR is merged)
In this situation, there is a race condition (undefined behavior): image X will be built using base_v1
if it starts running before the flow build image base
finishes (the likely scenario), or it picks base_v2
if the other flow has finished.
Thus, atm, whenever a PR changes both dockerfiles, X
and base
, the image X on the registry is based of base_v1
, but base:latest
points to base_v2
.
Thus, even if the new dockerfile is incompatible with base_v2
, the CI will actually pass in master because it picks base_v1
. On the next push to master that triggers X
to be built, it will pick base_v2
, and the error surfaces 🤯
Summary
In both cases where the dockerfile of base
changes, that change is not propagated to the build of X
. Therefore, IMO this PR is correct: changes to the dockerfile
of image base
do not require to trigger a change to the build of the image X
, because there are no constraints in the jobs to build X
to wait for the build of image base
.
This issue seems to indicate that our builds based of docker images that depend on other docker images that are build using docker-compose build
may be too complex as there is hidden state through the registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's correct but maybe @kszucs can weigh in definitively. My understanding of how docker works (which may be wrong) is that if image Y
is based dockerfile y
, and y
is changed, then Y
gets rebuilt locally in docker-compose build Y
. If y
is based on image X
(and therefore dockerfile x
), and x
is changed, building Y
will first build X
and then proceed using y
to build Y
.
I see a few options:
- Empirically determine which is correct
- Defensively, make the change I suggested (add cpp dockerfile triggers to ruby, python, and r workflows) and we're safe either way
- Make no additional change, as you propose, and risk merging changes to C++ dockerfiles that break downstream jobs that we'll only see after merging.
The risks entailed by #3 are probably low--these files change infrequently, and unless we're doing a wholesale refactor of docker images, they're probably low risk changes. At the same time, the cost of doing #2 is also very low and prevents the surprise failures that #3 could later cause. Given the low costs (and my hesitance to make definitive statements about how docker/docker-compose handles dependent dockerfiles), I still advocate for option #2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion: I agree with you and I do think that we should use 2 regardless, I side-tracked into a potential issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did a quick check to verify this, and AFAI can tell my hypothesis holds:
services:
s1:
image: t1
build:
context: .
dockerfile: docker1.dockerfile
s2:
image: t2
build:
context: .
dockerfile: docker2.dockerfile
# docker1.dockerfile
FROM python:3.7.4
CMD ls
# docker2.dockerfile
FROM t1
CMD ls
docker-compose build s2
Building s2
Step 1/3 : FROM t1
ERROR: Service 's2' failed to build : pull access denied for t1, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
My understanding is that docker has no idea about docker-compose
and how images get built; it only knows about where images are located, via registries. To declare such a dependency of image builds, Docker uses Multi-stage builds. Furthermore, docker-compose also does not declare image dependencies, only service dependencies (i.e. containers, not images).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, neither does docker nor docker-compose have an idea about the image hierarchy but archery does.
archery build
and archery run
commands are idempotent, meaning that the produced docker images are going to be the same for the same input configuration regardless of the state in the docker registry.
See the commands called underneath the archery docker run
command:
❯ archery docker --dry-run run ubuntu-docs
COMPOSE_DOCKER_CLI_BUILD=1 docker-compose pull --ignore-pull-failures ubuntu-cpp
COMPOSE_DOCKER_CLI_BUILD=1 docker-compose pull --ignore-pull-failures ubuntu-python
COMPOSE_DOCKER_CLI_BUILD=1 docker-compose pull --ignore-pull-failures ubuntu-docs
COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build --build-arg BUILDKIT_INLINE_CACHE=1 ubuntu-cpp
COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build --build-arg BUILDKIT_INLINE_CACHE=1 ubuntu-python
COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build --build-arg BUILDKIT_INLINE_CACHE=1 ubuntu-docs
COMPOSE_DOCKER_CLI_BUILD=1 docker-compose run --rm ubuntu-docs
d4608a9
to
356c300
Compare
@@ -21,7 +21,7 @@ on: | |||
push: | |||
paths: | |||
- '.github/workflows/ruby.yml' | |||
- 'ci/docker/**' | |||
- 'ci/docker/linux-apt-ruby.dockerfile' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need to list all dockerfiles used in the parent images, in this case:
ci/docker/ubuntu-18.04-cpp.dockerfile
ci/docker/linux-apt-c-glib.dockerfile
ci/docker/linux-apt-ruby.dockerfile
though if we wan't to run UBUNTU=20.04 archery docker run ubuntu-ruby
then:
ci/docker/ubuntu-20.04-cpp.dockerfile
ci/docker/linux-apt-c-glib.dockerfile
ci/docker/linux-apt-ruby.dockerfile
This hierarchy is encoded in the docker-compose.yml depending on the env variables as well.
I think the safest to keep the current configuration with I'm closing this PR for now, feel free to reopen if there is anything I missed. |
This PR restricts some CI workflows from getting triggered when any dockerfile changes, thereby reducing the workload on our queue when someone modifies a dockerfile.
This is particularly important when testing the CI itself, as devs will trigger multiple runs of the CI for single change of a single dockerfile, which triggers the execution of a large number of workflows.