Request: provide versioned docker images in ghcr/dockerhub #1086
@merryHunter, are you using GitHub Actions? In that case, you can pin an exact CML version by using the following setup step instead of a container:
- uses: iterative/setup-cml@v1
  with:
    version: 0.14.0
Additionally, this will allow you to use any container image or just remove it altogether. |
Hi @0x2b3bfa0, thanks for the references to the issues! No, we are using GitLab CI, which is why we cannot access it at the moment. |
Then, you can try installing one of our binary releases: curl --location https://github.com/iterative/cml/releases/download/v0.16.1/cml-linux-x64 --output /usr/bin/cml && chmod a+x $_ |
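For a CI job, the one-liner above could be wrapped so the version is pinned in one place. This is only a sketch: it assumes release assets keep the `v<version>/cml-linux-x64` naming seen in the URL above, and `cml_release_url` and `install_cml` are made-up helper names, not part of CML.

```shell
# Hypothetical helpers; the asset naming is assumed from the URL quoted above.

# Print the download URL for a given CML version (e.g. 0.16.1).
cml_release_url() {
  printf 'https://github.com/iterative/cml/releases/download/v%s/cml-linux-x64\n' "$1"
}

# Download a pinned CML binary and make it executable.
# $1: version, $2: optional target path (defaults to /usr/bin/cml)
install_cml() {
  # --fail: exit non-zero on HTTP errors instead of saving an error page
  # --location: follow the redirect GitHub issues for release assets
  curl --fail --silent --show-error --location \
    "$(cml_release_url "$1")" --output "${2:-/usr/bin/cml}" \
    && chmod a+x "${2:-/usr/bin/cml}"
}
```

In a GitLab CI job this could be called from `before_script`, for example `install_cml 0.16.1`, so that bumping the version is a one-line change.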
I am not aware of the recent changes, but just to give a bit more context on the problem we faced: we have a CI pipeline in GitLab where a cml-runner launches training on AWS with a startup script that mounts EFS to access the data. For no apparent reason, our startup script started to fail silently while the cml runner job succeeded. After debugging and looking into the script logs on the EC2 instance, we saw an error like: [...] We also have problems with passing down the env variable 'DOCKER_SHM_SIZE=4g', but that's another issue. |
Thank you for the detailed description of the issue. 🙏 Pinning CML might not suffice to solve this issue, as it depends internally on https://github.com/iterative/terraform-provider-iterative (unpinned) to provision cloud instances. Moreover machine images aren't pinned either. |
The provided startup script runs synchronously. Therefore, your issue can only (?) be caused by an ongoing automatic upgrade. 🤔 |
Exactly, that's the problem with the software upgrade we identified. However, that only means that, since CML depends on TPI (which is a new tool, btw) that can change in unexpected ways, it would be really great to have at least major releases tagged on Docker Hub. I have read the threads and I see that settling on a tag naming scheme is a hard decision, yet the problem is there. |
There is a hidden CML option you can use to pin a TPI/CML version for your created instance.
|
@merryHunter we just did the same in iterative/terraform-provider-iterative#621 (what |
@casperdcl @0x2b3bfa0 that's an amazing patch! :) Glad our issue helped identify the problem. Then let's close this issue. |
For those who are looking for proper, production-grade container images: there aren't any.
See also |
*cough* --tpi-version
…On Mon, Jul 4, 2022, 12:57 Helio Machado ***@***.***> wrote:
Thank you for the detailed description of the issue. 🙏
Pinning CML might not suffice to solve this issue, as it depends
internally on https://github.com/iterative/terraform-provider-iterative
(unpinned) to provision cloud instances. Moreover machine images aren't
pinned either.
Hi! Currently, the base CML Docker images are rebuilt from the latest code and pushed every day to e.g. docker://ghcr.io/iterative/cml:0-dvc2-base1 or https://hub.docker.com/r/iterativeai/cml/tags. That means there is no way to roll back to a previous version. Unfortunately, recent changes affected our cloud training pipelines and we had to adjust them.
In my opinion, it would be beneficial to have stable, fixed-version Docker images. That would ensure that once we pull one, nothing in it can be updated or broken.
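Until versioned tags exist, one possible stopgap (a sketch, not something the maintainers confirmed in this thread) is to reference an image by its content digest instead of its mutable tag, since a digest always resolves to the same image as long as the registry still retains it. The `is_digest_pinned` helper below is a hypothetical CI guard that rejects tag-only references.

```shell
# Hypothetical CI guard: succeed only if an image reference is pinned by
# digest (name@sha256:<64 hex chars>) rather than by a mutable tag.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# To find the digest of an image you have already pulled (requires Docker):
#   docker image inspect --format '{{index .RepoDigests 0}}' ghcr.io/iterative/cml:0-dvc2-base1
# The printed name@sha256:... reference can then be used in place of the tag.
```

A pipeline could call `is_digest_pinned "$IMAGE" || exit 1` before launching training, where `IMAGE` is whatever variable holds the image reference. This only freezes the container image; as noted in the comments, the TPI version and machine images are pinned separately.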