Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update runs-on in workflow files #26435

Merged
merged 2 commits into from
Sep 27, 2023
Merged

Update runs-on in workflow files #26435

merged 2 commits into from
Sep 27, 2023

Conversation

ydshieh
Copy link
Collaborator

@ydshieh ydshieh commented Sep 27, 2023

What does this PR do?

I also updated the labels on runners to have t4 and daily-ci, push-ci etc.

@ydshieh ydshieh requested a review from LysandreJik September 27, 2023 10:19
@@ -20,7 +20,7 @@ env:

jobs:
run_doctests:
runs-on: [self-hosted, doc-tests-gpu]
runs-on: [single-gpu, nvidia-gpu, t4, past-ci]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the t4? If we want to move to V100s or better we'll need to update again. As long as it's nvidia-gpu it should suffice, no?

Copy link
Member

@mfuntowicz mfuntowicz Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can keep the t4 tag on the runner enrollment to identity which hardware it is running on and, in the future, if you end up having multiple flavours like we'll have for amd [mi210, mi250, mi300] you can easily target them (or not).

See how we would like to approach it for AMD gpu flavours: https://github.com/huggingface/transformers/pull/26346/files#diff-0517e7d6ca0b03d887391fd62602e2891e13f279d32d464e3f73c6e864b8152fR48

Within the workflow I agree, at the moment, it might not bring any additional constraints.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically no need t4, it's there just to keep more info. explicitly.

@LysandreJik
Copy link
Member

Also is there a difference between the past-ci and daily-ci? Or is it just that you'd like to have runners dedicated to these tasks?

@mfuntowicz
Copy link
Member

Also is there a difference between the past-ci and daily-ci? Or is it just that you'd like to have runners dedicated to these tasks?

It's a typo for push-ci no?

@LysandreJik
Copy link
Member

I see runners with the following tags on nvidia's side: push-ci, daily-ci, past-ci, doctest-ci

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Sep 27, 2023

The documentation is not available anymore as the PR was closed or merged.

@ydshieh
Copy link
Collaborator Author

ydshieh commented Sep 27, 2023

Also is there a difference between the past-ci and daily-ci? Or is it just that you'd like to have runners dedicated to these tasks?

So far, each CI workflow is running in a dedicated runner by using tags/labels like single-gpu-docker (daily CI has it) or docker-gpu (push CI has it). However I find this is difficult to maintain (they are all single gpu and docker but with differnt kind of label names to mean it).

Using explicit daily-ci, push-ci, past-ci etc. makes things clear and easy to understand IMO. But the machines are all of the same.

@ydshieh
Copy link
Collaborator Author

ydshieh commented Sep 27, 2023

Talking to @glegendre01 we decide to keep these extra labels (push-ci etc.) at this moment, and will remove them just before we switch to AWS with the auto-scaling runners setting.

@glegendre01
Copy link
Contributor

Yes, i'm not ready to deploy new runners on transformers now (focusing on 2 others topics). But will be able to handle it next week. at this moment we will remove extra labels

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ydshieh ydshieh merged commit 6ae71ec into main Sep 27, 2023
@ydshieh ydshieh deleted the update_workflow branch September 27, 2023 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants