Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backend): Upgrade Argo server image version to v2.11.6 #4693

Closed

Conversation

xinbinhuang
Copy link

@xinbinhuang xinbinhuang commented Oct 30, 2020

Description of your changes:

Part of #4553

Checklist:

  • The title for your pull request (PR) should follow our title convention. Learn more about the pull request title convention used in this repository.

    PR titles examples:

    • fix(frontend): fixes empty page. Fixes #1234
      Use fix to indicate that this PR fixes a bug.
    • feat(backend): configurable service account. Fixes #1234, fixes #1235
      Use feat to indicate that this PR adds a new feature.
    • chore: set up changelog generation tools
      Use chore to indicate that this PR makes some changes that users don't need to know.
    • test: fix CI failure. Part of #1234
      Use part of to indicate that a PR is working on an issue, but shouldn't close the issue when merged.
  • Do you want this pull request (PR) cherry-picked into the current release branch?

    If yes, use one of the following options:

    • (Recommended.) Ask the PR approver to add the cherrypick-approved label to this PR. The release manager adds this PR to the release branch in a batch update.
    • After this PR is merged, create a cherry-pick PR to add these changes to the release branch. (For more information about creating a cherry-pick PR, see the Kubeflow Pipelines release guide.)

@kubeflow-bot
Copy link

This change is Reviewable

@k8s-ci-robot
Copy link
Contributor

Hi @xinbinhuang. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@xinbinhuang xinbinhuang force-pushed the upgrade_argo_controller_version branch from 3b702c4 to 94b2947 Compare October 30, 2020 00:52
@xinbinhuang xinbinhuang changed the title Upgrade Argo server version feat(backend): Upgrade Argo server version Oct 30, 2020
@@ -15,7 +15,7 @@ COPY backend/requirements.txt .
RUN python3 -m pip install -r requirements.txt

# Downloading Argo CLI so that the samples are validated
#ADD https://github.com/argoproj/argo/releases/download/v2.7.5/argo-linux-amd64 /usr/local/bin/argo
#ADD https://github.com/argoproj/argo/releases/download/v2.11.6/argo-linux-amd64 /usr/local/bin/argo
Copy link
Author

@xinbinhuang xinbinhuang Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replace all v2.7.5 -> v2.11.6 in the repo, but I am not sure if I need to update this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should leave this one as well since it is the Kubeflow backend which will be running 2.7.7. Argo is in this case used to validate the workflows to check if they are valid according to argo.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we have several places that use Argo but they can be somehow independent of each other. Can you tell me how each of them is used so that I can get a clearer picture of how they work together?

@@ -25,7 +25,7 @@ ACCOUNT=$(gcloud info --format='value(config.account)')
kubectl create clusterrolebinding PROW_BINDING --clusterrole=cluster-admin --user=$ACCOUNT --dry-run -o yaml | kubectl apply -f -
kubectl create clusterrolebinding DEFAULT_BINDING --clusterrole=cluster-admin --serviceaccount=default:default --dry-run -o yaml | kubectl apply -f -

ARGO_VERSION=v2.7.5
ARGO_VERSION=v2.11.6
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I need to update this one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be needed for all the test infrastructure to run with argo 2.11.6

@@ -27,7 +27,8 @@ spec:
- --configmap
- workflow-controller-configmap
- --executor-image
- gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance
- gcr.io/ml-pipeline/argoexec:v2.11.6-license-compliance
- --namespaced
Copy link
Author

@xinbinhuang xinbinhuang Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the extra argument that I need to specifically add for my deployment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a corresponding documentation? Does omitting the argument make it cluster mode? We need to make sure it is customizable from an overlay to continue support multi-user mode.

Copy link
Author

@xinbinhuang xinbinhuang Nov 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster vs namespace install seems to start since v2.5 - this can also be found in the v2.5 release branch.

In my case, it did become a cluster install if I simply replace the image version to argoproj/argoexec:v2.11.6 - I got permission issues when trying to launch new pipelines.

I am not familiar with the multi-user mode. Does it install the workflow-controller in cluster scope?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error log from when trying to launch pipelines without --namespaced

E1107 01:14:06.943364       1 reflector.go:153] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:105: Failed to list *unstructured.Unstructured: workflowtemplates.argoproj.io is forbidden: User "system:serviceaccount:kubeflow:argo" cannot list resource "workflowtemplates" in API group "argoproj.io" at the cluster scope

Comment on lines +25 to 26
#RUN ARGO_VERSION=v2.11.6 && curl -sSL -o /usr/local/bin/argo \
RUN ARGO_VERSION=v2.4.3 && curl -sSL -o /usr/local/bin/argo \
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this one either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used to kick of the ML pipeline sample e2e test, since the changes you do is only for the server side I think we should not update this to 2.11.6. But I think we should go with 2.7.7 since that is the version that is used by Kubeflow today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a bug from argo that prevented using the argo cli from a container, that's why we were stuck on 2.4.3. I'm not sure about latest status, if the latest version works, we can upgrade too.
/cc @Ark-kun can you provide context about that issue?

@xinbinhuang
Copy link
Author

xinbinhuang commented Oct 30, 2020

Hey, @NikeNano @Bobgy . PTAL

My deployment is actually running v2.11.1 but I expect that there is no big difference. If there are any weird issues coming up, I can also upgrade my deployment to test it out.

/assign @Bobgy

@xinbinhuang xinbinhuang changed the title feat(backend): Upgrade Argo server version feat(backend): Upgrade Argo server image version Oct 30, 2020
@xinbinhuang
Copy link
Author

BTW, I was following the instructions from here: https://github.com/kubeflow/pipelines/blob/master/third_party/README.md

I skipped step 6.

@juliusvonkohout
Copy link
Member

I am also using 2.11.6 on my cluster. You should remove the executor image from the configmap. As far as I know it is deprecated and should only be in the workflow-controller deployment

@NikeNano
Copy link
Member

I realised that your changes are only for the argo server and not for Kubeflow it self, thus this PR will not completely solve: #4553 as I understand it. I am also not sure we we should update the argo server on its own since this will result in that Kubeflow pipelines use 2.7 and the argo server uses 2.11 of argo.

Copy link
Contributor

@Bobgy Bobgy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd also need to update licenses, I created an issue to track updating documentation for these instructions: #4715.

Please keep your eyes on it.

@@ -27,7 +27,8 @@ spec:
- --configmap
- workflow-controller-configmap
- --executor-image
- gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance
- gcr.io/ml-pipeline/argoexec:v2.11.6-license-compliance
- --namespaced
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a corresponding documentation? Does omitting the argument make it cluster mode? We need to make sure it is customizable from an overlay to continue support multi-user mode.

Comment on lines +25 to 26
#RUN ARGO_VERSION=v2.11.6 && curl -sSL -o /usr/local/bin/argo \
RUN ARGO_VERSION=v2.4.3 && curl -sSL -o /usr/local/bin/argo \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a bug from argo that prevented using the argo cli from a container, that's why we were stuck on 2.4.3. I'm not sure about latest status, if the latest version works, we can upgrade too.
/cc @Ark-kun can you provide context about that issue?

@Bobgy Bobgy changed the title feat(backend): Upgrade Argo server image version feat(backend): Upgrade Argo server image version to v2.11.6 Nov 4, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xinbinhuang
To complete the pull request process, please assign bobgy after the PR has been reviewed.
You can assign the PR to them by writing /assign @bobgy in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@xinbinhuang
Copy link
Author

I realised that your changes are only for the argo server and not for Kubeflow it self, thus this PR will not completely solve: #4553 as I understand it. I am also not sure we we should update the argo server on its own since this will result in that Kubeflow pipelines use 2.7 and the argo server uses 2.11 of argo.

You are correct - this PR only upgrades argo server only. I can also bake in the changes for upgrading Kubeflow, but I wonder if we should separate it into a different PR? or should they go into the same one? @Bobgy @NikeNano

@Bobgy
Copy link
Contributor

Bobgy commented Nov 11, 2020

@xinbinhuang I think we can go with separate, previously we've upgraded both at different pace because they had different complexity.

@Bobgy
Copy link
Contributor

Bobgy commented Nov 11, 2020

/ok-to-test

@xinbinhuang
Copy link
Author

/test kubeflow-pipeline-sample-test

@xinbinhuang
Copy link
Author

/retest

@k8s-ci-robot
Copy link
Contributor

@xinbinhuang: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
kubeflow-pipeline-upgrade-test 44e41e6 link /test kubeflow-pipeline-upgrade-test
kubeflow-pipeline-e2e-test 44e41e6 link /test kubeflow-pipeline-e2e-test
kubeflow-pipeline-sample-test 44e41e6 link /test kubeflow-pipeline-sample-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Bobgy
Copy link
Contributor

Bobgy commented Nov 12, 2020

Hi @xinbinhuang, you'll need to update https://github.com/kubeflow/pipelines/tree/master/third_party/argo with instructions that should be documented in #4715. I'd suggest creating a separate PR for it. I'll publish images like gcr.io/ml-pipeline/workflow-controller:v2.11.6-license-compliance once your PR is merged.

@xinbinhuang

This comment has been minimized.

@xinbinhuang
Copy link
Author

xinbinhuang commented Jan 16, 2021

@NikeNano @Bobgy @capri-xiyue I noticed some conversation on a more robust upgrading strategy to Argo3 #4999. I didn't quite have much time to finish this PR in the last year ending. I wonder if I should just close this PR so that whoever are upgrading it can start from scratch?

@alexec
Copy link

alexec commented Jan 16, 2021

You might want to go to v2.12 which is LTS.

@Bobgy
Copy link
Contributor

Bobgy commented Jan 20, 2021

@alexec will we have the kubernetes client update backported to argo 2.12?

It was one of the big reasons we wanted to upgrade.

@Bobgy
Copy link
Contributor

Bobgy commented Jan 20, 2021

To @xinbinhuang, agree we can close this one if you do not have bandwidth to finish it.

/close

@k8s-ci-robot
Copy link
Contributor

@Bobgy: Closed this PR.

In response to this:

To @xinbinhuang, agree we can close this one if you do not have bandwidth to finish it.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants