[Testing] Use google/cloud-sdk:279.0.0 to resolve workload identity flakiness #3019
1.15.8 alone doesn't fix the issue; I will also need to try a newer google/cloud-sdk version.
As mentioned in b/148920399, we also need to update the clients to the latest version.
/retest
/hold
1.15.8 doesn't seem compatible with KFP; it triggers some stable integration test failures.
xgboost timed out; we should increase the time limit.
/test kubeflow-pipeline-e2e-test
/test kubeflow-pipeline-e2e-test
Cleaned up unrelated changes.
Thanks @Bobgy!
/test kubeflow-pipeline-e2e-test
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: numerology
Took a look, and it seems Travis is complaining about the version of six. Perhaps we need to pin it in .travis.yaml to get it fixed. Specifically, the test on py35 passed because it got six==1.12.0, while the others failed with six==1.11.0.
New changes are detected. LGTM label has been removed. |
@numerology Thank you! Let me try that.
FYI the fix is pending here: #3035
2025bdb to b6fa509
@numerology I separated the Travis test failure into a separate PR: #3039. UPDATE: oh, thanks, I didn't know there was an existing PR to fix it. I will close mine.
/hold cancel
…lakiness (kubeflow#3019)
* [Testing] Use gke 1.15.8 to mitigate workload identity flakiness
* Upgrade gcloud version
* Update image builder image too
* Turn on workload identity
* Update deploy-cluster.sh
* secret sample uses python3 instead
* Increase xgboost time limit
* Revert files with bad format
* Update component and pipelines to use gcloud 279.0.0
* Fix secret sample using python3
* Upgrade frontend integration test image
* Rebuild frontend integration test image
Same rationale as #3018, but this PR tests whether the new version is stable.
Discussion about which versions are good: b/146669263
It also mentions that the client version is related, so I will try upgrading the google/cloud-sdk version soon.
UPDATE 2020.2.10: the client version is the most important factor. After updating to google/cloud-sdk:279.0.0 (the latest one), the flakiness disappears.
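Since the client version turned out to matter most, a test script could guard against running with an older Cloud SDK. The following is only a rough sketch and is not part of this PR: the helper names are hypothetical, and `installed_sdk_version` assumes that `gcloud version --format=json` reports the SDK version under the "Google Cloud SDK" key.

```python
import json
import subprocess

# Version named in this PR; treat it as the minimum known-good client.
MINIMUM_SDK_VERSION = "279.0.0"


def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '279.0.0' into (279, 0, 0)."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(version: str, minimum: str = MINIMUM_SDK_VERSION) -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(version) >= parse_version(minimum)


def installed_sdk_version() -> str:
    """Ask the local gcloud CLI for its version.

    Assumption: the JSON output contains a "Google Cloud SDK" key.
    """
    output = subprocess.check_output(["gcloud", "version", "--format=json"])
    return json.loads(output)["Google Cloud SDK"]
```

A caller could run `meets_minimum(installed_sdk_version())` at the start of a test and fail fast when it returns `False`. Comparing integer tuples rather than raw strings avoids lexical ordering mistakes between versions with different digit counts.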
Requirements to fix GKE workload identity intermittent timeouts: