Periodically run IAP endpoint ready test against auto deployments #51

Closed
jlewi opened this issue Jun 16, 2020 · 15 comments

Comments

jlewi commented Jun 16, 2020

Follow on to #42

We should set up a periodic test that runs the "IAP endpoint is ready" test against the auto deployments.

@issue-label-bot
Issue-Label Bot is automatically applying the labels:

| Label | Probability |
| --- | --- |
| platform/gcp | 0.94 |
| kind/feature | 0.93 |

Please mark this comment with 👍 or 👎 to give our bot feedback!

jlewi commented Jun 16, 2020

Needed to grant the workload identity the roles/iam.serviceAccountTokenCreator role:

gcloud iam service-accounts add-iam-policy-binding --project=kubeflow-ci  --role roles/iam.serviceAccountTokenCreator   --member "serviceAccount:kubeflow-ci.svc.id.goog[kf-ci/kf-ci]"   [email protected]
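
The `--member` value in the command above has the shape `serviceAccount:PROJECT.svc.id.goog[NAMESPACE/KSA]`. As a minimal sketch, the string can be assembled like this; the helper name is hypothetical and is not part of gcloud or any Google library, only the output format comes from the command above:

```python
def workload_identity_member(project: str, namespace: str, ksa: str) -> str:
    """Build the IAM member string for a Kubernetes service account (KSA)
    that uses workload identity.

    Hypothetical helper for illustration; the format mirrors the
    --member flag used in the gcloud command in this thread.
    """
    return f"serviceAccount:{project}.svc.id.goog[{namespace}/{ksa}]"


# Values taken from the gcloud command above.
member = workload_identity_member("kubeflow-ci", "kf-ci", "kf-ci")
print(member)
```

The result can be passed to `--member` when scripting the binding for other namespaces or service accounts.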

jlewi commented Jun 16, 2020

endpoint_ready_test.py is still failing when using workload identity, with this error:

google.auth.exceptions.TransportError: Error calling the IAM signBytes API: b'{\n  "error": {\n    "code": 403,\n    "message": "Permission iam.serviceAccounts.signBlob is required to perform this operation on service account projects/-/serviceAccounts/[email protected].",\n    "status": "PERMISSION_DENIED"\n  }\n}\n'

jlewi pushed a commit to jlewi/kfctl that referenced this issue Jun 17, 2020
Fix code to work with IAP using workload identity.

* The existing code to get an ID token didn't seem to work with workload
  identity and didn't match the latest code on the IAP. The latest
  code appears to use a helper function to get the id token

Related to: GoogleCloudPlatform/kubeflow-distribution#51 Create a tekton test for blueprints
to verify the endpoint is ready.
k8s-ci-robot pushed a commit to kubeflow/kfctl that referenced this issue Jun 17, 2020
Fix code to work with IAP using workload identity.

* The existing code to get an ID token didn't seem to work with workload
  identity and didn't match the latest code on the IAP. The latest
  code appears to use a helper function to get the id token

Related to: GoogleCloudPlatform/kubeflow-distribution#51 Create a tekton test for blueprints
to verify the endpoint is ready.
jlewi pushed a commit to jlewi/gcp-blueprints that referenced this issue Jun 17, 2020
* GoogleCloudPlatform#51: We want to periodically run a test to verify that IAP is
  working on the auto-deployed clusters.

* Document the testing.
k8s-ci-robot pushed a commit that referenced this issue Jun 18, 2020
…ers (#54)

* Periodically run test to verify IAP is working on auto-deployed clusters

* #51: We want to periodically run a test to verify that IAP is
  working on the auto-deployed clusters.

* Document the testing.

* Fix urls.

* Remove testing-repo; its now baked into the container.

jlewi commented Jun 30, 2020

The latest test still failed:
https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/kubeflow-gcp-blueprints-master-periodic/1277776211300323329
https://kf-ci-v1.endpoints.kubeflow-ci.cloud.goog/tekton/#/namespaces/kf-ci/pipelineruns/gcp-kf-ready-6vlzk

The task executed this command:

args:
  - '-m'
  - kubeflow.testing.get_kf_testing_cluster
  - '--base=$(inputs.params.testing-cluster-pattern)'
  - '--location=$(inputs.params.testing-cluster-location)'
  - get-credentials

It looks like it's not running the new version:
https://github.com/kubeflow/testing/blob/master/acm-repos/kf-ci-v1/namespaces/kf-ci/tekton.dev_v1alpha1_task_iap-ready.yaml#L40

jlewi commented Jun 30, 2020

Config Management isn't syncing master. The current git config points at a branch on a fork:

   git:
      policyDir: /acm-repos/kf-ci-v1
      proxy: {}
      secretType: none
      syncBranch: gcp_blueprints
      syncRepo: https://github.com/jlewi/testing.git

jlewi pushed a commit to jlewi/testing that referenced this issue Jun 30, 2020
k8s-ci-robot pushed a commit to kubeflow/testing that referenced this issue Jun 30, 2020
* ACM should point at kubeflowt/testing/master

related to GoogleCloudPlatform/kubeflow-distribution#51

* Delete old config for auto-deploy.yaml

* Fix the makefile

jlewi commented Jun 30, 2020

Nomos sync is still failing:

nomos --contexts=kf-ci-v1 status
Connecting to clusters...
Current   Context    Status           Last Synced Token   Sync Branch
-------   -------    ------           -----------------   -----------
          kf-ci-v1   ERROR            ccefc51a            master   


Config Management Errors:
kf-ci-v1   KNV2010: unable to update resource: could not patch: Internal error occurred: failed calling webhook "pilot.validation.istio.io": Post https://istio-galley.istio-system.svc:443/admitpilot?timeout=30s: no endpoints available for service "istio-galley"

source: namespaces/auto-deploy/networking.istio.io_v1alpha3_virtualservice_auto-deploy-server.yaml
namespace: auto-deploy
metadata.name: auto-deploy-server
group: networking.istio.io
version: v1alpha3
kind: VirtualService

For more information, see https://cloud.google.com/anthos-config-management/docs/reference/errors#knv2010 

Looks like it's kubeflow/testing#708

@issue-label-bot
Issue-Label Bot is automatically applying the labels:

| Label | Probability |
| --- | --- |
| area/engprod | 0.64 |

Please mark this comment with 👍 or 👎 to give our bot feedback!

jlewi commented Jun 30, 2020

It should be synced now:

nomos --contexts=kf-ci-v1 status
Connecting to clusters...
Current   Context    Status           Last Synced Token   Sync Branch
-------   -------    ------           -----------------   -----------
          kf-ci-v1   SYNCED           28a61500            master 
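
Since this issue is about running checks periodically, the sync status could also be checked from a script. A rough sketch that parses the table layout shown above; the function name is hypothetical, and it assumes single-word status values and that a leading `*` marks the current context (neither is confirmed by this thread):

```python
from typing import Dict, List


def parse_nomos_status(output: str) -> List[Dict[str, str]]:
    """Extract one row per cluster from `nomos status` text output.

    Assumes the layout shown in this thread: a dashed separator line,
    then one row per context, then a blank line before any error details.
    """
    rows: List[Dict[str, str]] = []
    in_table = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("-------"):
            in_table = True  # rows start after the dashed separator
            continue
        if not in_table:
            continue  # skip "Connecting to clusters..." and the header
        if not stripped:
            break  # a blank line ends the table
        tokens = stripped.split()
        if tokens[0] == "*":  # assumed marker for the current context
            tokens = tokens[1:]
        if len(tokens) < 4:
            continue
        context, status, token, branch = tokens[:4]
        rows.append(
            {"context": context, "status": status, "token": token, "branch": branch}
        )
    return rows


sample = """Connecting to clusters...
Current   Context    Status           Last Synced Token   Sync Branch
-------   -------    ------           -----------------   -----------
          kf-ci-v1   SYNCED           28a61500            master
"""
not_synced = [r for r in parse_nomos_status(sample) if r["status"] != "SYNCED"]
print(not_synced)
```

A periodic job could fail whenever `not_synced` is non-empty, which would have surfaced the earlier KNV2010 error without a manual check.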

jlewi commented Jul 1, 2020

Latest run:
https://kf-ci-v1.endpoints.kubeflow-ci.cloud.goog/tekton/#/namespaces/kf-ci/pipelineruns/gcp-kf-ready-mzpcs

get-credentials failed for the iap-ready test:

INFO|2020-07-01T01:31:51|/usr/local/lib/python3.8/dist-packages/oauth2client/transport.py|157| Attempting refresh to obtain initial access_token
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/srcCache/kubeflow/testing/py/kubeflow/testing/get_kf_testing_cluster.py", line 429, in <module>
    fire.Fire(CredentialHelper)
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/srcCache/kubeflow/testing/py/kubeflow/testing/get_kf_testing_cluster.py", line 384, in get_credentials
    raise ValueError(message)
ValueError: No clusters found matching: project: kubeflow-ci-deployment, location: us-central1-c, pattern: kf-master-(?!n\d\d)

It used docker image: gcr.io/kubeflow-ci/test-worker-py3@sha256:804c6cc8a73face69d79c60bcd85b93fa04218db9ee31127fb529b41fcab43ac

The kf-ready test, which worked, used the same image:
gcr.io/kubeflow-ci/test-worker-py3@sha256:804c6cc8a73face69d79c60bcd85b93fa04218db9ee31127fb529b41fcab43ac

nomos --contexts=kf-ci-v1 status
Connecting to clusters...
Current   Context    Status           Last Synced Token   Sync Branch
-------   -------    ------           -----------------   -----------
          kf-ci-v1   SYNCED           3fbde6d3            master   

jlewi commented Jul 1, 2020

The flag name is wrong; it should be --pattern, not --location.
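
The ValueError in the traceback above reports the pattern `kf-master-(?!n\d\d)`, which suggests the pattern intended by the task never reached the helper. A small sketch of how such a pattern filters cluster names: the function and the first two cluster names are made up for illustration, `kf-vbp-0701-ab4` is borrowed from an endpoint URL later in this thread on the assumption that it matches the cluster name, and whether the real helper uses `re.match` is also an assumption.

```python
import re
from typing import List

# Pattern reported in the ValueError in this thread.
REPORTED_PATTERN = r"kf-master-(?!n\d\d)"


def matching_clusters(names: List[str], pattern: str) -> List[str]:
    """Return the names that match `pattern` from the start of the string.

    Hypothetical helper; the real get_kf_testing_cluster logic may differ.
    """
    regex = re.compile(pattern)
    return [name for name in names if regex.match(name)]


# Hypothetical cluster names, except the last (see the note above).
names = ["kf-master-0630-abc", "kf-master-n00", "kf-vbp-0701-ab4"]

# The negative lookahead (?!n\d\d) rejects names like kf-master-n00.
print(matching_clusters(names, REPORTED_PATTERN))

# A pattern for a different name prefix selects a different cluster.
print(matching_clusters(names, r"kf-vbp-(?!n\d\d)"))
```

With only `kf-vbp-...` clusters in the project, the reported pattern matches nothing, which is consistent with the "No clusters found" error.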

jlewi pushed a commit to jlewi/testing that referenced this issue Jul 1, 2020
* Related GoogleCloudPlatform/kubeflow-distribution#51 get-credentials isn't finding any clusters
  because when using Fire the parameter should be --pattern not --location

* Related GoogleCloudPlatform/kubeflow-distribution#65 When copying the bucket output
  in the notebook tests the parameter should be params.notebook-output
  not params.output

jlewi commented Jul 1, 2020

kubeflow/testing#712 seems to fix the get-credentials issue. Now the test is timing out with:

INFO|02:09:16|/workspace/kfctl-repo/py/kubeflow/kfctl/testing/util/gcp_util.py|93| https://kf-vbp-0701-ab4.endpoints.kubeflow-ci-deployment.cloud.goog: Endpoint not ready, exception caught 'str' object has no attribute 'text', request number: 3
INFO|02:09:26|/workspace/kfctl-repo/py/kubeflow/kfctl/testing/util/gcp_util.py|81| Trying url: https://kf-vbp-0701-ab4.endpoints.kubeflow-ci-deployment.cloud.goog

k8s-ci-robot pushed a commit to kubeflow/testing that referenced this issue Jul 1, 2020
* Related GoogleCloudPlatform/kubeflow-distribution#51 get-credentials isn't finding any clusters
  because when using Fire the parameter should be --pattern not --location

* Related GoogleCloudPlatform/kubeflow-distribution#65 When copying the bucket output
  in the notebook tests the parameter should be params.notebook-output
  not params.output
jlewi pushed a commit to jlewi/kfctl that referenced this issue Jul 1, 2020
* kubeflow#355 recently changed the logic for making IAP requests to use newer libraries

* In the new code make_iap_request returns a string and not a response object
  so we need to update the calling code otherwise we get problems.

* Related to GoogleCloudPlatform/kubeflow-distribution#51
k8s-ci-robot pushed a commit to kubeflow/kfctl that referenced this issue Jul 1, 2020
* #355 recently changed the logic for making IAP requests to use newer libraries

* In the new code make_iap_request returns a string and not a response object
  so we need to update the calling code otherwise we get problems.

* Related to GoogleCloudPlatform/kubeflow-distribution#51
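
The log message `'str' object has no attribute 'text'` earlier in this thread is consistent with calling code that reads `.text` on a value that is now a plain string. A minimal sketch of a caller that tolerates both return types; `response_text` and `FakeResponse` are hypothetical names, and this is not the actual change made in kfctl:

```python
from typing import Any


def response_text(result: Any) -> str:
    """Return the body text whether `result` is a string or a
    response-like object exposing a `.text` attribute."""
    if isinstance(result, str):
        return result
    return result.text


class FakeResponse:
    """Stand-in for an HTTP response object, for illustration only."""

    def __init__(self, text: str) -> None:
        self.text = text


# Older behavior: a response object; newer behavior: a plain string.
print(response_text(FakeResponse("ok")))
print(response_text("ok"))
```

Updating the caller to use the string directly, as the commit message describes, is simpler when only the new return type needs to be supported.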
@jlewi jlewi closed this as completed Jul 1, 2020
vpavlin pushed a commit to vpavlin/kfctl that referenced this issue Jul 10, 2020
Fix code to work with IAP using workload identity.

* The existing code to get an ID token didn't seem to work with workload
  identity and didn't match the latest code on the IAP. The latest
  code appears to use a helper function to get the id token

Related to: GoogleCloudPlatform/kubeflow-distribution#51 Create a tekton test for blueprints
to verify the endpoint is ready.
vpavlin pushed a commit to vpavlin/kfctl that referenced this issue Jul 10, 2020
* kubeflow#355 recently changed the logic for making IAP requests to use newer libraries

* In the new code make_iap_request returns a string and not a response object
  so we need to update the calling code otherwise we get problems.

* Related to GoogleCloudPlatform/kubeflow-distribution#51
vpavlin pushed a commit to vpavlin/kfctl that referenced this issue Jul 20, 2020
Fix code to work with IAP using workload identity.

* The existing code to get an ID token didn't seem to work with workload
  identity and didn't match the latest code on the IAP. The latest
  code appears to use a helper function to get the id token

Related to: GoogleCloudPlatform/kubeflow-distribution#51 Create a tekton test for blueprints
to verify the endpoint is ready.
vpavlin pushed a commit to vpavlin/kfctl that referenced this issue Jul 20, 2020
* kubeflow#355 recently changed the logic for making IAP requests to use newer libraries

* In the new code make_iap_request returns a string and not a response object
  so we need to update the calling code otherwise we get problems.

* Related to GoogleCloudPlatform/kubeflow-distribution#51
vpavlin pushed a commit to vpavlin/kfctl that referenced this issue Jul 22, 2020
Fix code to work with IAP using workload identity.

* The existing code to get an ID token didn't seem to work with workload
  identity and didn't match the latest code on the IAP. The latest
  code appears to use a helper function to get the id token

Related to: GoogleCloudPlatform/kubeflow-distribution#51 Create a tekton test for blueprints
to verify the endpoint is ready.
vpavlin pushed a commit to vpavlin/kfctl that referenced this issue Jul 22, 2020
* kubeflow#355 recently changed the logic for making IAP requests to use newer libraries

* In the new code make_iap_request returns a string and not a response object
  so we need to update the calling code otherwise we get problems.

* Related to GoogleCloudPlatform/kubeflow-distribution#51
crobby pushed a commit to crobby/kfctl that referenced this issue Feb 25, 2021
Fix code to work with IAP using workload identity.

* The existing code to get an ID token didn't seem to work with workload
  identity and didn't match the latest code on the IAP. The latest
  code appears to use a helper function to get the id token

Related to: GoogleCloudPlatform/kubeflow-distribution#51 Create a tekton test for blueprints
to verify the endpoint is ready.
crobby pushed a commit to crobby/kfctl that referenced this issue Feb 25, 2021
* kubeflow#355 recently changed the logic for making IAP requests to use newer libraries

* In the new code make_iap_request returns a string and not a response object
  so we need to update the calling code otherwise we get problems.

* Related to GoogleCloudPlatform/kubeflow-distribution#51