Auto-push Kubeflow deployments corresponding to different Kubeflow versions. #95

Closed
jlewi opened this issue Mar 31, 2018 · 9 comments

@jlewi
Contributor

jlewi commented Mar 31, 2018

It would be nice if we could auto-push dev.kubeflow.org or perhaps create another environment nightly-dev.kubeflow.org.

There are certain changes, like UI changes, that are difficult to test manually. It would be useful to have an up-to-date environment.

I think we could easily adapt our E2E tests to do this and then just run it as a cron job.

@jlewi
Contributor Author

jlewi commented Apr 4, 2018

It might be interesting to look at using Weave Flux
https://www.weave.works/blog/gitops-operations-by-pull-request

Weave Flux supports keeping K8s infrastructure in sync with config defined in a git repository.

Not sure if ksonnet is supported yet.

@jlewi
Contributor Author

jlewi commented May 16, 2018

Here's an idea for how to make progress:

  1. We could use a cron job to regularly run recreate_app.sh
  2. Create a PR using the hub CLI for GitHub
  3. Manually approve the PR

Then use either argo-cd or a cron job to call redeploy_app.sh, keeping the deployed instance in sync with what's checked in.
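Steps 1 and 2 above can be sketched as a short Python script that a cron job runs. This is a minimal sketch under assumptions not taken from the issue: `recreate_app.sh` sits in the app directory and takes no arguments, the `ks_app` directory and the date-based branch name are hypothetical, and `build_nightly_commands` is an illustrative helper. It only prints the commands; a real job would pass each one to `subprocess.run(cmd, check=True)`.

```python
import datetime
import shlex
from typing import List


def build_nightly_commands(app_dir: str, now: datetime.datetime) -> List[List[str]]:
    """Return, in order, the commands a nightly cron job could run.

    Assumptions (hypothetical, not from the issue): recreate_app.sh lives in
    app_dir and takes no arguments, and each run pushes to a date-named branch.
    """
    branch = "auto-update-" + now.strftime("%Y%m%d")
    message = "Regenerate ksonnet app " + now.strftime("%Y-%m-%d")
    return [
        ["git", "checkout", "-b", branch],
        # Step 1: regenerate the app from the latest Kubeflow sources.
        ["bash", app_dir + "/recreate_app.sh"],
        ["git", "add", "--all", app_dir],
        ["git", "commit", "-m", message],
        # The branch must be on the remote before a PR can be opened from it.
        ["git", "push", "origin", branch],
        # Step 2: open a PR with the hub CLI; step 3 (approval) stays manual.
        ["hub", "pull-request", "-m", message],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_nightly_commands("ks_app", datetime.datetime.now()):
        print(shlex.join(cmd))
```

Keeping the approval manual means a bad regeneration never reaches the deployed instance without a human looking at the diff first.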

@jlewi
Contributor Author

jlewi commented Jun 29, 2018

kubeflow/kubeflow#1100

Added a script for upgrading the ksonnet application.

jlewi added the priority/p1, area/testing, and help wanted labels Dec 28, 2018
jlewi changed the title from "Auto-push dev.kubeflow.org" to "Auto-push Kubeflow deployments corresponding to different Kubeflow versions." Dec 28, 2018
@jlewi
Contributor Author

jlewi commented Dec 28, 2018

We'd like to maintain a pool of Kubeflow deployments corresponding to different Kubeflow versions.

e.g.

kf-vX.Y-n00
...
kf-vX.Y-n05

The reason for having a pool of deployments for a given major release (X.Y) is that we want to run tests against these clusters. We don't want to interrupt the tests when deploying an updated version. By maintaining a pool and cycling through it, we can deploy a new version for new tests while letting already-running tests run to completion.

We want to recycle the names so that we have a fixed set of endpoints, e.g.

https://kf-v0-4-n00.endpoints.kubeflow-ci.cloud.goog/

The reason for having a fixed set of endpoints is that we have to manually set the redirect URIs on the OAuth credentials used for IAP. By recycling the endpoints, we don't have to manually update the OAuth credentials with new redirect URIs.
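The fixed naming scheme can be made concrete with a small helper. This is a sketch that assumes the pattern shown above (`kf-vX.Y-nNN` names, hosts under `.endpoints.kubeflow-ci.cloud.goog`) holds for every release; `deployment_name`, `endpoint_url`, and `pool_endpoints` are hypothetical helpers, not part of the existing scripts.

```python
from typing import List

POOL_SIZE = 6  # slots n00 through n05, as proposed in the issue


def deployment_name(version: str, index: int) -> str:
    """Map a release such as "v0.4" and a slot index to a name such as "kf-v0-4-n00"."""
    if not 0 <= index < POOL_SIZE:
        raise ValueError(f"slot index must be in [0, {POOL_SIZE}), got {index}")
    # A dot would split the hostname into extra DNS labels, so "v0.4" becomes "v0-4".
    return f"kf-{version.replace('.', '-')}-n{index:02d}"


def endpoint_url(name: str, project: str = "kubeflow-ci") -> str:
    """Build an endpoint URL following the pattern seen in the issue."""
    return f"https://{name}.endpoints.{project}.cloud.goog/"


def pool_endpoints(version: str) -> List[str]:
    """All fixed endpoints for one release: the hosts whose OAuth redirect URIs are set up once."""
    return [endpoint_url(deployment_name(version, i)) for i in range(POOL_SIZE)]


if __name__ == "__main__":
    for url in pool_endpoints("v0.4"):
        print(url)
```

Because the set of names is closed (six per release), the list of redirect URIs to register on the OAuth client is also closed and never needs to change when a deployment is recycled.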

#269 created a python script to deploy Kubeflow.

The next steps would be:

  • Update the script in #269 (A script to automate creation of Kubeflow clusters for testing) to add logic to recycle the deployments
    * Check if there is an unused name among kf-vX.Y-n00, ..., kf-vX.Y-n05 and, if there is, use it
    * Otherwise, delete the oldest deployment and create a new one using its name.

  • Create a cron job to run the above script and redeploy using the latest commit for each release

  • Create a simple Flask app to provide information about all the different deployments
    * E.g. so we can see when each endpoint was last updated

  • Recycle SSL certificates for each endpoint
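The recycling rule in the first bullet can be sketched as a pure function. This assumes the caller has already listed the existing deployments together with their creation times (how that listing is obtained is left out); `pick_slot` and its dict input are hypothetical, and "oldest" is taken to mean earliest creation time.

```python
import datetime
from typing import Dict, Optional, Tuple


def pick_slot(
    version: str,
    existing: Dict[str, datetime.datetime],
    pool_size: int = 6,
) -> Tuple[str, Optional[str]]:
    """Choose which pool name the next deployment should use.

    existing maps deployment names that currently exist to their creation time.
    Returns (name_to_use, name_to_delete_first); the second item is None when a
    free slot exists.
    """
    names = [f"kf-{version.replace('.', '-')}-n{i:02d}" for i in range(pool_size)]
    # Prefer an unused name from kf-vX.Y-n00 ... kf-vX.Y-n05.
    for name in names:
        if name not in existing:
            return name, None
    # Every slot is taken: recycle the name of the oldest deployment.
    oldest = min(names, key=lambda n: existing[n])
    return oldest, oldest
```

Deleting only the oldest member is what lets tests that started against newer deployments keep running; the pool size bounds how long a test can safely run before its cluster is recycled.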

@jlewi
Contributor Author

jlewi commented Jan 3, 2019

Boskos might be useful for managing clusters.

@gabrielwen
Contributor

/assign @gabrielwen

@gabrielwen
Contributor

Remaining tasks for the cronjob itself:

  • Add a version snapshot of the GitHub repos (kubeflow/testing).
  • Clean up Dockerfile dependencies.
  • Label deployed clusters (used to identify when each deployment was made).
  • Add garbage collection of service account role bindings to the workflow.
  • Add a workflow to create a Git PR.
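For the labeling task (and the version snapshot), one option is to store the deployment time and the kubeflow/testing commit as label values. This is a sketch under assumptions: the label keys, the 8-character commit prefix, and the helpers `deployment_labels` and `deployed_at` are hypothetical. The validation regex is deliberately conservative so that values satisfy both Kubernetes and GCP label-value rules.

```python
import datetime
import re
from typing import Dict

# Conservative label-value check: lowercase alphanumerics, "-" and "_",
# starting and ending with an alphanumeric, at most 63 characters.
_LABEL_VALUE = re.compile(r"[a-z0-9]([-_a-z0-9]*[a-z0-9])?")
_TIME_FORMAT = "%Y%m%d-%H%M%S"  # sortable, and contains no ":" characters


def deployment_labels(now: datetime.datetime, testing_commit: str) -> Dict[str, str]:
    """Labels to attach to an auto-deployed cluster (label keys are hypothetical)."""
    labels = {
        "auto-deployed": "true",
        "deployed-at": now.strftime(_TIME_FORMAT),
        # Snapshot of the kubeflow/testing version used for this deployment.
        "testing-commit": testing_commit.lower()[:8],
    }
    for key, value in labels.items():
        if len(value) > 63 or not _LABEL_VALUE.fullmatch(value):
            raise ValueError(f"invalid label value for {key!r}: {value!r}")
    return labels


def deployed_at(labels: Dict[str, str]) -> datetime.datetime:
    """Recover the deployment time from labels written by deployment_labels."""
    return datetime.datetime.strptime(labels["deployed-at"], _TIME_FORMAT)
```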

@jlewi
Contributor Author

jlewi commented Feb 27, 2019

Some follow-on issues:
#314 - Use self-signed certificates to avoid Let's Encrypt quota issues
#273 - E2E tests need a way to get the cluster to use
#316 - Web app to auto-redirect to the latest cluster

I think we can close this issue once the auto-deployed endpoints are working. I think #314 is the only blocking issue for that.
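Both #273 and #316 reduce to the question "which deployment in a release's pool is the newest?". A minimal sketch, assuming deployment times are already available per name (for instance from a timestamp label); `latest_deployment` and `redirect_target` are hypothetical helpers, and the URL pattern is the one used for the fixed endpoints earlier in this thread.

```python
import datetime
from typing import Dict, Optional


def latest_deployment(
    version: str, deployed_at: Dict[str, datetime.datetime]
) -> Optional[str]:
    """Return the most recently deployed pool member for a release, or None."""
    prefix = f"kf-{version.replace('.', '-')}-n"
    # Only consider names from this release's pool, e.g. kf-v0-4-n00 ... kf-v0-4-n05.
    candidates = {n: t for n, t in deployed_at.items() if n.startswith(prefix)}
    if not candidates:
        return None
    return max(candidates, key=candidates.__getitem__)


def redirect_target(
    version: str,
    deployed_at: Dict[str, datetime.datetime],
    project: str = "kubeflow-ci",
) -> Optional[str]:
    """URL a redirect web app could send users to for the given release."""
    name = latest_deployment(version, deployed_at)
    return None if name is None else f"https://{name}.endpoints.{project}.cloud.goog/"
```

An E2E test could call `latest_deployment` to pick its cluster, and a web app could answer requests with an HTTP redirect to `redirect_target(...)`.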

@jlewi
Contributor Author

jlewi commented Mar 10, 2019

This is working for master.

For 0.4 it looks like there was a typo in the cron job that should be fixed by #327.

#315 is tracking fixing this for 0.4. So I'm going to mark this as fixed.

jlewi closed this as completed Mar 10, 2019
2 participants