Add support for entire kubeflow pipelines as trial target (in addition to containers) #1914
Comments
Just dug in a little more. Actually, IMHO we need a new TrialTemplate; as Argo and Tekton are already supported, I'll go for one of those initially.
Hi @romeokienzler, I'm also very interested in this feature, and it's actually surprising that it isn't provided already, as it sounds natural. I have a question though: I see there is an example of integrating Argo Workflows with Katib here. Kubeflow Pipelines are implemented via Argo Workflows, aren't they? If so, why can't we do this already? I gave it a try and I'm constantly getting new errors, which makes me think they are not compatible.
I think I have Kubeflow Pipelines with Katib (almost) working: I followed the Argo/Katib installation instructions (https://github.com/kubeflow/katib/tree/master/examples/v1beta1/argo). For now I manually generated a manifest based on the example provided there, adapted for Kubeflow. My example file is attached; to reproduce in your own env, adapt it to your setup.

I found that the reason this currently does not work is that when the pod with the metrics collector sidecar is set up, the run command is re-written to be a single command. As the commands for pipelines are much more complex than in the Argo example, I believe this re-writing breaks them. Anyways, I think if this re-writing of the command for metrics collection were fixed, combining pipelines with Katib should work.

Example pipeline: katib-pipeline.txt

Sorry for the hasty/sloppy report; I am working on another deadline but thought that reporting this even in this state may be interesting.
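The re-writing problem described above can be illustrated with a hypothetical sketch. The wrapper command, log path, and completion marker below are assumptions for illustration, not Katib's verbatim output; the point is only that a structured `command`/`args` pair gets collapsed into one shell invocation:

```yaml
# Hypothetical sketch of the sidecar injection; paths and wrapper
# are placeholders, not Katib's exact output.
# Original step container:
container:
  command: ["python"]
  args: ["train.py", "--lr", "0.01"]
---
# After metrics-collector injection, everything is collapsed into a
# single shell command so that stdout can be captured into a log file:
container:
  command: ["sh", "-c"]
  args:
    - >-
      python train.py --lr 0.01 1>/var/log/katib/metrics.log 2>&1
      && echo completed > /var/log/katib/done
```

For a plain training container this collapse is harmless, but Argo's pipeline step containers carry much more elaborate commands, which plausibly explains the breakage reported above.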
Hey all, I now managed to get the workflows running. I ran into two issues on the way.

A working example can be found here. What I found nice is that caching works as one would hope: only the steps affected by the tuned parameters, and steps downstream of them, are re-run; e.g. the initial data loading/parsing will only be run once. I will follow up on the next steps.
Hi there,

I am currently re-writing this into a simple MNIST example. Something I was wondering: wouldn't it make sense to have a metrics collector that works directly with Kubeflow Pipelines metrics?
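The idea of a KFP-aware collector can be sketched in a few lines. KFP v1 components write a metrics artifact (`mlpipeline-metrics.json`) containing a `metrics` list of `name`/`numberValue` entries; a collector could read that and re-emit the values as `name=value` lines, a format Katib's standard collectors can parse. A minimal sketch, assuming that artifact format and output convention:

```python
import json


def kfp_metrics_to_katib_lines(metrics_json: str) -> list[str]:
    """Convert a KFP v1 mlpipeline-metrics JSON document into
    Katib-style ``name=value`` lines."""
    doc = json.loads(metrics_json)
    lines = []
    for metric in doc.get("metrics", []):
        lines.append(f"{metric['name']}={metric['numberValue']}")
    return lines


if __name__ == "__main__":
    # Example payload in the KFP v1 metrics artifact format.
    payload = json.dumps({
        "metrics": [
            {"name": "accuracy", "numberValue": 0.95, "format": "PERCENTAGE"},
            {"name": "loss", "numberValue": 0.12, "format": "RAW"},
        ]
    })
    print("\n".join(kfp_metrics_to_katib_lines(payload)))
```

A real collector would additionally have to locate the metrics artifact of the right pipeline step, which is the harder part.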
I started a separate repo here where I am developing the MNIST example. The general approach seems to work, and the pipeline produces a YAML that can be submitted to Katib. Unfortunately I would need to adapt things for the v2 Kubeflow Pipelines syntax, thus this is with v1.
@votti This is a great start, thank you for taking this on. We can add this MNIST example to https://github.com/kubeflow/katib/tree/master/examples/v1beta1/kubeflow-pipelines. A few items remain to be completed before merge.
Thank you for these great examples @votti!
Hi @andreyvelich, I will try to find time, but I am currently a bit blocked, as I lost access to my previous Kubeflow environment and am now moving to a local installation. Let's see.
Sure, no rush @votti. We can contribute this example even after the release.
Currently I am struggling to get a setup where KFP V2 is properly working, in order to build compatibility with the new syntax. While the above point is really unclear to me and hard to explore due to my lack of a working KFP V2 setup, I am now moving on to polish the solution for V1.
@votti Maybe @zijianjoy, @chensun, @connor-mccarthy, or @Linchin have some insights on the KFP v2 setup?
This uses the new custom KFP V1 metrics collector that can directly extract metrics from Kubeflow Pipeline metrics. With this collector, measuring metrics of a Kubeflow pipeline only requires one to a) add a label indicating which step is the `model-training` step, b) disable caching for this step, and c) configure the Katib metrics collector. Also, all the information is now added such that the Katib pipeline can be run via the KatibClient. Addresses: - kubeflow/katib#1914 - kubeflow/katib#2019
Hi everyone, I now have a working implementation based on a custom metrics collector. Thus the following points are addressed:

Currently the v2 KFP support is a bit of a moving target for me, as I have not managed to get a fully working installation of KFP 2.0.0a/b. I think properly supporting V1 as a first step would already be good progress from the current situation. Next I will integrate my example into the Katib examples. Documentation and examples for writing a custom metrics collector are quite sparse.
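For readers following along, the custom-collector approach plugs into a Katib Experiment roughly as sketched below. Katib's v1beta1 API allows a custom collector container via `collector.kind: Custom`; the container name and image here are hypothetical placeholders, not a published collector:

```yaml
# Sketch of the relevant Experiment snippet; image and name are
# illustrative placeholders, not a real published collector.
metricsCollectorSpec:
  collector:
    kind: Custom
    customCollector:
      name: kfp-metrics-collector
      image: example.com/kfp-metrics-collector:latest  # hypothetical
```

The custom collector then takes over the job of finding and reporting the pipeline's metrics, instead of Katib's default sidecar command re-writing.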
This example illustrates how a full kfp pipeline can be tuned using Katib. It is based on a metrics collector to collect kubeflow pipeline metrics (kubeflow#2019). This is used as a Custom Collector. Addresses: kubeflow#1914, kubeflow#2019
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/remove-lifecycle stale
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
/kind feature
Describe the solution you'd like
Hyperparameters affect not only the training step but also upstream pipeline components, such as feature transformation (e.g. the parameters of a normalization transformation). In addition, transformation and training steps should be able to make use of KFP's parallel components (e.g. SparkJob, TFJob, ...). It would be helpful to allow not only containers as trial targets but also complete Kubeflow pipelines. As the latter also expose parameters, these can either be set directly (non-hyperparameters) or added to the hyperparameter space.
Anything else you would like to add:
I've started to create a simple container image which can be used as a trial target; it acts as a proxy and triggers parameterized Kubeflow pipeline executions downstream with the respective hyperparameters. A Kubernetes Custom Resource could be created as well down the line.
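The proxy idea can be sketched as follows: the trial container reads hyperparameters from the command line (the way Katib passes them to a trial), triggers a parameterized pipeline run, waits for it, and prints the resulting metric as a `name=value` line for Katib's stdout collector. The pipeline-triggering part is stubbed out here with a dummy function; a real implementation could use the KFP SDK client instead. All names and the metric formula are illustrative:

```python
import argparse


def run_pipeline(params: dict) -> float:
    """Placeholder for triggering a parameterized Kubeflow pipeline run
    and waiting for its result. A real implementation might submit the
    pipeline via the KFP SDK client; here we just return a dummy metric
    (pretending a lower learning rate scores better)."""
    return 1.0 - params["lr"]


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Katib trial proxy (sketch)")
    parser.add_argument("--lr", type=float, required=True)
    args = parser.parse_args(argv)
    accuracy = run_pipeline({"lr": args.lr})
    # Katib's stdout metrics collector can parse name=value lines.
    line = f"accuracy={accuracy}"
    print(line)
    return line


if __name__ == "__main__":
    main()
```

The nice property of this design is that Katib only ever sees an ordinary trial container, while the actual work runs as a regular pipeline with full caching and parallelism.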
Love this feature? Give it a 👍 We prioritize the features with the most 👍