---
# This example shows how to use Argo Workflows in Katib, pass parameters from one Step to another, and run an HP tuning job.
# It uses the random search algorithm and tunes only the learning rate.
# The Workflow contains 2 Steps: data-preprocessing first, then model-training.
# The first Step shows how you can prepare your training data before the HP job runs (here it simply divides the number of training examples by a random integer).
# The result is passed to the second Step as the number of training epochs.
# The second Step performs the actual training, and the metrics collector sidecar is injected into its Pod.
# Note that this example requires the Argo container runtime executor to be "emissary".
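#
# A minimal sketch of selecting the emissary executor in the Argo controller ConfigMap.
# Assumptions: an Argo Workflows v3.x release that still lets you choose the executor,
# with the workflow controller installed in the "argo" namespace. Adjust names to your setup:
#
#   apiVersion: v1
#   kind: ConfigMap
#   metadata:
#     name: workflow-controller-configmap
#     namespace: argo
#   data:
#     containerRuntimeExecutor: emissary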
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: argo
  name: katib-argo-workflow
spec:
  objective:
    type: minimize
    goal: 0.001
    objectiveMetricName: loss
  algorithm:
    algorithmName: random
  parallelTrialCount: 2
  maxTrialCount: 5
  maxFailedTrialCount: 1
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.03"
  trialTemplate:
    retain: true
    # The metrics collector sidecar is injected into Pods with these labels (the model-training Step).
    primaryPodLabels:
      katib.kubeflow.org/model-training: "true"
    # Argo names the main container of each Step Pod "main".
    primaryContainerName: main
    # GJSON queries evaluated against the Workflow status: the Trial
    # succeeds or fails when status.phase reaches the given value.
    successCondition: status.[@this].#(phase=="Succeeded")#
    failureCondition: status.[@this].#(phase=="Failed")#
    trialParameters:
      - name: learningRate
        description: Learning rate for the training model
        reference: lr
    trialSpec:
      apiVersion: argoproj.io/v1alpha1
      kind: Workflow
      spec:
        serviceAccountName: argo
        entrypoint: hp-workflow
        templates:
          - name: hp-workflow
            steps:
              - - name: data-preprocessing
                  template: gen-epochs
              - - name: model-training
                  template: model-training
                  arguments:
                    parameters:
                      - name: epochs
                        value: "{{steps.data-preprocessing.outputs.result}}"
          - name: gen-epochs
            script:
              image: python:alpine3.6
              command:
                - python
              source: |
                import random
                # Divide the 60000 training examples by a random integer in [3000, 30000];
                # the printed result (2 to 20) becomes the epochs parameter of the next Step.
                print(60000//random.randint(3000, 30000))
          - name: model-training
            metadata:
              labels:
                katib.kubeflow.org/model-training: "true"
            inputs:
              parameters:
                - name: epochs
            container:
              name: model-training
              image: docker.io/kubeflowkatib/pytorch-mnist-cpu:latest
              command:
                - "python3"
                - "/opt/pytorch-mnist/mnist.py"
                - "--lr=${trialParameters.learningRate}"
                - "--epochs={{inputs.parameters.epochs}}"
                - "--batch-size=16"