Support JSON format logs in file-metrics-collector (#1765)
* support JSON format logs in file-metrics-collector
* review: convert fileFormat to type FileSystemFileFormat
* Update cmd/metricscollector/v1beta1/file-metricscollector/main.go
  Co-authored-by: Andrey Velichkevich <[email protected]>
* review: remove func (f FileSystemFileFormat) String()
* review: get metricRegList only when the format is TEXT
* review: change var name in a script for e2e
* review: explict specify the cloudml-hypyertune in the Dockerfile
* review: use reflect.DeepEqual instead of go-cmp.Diff
* review: stop using 'JSON' directly in error statements
* review: install specific version cloudml-hypertune
* review: get objType in the updateStopRules function
* review: save optimalObjValue across multiple stopRules
* review: add warning messages to parseTimestamp func
* review: generate test files with go test command
* review: change api for new feature

Co-authored-by: Andrey Velichkevich <[email protected]>
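To make the idea of JSON-format logs concrete, here is a minimal Python sketch of how a file-based collector could read them. It is not the Go code from this commit. It assumes one JSON object per line and a `timestamp` key holding either an ISO 8601 string or a Unix epoch number; the names `parse_timestamp` and `collect_metrics` are hypothetical, chosen only to echo the commit message.

```python
import json
import warnings
from datetime import datetime, timezone


def parse_timestamp(raw):
    """Return an aware UTC datetime, or None (with a warning) if unusable.

    Assumption: timestamps are ISO 8601 strings or Unix epoch numbers.
    """
    if raw is None:
        warnings.warn("log line has no timestamp")
        return None
    if isinstance(raw, (int, float)) and not isinstance(raw, bool):
        return datetime.fromtimestamp(raw, tz=timezone.utc)
    try:
        ts = datetime.fromisoformat(str(raw))
    except ValueError:
        warnings.warn(f"could not parse timestamp {raw!r}")
        return None
    # Treat naive timestamps as UTC (an assumption of this sketch).
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def collect_metrics(lines, metric_names):
    """Extract (name, value, timestamp) tuples from line-delimited JSON logs."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        ts = parse_timestamp(entry.get("timestamp"))
        for name in metric_names:
            if name in entry:
                # Values are kept as strings, whatever their JSON type was.
                records.append((name, str(entry[name]), ts))
    return records
```

With the metric names from the example Experiment in this commit, a call could look like `collect_metrics(open("/katib/mnist.json"), ["accuracy", "loss"])`. Unlike text logs, no regular-expression filter is needed here, because each metric is looked up by key.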
1 parent 36d0a57, commit d443ed3
Showing 22 changed files with 824 additions and 123 deletions.
examples/v1beta1/early-stopping/median-stop-with-json-format.yaml (72 changes: 72 additions & 0 deletions)
@@ -0,0 +1,72 @@
# This is example with median stopping early stopping rule with logs in JSON format.
# It has bad feasible space for learning rate to show more early stopped Trials.
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: kubeflow
  name: median-stop-with-json-format
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
    additionalMetricNames:
      - loss
  metricsCollectorSpec:
    source:
      fileSystemPath:
        path: "/katib/mnist.json"
        kind: File
        format: JSON
    collector:
      kind: File
  algorithm:
    algorithmName: random
  earlyStopping:
    algorithmName: medianstop
    algorithmSettings:
      - name: min_trials_required
        value: "1"
      - name: start_step
        value: "2"
  parallelTrialCount: 2
  maxTrialCount: 15
  maxFailedTrialCount: 3
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.5"
    - name: num-epochs
      parameterType: int
      feasibleSpace:
        min: "3"
        max: "4"
  trialTemplate:
    retain: true
    primaryContainerName: training-container
    trialParameters:
      - name: learningRate
        description: Learning rate for the training model
        reference: lr
      - name: numberEpochs
        description: Number of epochs to train the model
        reference: num-epochs
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            containers:
              - name: training-container
                image: docker.io/kubeflowkatib/pytorch-mnist:latest
                command:
                  - "python3"
                  - "/opt/pytorch-mnist/mnist.py"
                  - "--epochs=${trialParameters.numberEpochs}"
                  - "--log-path=/katib/mnist.json"
                  - "--lr=${trialParameters.learningRate}"
                  - "--logger=hypertune"
            restartPolicy: Never
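
The Experiment above enables `medianstop` with `min_trials_required: "1"` and `start_step: "2"`. The following Python sketch shows roughly how those two settings could gate a median stopping decision. It illustrates the general median stopping rule under simplifying assumptions and is not Katib's implementation; the function name `should_stop` and its arguments are hypothetical.

```python
from statistics import median


def should_stop(current_history, completed_histories, *,
                min_trials_required=1, start_step=2, maximize=True):
    """Decide whether a running trial should be stopped early.

    current_history: objective values reported so far by the running trial.
    completed_histories: per-step objective values of finished trials.
    """
    step = len(current_history)
    # Do not judge a trial before it has reported `start_step` values.
    if step < start_step:
        return False
    # Running average of each finished trial over the same number of steps.
    averages = [sum(h[:step]) / step
                for h in completed_histories if len(h) >= step]
    # Require enough finished trials for the median to be meaningful.
    if len(averages) < min_trials_required:
        return False
    threshold = median(averages)
    best = max(current_history) if maximize else min(current_history)
    # Stop when the trial's best value is strictly worse than the median.
    return best < threshold if maximize else best > threshold
```

With `type: maximize` and a deliberately wide learning-rate range, as in the example, many trials would report low accuracy after two steps and fall below the median, which is why the file's comment expects more early-stopped Trials.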