Use local copy of RunPolicy by MPI-operator #513
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -37,6 +37,34 @@ type MPIJobList struct { | |
Items []MPIJob `json:"items"` | ||
} | ||
|
||
// RunPolicy encapsulates various runtime policies of the distributed training | ||
// job, for example how to clean up resources and how long the job can stay | ||
// active. | ||
type RunPolicy struct { | ||
// CleanPodPolicy defines the policy to kill pods after the job completes. | ||
// Default to Running. | ||
CleanPodPolicy *common.CleanPodPolicy `json:"cleanPodPolicy,omitempty"` | ||
|
||
// TTLSecondsAfterFinished is the TTL to clean up jobs. | ||
// It may take extra ReconcilePeriod seconds for the cleanup, since | ||
// reconcile gets called periodically. | ||
// Default to infinite. | ||
TTLSecondsAfterFinished *int32 `json:"ttlSecondsAfterFinished,omitempty"` | ||
|
||
// Specifies the duration in seconds relative to the startTime that the job may be active | ||
// before the system tries to terminate it; value must be positive integer. | ||
// +optional | ||
ActiveDeadlineSeconds *int64 `json:"activeDeadlineSeconds,omitempty"` | ||
|
||
// Optional number of retries before marking this job failed. | ||
// +optional | ||
BackoffLimit *int32 `json:"backoffLimit,omitempty"` | ||
|
||
// SchedulingPolicy defines the policy related to scheduling, e.g. gang-scheduling | ||
// +optional | ||
SchedulingPolicy *common.SchedulingPolicy `json:"schedulingPolicy,omitempty"` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you copy the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As above, about to copy too. |
||
} | ||
|
||
type MPIJobSpec struct { | ||
|
||
// Specifies the number of slots per worker used in hostfile. | ||
|
@@ -46,7 +74,7 @@ type MPIJobSpec struct { | |
SlotsPerWorker *int32 `json:"slotsPerWorker,omitempty"` | ||
|
||
// RunPolicy encapsulates various runtime policies of the job. | ||
RunPolicy common.RunPolicy `json:"runPolicy,omitempty"` | ||
RunPolicy RunPolicy `json:"runPolicy,omitempty"` | ||
|
||
// MPIReplicaSpecs contains maps from `MPIReplicaType` to `ReplicaSpec` that | ||
// specify the MPI replicas to run. | ||
|
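To illustrate how the optional pointer fields in the `RunPolicy` struct above behave when serialized, here is a minimal sketch. It assumes `CleanPodPolicy` is a string-based type and leaves out `SchedulingPolicy` to stay self-contained; the `encode` helper is hypothetical and not part of this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CleanPodPolicy is a stand-in for the type referenced in the diff
// (assumption: a string-based enum).
type CleanPodPolicy string

// RunPolicy mirrors the fields shown in the diff. SchedulingPolicy is
// omitted here to keep the sketch self-contained.
type RunPolicy struct {
	CleanPodPolicy          *CleanPodPolicy `json:"cleanPodPolicy,omitempty"`
	TTLSecondsAfterFinished *int32          `json:"ttlSecondsAfterFinished,omitempty"`
	ActiveDeadlineSeconds   *int64          `json:"activeDeadlineSeconds,omitempty"`
	BackoffLimit            *int32          `json:"backoffLimit,omitempty"`
}

// encode is a hypothetical helper that marshals a RunPolicy to JSON.
func encode(p RunPolicy) string {
	b, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// No fields set: every pointer is nil, so omitempty drops them all.
	fmt.Println(encode(RunPolicy{})) // prints {}

	// An explicit zero is still emitted because the pointer is non-nil.
	zero := int32(0)
	fmt.Println(encode(RunPolicy{BackoffLimit: &zero})) // prints {"backoffLimit":0}
}
```

Pointer fields let a consumer tell "unset" apart from an explicit zero, which matters for fields such as `backoffLimit` where zero retries is a meaningful value.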
Can you copy the CleanPodPolicy from kubeflow/common to this repo?
Sure. It wasn't clear to me from the discussion so far, so I made only the minimal changes required to add the suspend field, but let me update.
It might be better to copy all of the types and their members to this repo, since moving only some of them could mean maintaining them in two places.
cc: @alculquicondor
Copied, PTAL at the last commit.
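For readers following the thread, a rough sketch of what fully local copies of the types might look like is shown below. The constant values and the `SchedulingPolicy` fields are assumptions modeled loosely on kubeflow/common, not taken from the actual commit, and `effectiveCleanPodPolicy` is a hypothetical helper that applies the "Default to Running" behavior described in the struct comment.

```go
package main

import "fmt"

// CleanPodPolicy is a local copy of the enum (values are assumed).
type CleanPodPolicy string

const (
	CleanPodPolicyAll     CleanPodPolicy = "All"
	CleanPodPolicyRunning CleanPodPolicy = "Running"
	CleanPodPolicyNone    CleanPodPolicy = "None"
)

// SchedulingPolicy is a reduced local copy; the fields here are
// illustrative assumptions, not the full upstream definition.
type SchedulingPolicy struct {
	MinAvailable *int32 `json:"minAvailable,omitempty"`
	Queue        string `json:"queue,omitempty"`
}

// RunPolicy now references only types defined in this package.
type RunPolicy struct {
	CleanPodPolicy   *CleanPodPolicy   `json:"cleanPodPolicy,omitempty"`
	SchedulingPolicy *SchedulingPolicy `json:"schedulingPolicy,omitempty"`
}

// effectiveCleanPodPolicy (hypothetical) returns the configured policy,
// falling back to Running when the field is unset.
func effectiveCleanPodPolicy(p RunPolicy) CleanPodPolicy {
	if p.CleanPodPolicy == nil {
		return CleanPodPolicyRunning
	}
	return *p.CleanPodPolicy
}

func main() {
	fmt.Println(effectiveCleanPodPolicy(RunPolicy{})) // prints Running

	none := CleanPodPolicyNone
	fmt.Println(effectiveCleanPodPolicy(RunPolicy{CleanPodPolicy: &none})) // prints None
}
```

Keeping every referenced type in one package means `RunPolicy` no longer has fields split between this repo and kubeflow/common, which is the double-management concern raised above.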