KEP-2170: Deploy JobSet in kubeflow-system namespace #2388
Conversation
@andreyvelich: GitHub didn't allow me to request PR reviews from the following users: seanlaii, saileshd1402, astefanutti, kubeflow/release-team, doris-xm, vsoch, kannon92, kubeflow/wg-manifests-leads. Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/hold for kubernetes-sigs/jobset#751
- ../../base/manager
- ../../base/rbac
- ../../base/webhook
- https://github.com/kubernetes-sigs/jobset/releases/download/v0.7.2/manifests.yaml
Would it make sense to group the JobSet-related resources into a dedicated directory / overlay?
It could make things more modular and accommodate environments where the JobSet API / controller is already installed.
Hmm, we can make it similar to KFP and move it into third-party/jobset/kustomization.yaml.
So the manifests will look like this:
/v2/base/...
/v2/overlays/...
/v2/third-party/...
@kubeflow/wg-training-leads WDYT about it?
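A minimal sketch of what that third-party/jobset/kustomization.yaml could contain, assuming we keep pulling the upstream release manifests (the pinned version is just the one currently in this PR):

# third-party/jobset/kustomization.yaml (hypothetical sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
# Upstream JobSet release manifests, fetched remotely by kustomize.
- https://github.com/kubernetes-sigs/jobset/releases/download/v0.7.2/manifests.yaml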
@andreyvelich SGTM :)
Force-pushed from 1dd0225 to 9fe7112.
Force-pushed from 9fe7112 to 6021349.
Pull Request Test Coverage Report for Build 12977326722. 💛 Coveralls
I updated JobSet to v0.7.3.
@andreyvelich: GitHub didn't allow me to request PR reviews from the following users: kannon92, seanlaii. Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@@ -0,0 +1,3 @@
apiVersion: config.jobset.x-k8s.io/v1alpha1
kind: Configuration
namespace: kubeflow-system
You don't need this.
https://github.com/kubernetes-sigs/jobset/blob/release-7.1/api/config/v1alpha1/defaults.go#L44
If JobSet is not configured with a namespace option, it will take the namespace from the service account.
Oh, good point.
Even without my PR, it will take the namespace from the SA.
Force-pushed from f81e911 to ce42837.
@@ -1,3 +1,2 @@
apiVersion: config.jobset.x-k8s.io/v1alpha1
kind: Configuration
I think just using the default configuration would be sufficient.
I guess a nil configuration should be fine but it looks a bit weird.
Either way, this is just a nit.
Everything LGTM now.
Yeah, I'll just keep this empty config so we can configure it in the future.
For example, we could enable leader election by default later, as in the sketch below.
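To illustrate, a hedged sketch of how the currently empty Configuration could later be extended; the leaderElection field names follow controller-runtime ComponentConfig conventions, and the lock name here is purely hypothetical:

# Hypothetical future JobSet configuration (not part of this PR)
apiVersion: config.jobset.x-k8s.io/v1alpha1
kind: Configuration
leaderElection:
  leaderElect: true                      # assumed field, per ComponentConfig conventions
  resourceName: jobset-controller-lock   # hypothetical lease/lock name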
/hold cancel
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: andreyvelich. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/assign @tenzen-y @Electronic-Waste
Basically LGTM. Thanks for this @andreyvelich! Just a question for you.
Also cc 👀 @Doris-xm. The JobSet team has cherry-picked kubernetes-sigs/jobset#719 into their newest release branch, unblocking your work in #2382.
patchesStrategicMerge:
- patches/jobset_remove_namespace.yaml # Remove namespace from the JobSet release manifests.
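For context, the patch file's contents are not shown in this diff; a strategic-merge patch that drops the upstream Namespace object might look like the following sketch (the jobset-system name comes from the JobSet release manifests, everything else is assumed):

# patches/jobset_remove_namespace.yaml (hypothetical contents)
# $patch: delete removes the whole matched object from the rendered output,
# so the upstream jobset-system Namespace is not created at all.
apiVersion: v1
kind: Namespace
metadata:
  name: jobset-system
$patch: delete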
Shall we also patch the namespace-related configurations outside of the .metadata field? Like: https://github.com/kubeflow/training-operator/pull/2382/files#r1914094979
It seems that we also need to patch the webhook configurations to ensure JobSet works correctly :)
I noticed that even without this config, kustomize correctly patches the JobSet manifests with:

clientConfig:
  service:
    name: jobset-webhook-service
    namespace: kubeflow-system
    path: /validate--v1-pod

I think it works without this configuration because the JobSet manifests include this Service, which tells kustomize that the namespace in clientConfig needs to be patched:
apiVersion: v1
kind: Service
metadata:
labels:
app.kubernetes.io/component: webhook
app.kubernetes.io/created-by: jobset
app.kubernetes.io/instance: webhook-service
app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: service
app.kubernetes.io/part-of: jobset
name: jobset-webhook-service
namespace: jobset-system
spec:
ports:
- port: 443
protocol: TCP
targetPort: 9443
selector:
control-plane: controller-manager
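A hedged sketch of the overlay side that triggers this behavior: setting kustomize's namespace field rewrites .metadata.namespace on every resource in the build, and the built-in name-reference transformer also updates webhook clientConfig.service.namespace entries that point at a Service included in the same build. The paths below are assumptions:

# overlay kustomization.yaml (hypothetical sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow-system    # re-namespaces all resources, including the webhook Service
resources:
- ../../third-party/jobset    # assumed path to the JobSet manifests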
SGTM!
/hold for the question
/lgtm
Part of: #2170
Depends on: kubernetes-sigs/jobset#751; we should update the JobSet version once this fix is released.
I updated the Kubeflow Training V2 manifests to deploy JobSet in the kubeflow-system namespace. Also, given that the Training Runtimes need to be installed after the manager is deployed and running, I think we should split the overlays between these two (see the sketch after this list):
- manager
- runtimes
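A hypothetical sketch of the second overlay, applied as a separate kustomize build only once the manager and its webhook are up; all paths here are assumptions:

# overlays/runtimes/kustomization.yaml (hypothetical sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow-system
resources:
- ../../base/runtimes    # assumed location of the Training Runtime manifests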
/cc @kannon92 @kubeflow/wg-training-leads @kubeflow/wg-manifests-leads @kubeflow/release-team @Electronic-Waste @Doris-xm @astefanutti @seanlaii @deepanker13 @saileshd1402 @vsoch