Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconcile Ray Workers when VolumeMounts change #945

Closed
5 changes: 5 additions & 0 deletions ray-operator/controllers/ray/utils/util.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ import (
"time"
"unicode"

"k8s.io/apimachinery/pkg/api/equality"
"k8s.io/apimachinery/pkg/util/json"

"k8s.io/apimachinery/pkg/util/rand"
Expand Down Expand Up @@ -346,6 +347,10 @@ func PodNotMatchingTemplate(pod corev1.Pod, template corev1.PodTemplateSpec) boo
// resource entries do not match
return true
}
if !equality.Semantic.DeepDerivative(container1.VolumeMounts, container2.VolumeMounts) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use DeepDerivative rather than DeepEqual?

Copy link
Author

@peterghaddad peterghaddad Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeepEqual causes issues when default volumeMounts are mounted via the Kubernetes cluster. When using DeepEqual, I noticed pods would continuously reconcile.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"I also have the problem using reflect.DeepEqual or equality.Semantic.DeepEqual because some fields are set with default non-zero values by some controller (like ImagePullPolicy, RestartPolicy, and so on) if these fields are not set by the operator" - kubernetes-sigs/kubebuilder#592

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubernetes-sigs/kubebuilder#592 (comment)
kubernetes-sigs/kubebuilder#592 (comment)

DeepDerivative seems to be risky. Would you mind explaining the behavior of this function and why does it work in our case?

Copy link
Author

@peterghaddad peterghaddad Mar 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @kevin85421, I think those comments are actually what we want. Kubernetes generates empty or nil values, and using DeepEqual will cause problems since it will detect a drift when it shouldn't..

DeepDerivative is similar to DeepEqual except that unset fields in a1 are ignored (not compared). This allows us to focus on the fields that matter to the semantic comparison.

Hope this clears things up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeepDerivative is similar to DeepEqual except that unset fields in a1 are ignored (not compared). This allows us to focus on the fields that matter to the semantic comparison.

I knew this. My point is: are there any situations where the worker update could be triggered unintentionally? I have this concern because:

(1) DeepDeriative is much more complex than what we want to compare. (code)
(2) Which fields will be updated by Kubernetes APIServer? Will the changes trigger the worker update unintentionally?

Copy link
Author

@peterghaddad peterghaddad Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Kevin. I believe we have the same concern. Using DeepDeriative for comparing volumeMounts does not trigger unintentionally.

  1. Is is more complex, but it seems that the complexity comes for ensuring fields are properly compared based off data type.
  2. The common example is empty and kubeApiAccess volumeMounts are added via the APIServer causing workers to update unintentionally.

// volume mounts do not match
return true
}

resources1 := []corev1.ResourceList{
container1.Resources.Requests,
Expand Down
79 changes: 71 additions & 8 deletions ray-operator/controllers/ray/utils/util_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -211,14 +211,55 @@ func TestReconcile_CheckNeedRemoveOldPod(t *testing.T) {
Image: "rayproject/autoscaler",
Command: []string{"echo"},
Args: []string{"Hello Ray"},
Env: []corev1.EnvVar{
},
},
},
}

pod = corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "pod1",
Namespace: namespaceStr,
},
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "ray-worker",
Image: "rayproject/autoscaler",
Command: []string{"echo"},
Args: []string{"Hello Ray"},
},
},
},
Status: corev1.PodStatus{
Phase: corev1.PodRunning,
},
}

assert.Equal(t, PodNotMatchingTemplate(pod, workerTemplate), false, "expect template & pod matching")

workerTemplate = corev1.PodTemplateSpec{
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "ray-worker",
Image: "rayproject/autoscaler",
Command: []string{"echo"},
Args: []string{"Hello Ray"},
Resources: corev1.ResourceRequirements{
Limits: corev1.ResourceList{
corev1.ResourceCPU: resource.MustParse("500m"),
corev1.ResourceMemory: resource.MustParse("512Mi"),
},
Requests: corev1.ResourceList{
corev1.ResourceCPU: resource.MustParse("256m"),
corev1.ResourceMemory: resource.MustParse("256Mi"),
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "MY_POD_IP",
ValueFrom: &corev1.EnvVarSource{
FieldRef: &corev1.ObjectFieldSelector{
FieldPath: "status.podIP",
},
},
Name: "MY_VOLUME_MOUNT",
MountPath: "/test/",
},
},
},
Expand All @@ -238,6 +279,26 @@ func TestReconcile_CheckNeedRemoveOldPod(t *testing.T) {
Image: "rayproject/autoscaler",
Command: []string{"echo"},
Args: []string{"Hello Ray"},
Resources: corev1.ResourceRequirements{
Limits: corev1.ResourceList{
corev1.ResourceCPU: resource.MustParse("500m"),
corev1.ResourceMemory: resource.MustParse("512Mi"),
},
Requests: corev1.ResourceList{
corev1.ResourceCPU: resource.MustParse("256m"),
corev1.ResourceMemory: resource.MustParse("256Mi"),
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "MY_VOLUME_MOUNT",
MountPath: "/test/",
},
{
Name: "shared-mem",
MountPath: "/dev/shm",
},
},
},
},
},
Expand All @@ -246,7 +307,9 @@ func TestReconcile_CheckNeedRemoveOldPod(t *testing.T) {
},
}

assert.Equal(t, PodNotMatchingTemplate(pod, workerTemplate), false, "expect template & pod matching")
assert.Equal(t, PodNotMatchingTemplate(pod, workerTemplate), false, "expect template & pod matching volumeMounts")
pod.Spec.Containers[0].VolumeMounts[0].MountPath = "/test1/"
assert.Equal(t, PodNotMatchingTemplate(pod, workerTemplate), true, "expect template & pod not matching volumeMounts")

workerTemplate = corev1.PodTemplateSpec{
Spec: corev1.PodSpec{
Expand Down
4 changes: 2 additions & 2 deletions ray-operator/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,9 @@ require (
go.uber.org/zap v1.19.1
gopkg.in/natefinch/lumberjack.v2 v2.0.0
k8s.io/api v0.23.0
k8s.io/apiextensions-apiserver v0.23.0
k8s.io/apimachinery v0.23.0
k8s.io/apiserver v0.23.0
k8s.io/client-go v0.23.0
k8s.io/code-generator v0.23.0
k8s.io/utils v0.0.0-20210930125809-cb0fa318a74b
Expand Down Expand Up @@ -75,8 +77,6 @@ require (
gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/apiextensions-apiserver v0.23.0 // indirect
k8s.io/apiserver v0.23.0 // indirect
k8s.io/component-base v0.23.0 // indirect
k8s.io/gengo v0.0.0-20210813121822-485abfe95c7c // indirect
k8s.io/klog/v2 v2.30.0 // indirect
Expand Down
2 changes: 1 addition & 1 deletion ray-operator/go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ github.com/Azure/go-autorest/autorest/date v0.3.0/go.mod h1:BI0uouVdmngYNUzGWeSY
github.com/Azure/go-autorest/autorest/mocks v0.4.1/go.mod h1:LTp+uSrOhSkaKrUy935gNZuuIPPVsHlr9DSOxSayd+k=
github.com/Azure/go-autorest/logger v0.2.1/go.mod h1:T9E3cAhj2VqvPOtCYAvby9aBXkZmbF5NWuPV8+WeEW8=
github.com/Azure/go-autorest/tracing v0.6.0/go.mod h1:+vhtPC754Xsa23ID7GlGsrdKBpUA79WCAKPPZVC2DeU=
github.com/BurntSushi/toml v0.3.1 h1:WXkYYl6Yr3qBf1K79EBnL4mak0OimBfB0XUf9Vl28OQ=
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/BurntSushi/xgb v0.0.0-20160522181843-27f122750802/go.mod h1:IVnqGOEym/WlBOVXweHU+Q+/VP0lqqI8lqeDx9IjBqo=
github.com/NYTimes/gziphandler v0.0.0-20170623195520-56545f4a5d46/go.mod h1:3wb06e3pkSAbeQ52E9H9iFoQsEEwGN64994WTCIhntQ=
Expand Down Expand Up @@ -390,7 +391,6 @@ github.com/prometheus/client_golang v0.9.1/go.mod h1:7SWBe2y4D6OKWSNQJUaRYU/AaXP
github.com/prometheus/client_golang v0.9.3/go.mod h1:/TN21ttK/J9q6uSwhBd54HahCDft0ttaMvbicHlPoso=
github.com/prometheus/client_golang v1.0.0/go.mod h1:db9x61etRT2tGnBNRi70OPL5FsnadC4Ky3P0J6CfImo=
github.com/prometheus/client_golang v1.7.1/go.mod h1:PY5Wy2awLA44sXw4AOSfFBetzPP4j5+D6mVACh+pe2M=
github.com/prometheus/client_golang v1.11.0 h1:HNkLOAEQMIDv/K+04rukrLx6ch7msSRwf3/SASFAGtQ=
github.com/prometheus/client_golang v1.11.0/go.mod h1:Z6t4BnS23TR94PD6BsDNk8yVqroYurpAkEiz0P2BEV0=
github.com/prometheus/client_golang v1.11.1 h1:+4eQaD7vAZ6DsfsxB15hbE0odUjGI5ARs9yskGu1v4s=
github.com/prometheus/client_golang v1.11.1/go.mod h1:Z6t4BnS23TR94PD6BsDNk8yVqroYurpAkEiz0P2BEV0=
Expand Down