[Feature] Add service account section in helm chart #969
Conversation
- Use serviceAccountName instead to keep consistency with Pod (Link); serviceAccount is deprecated.
- Head and worker may need different service accounts. For example, you may allow only the head Pod to access your private S3 buckets, and not allow this for workers.
- Do not set a default value for serviceAccountName.
- Consider using the following syntax:
{{- if .Values.head.autoscalerOptions }}
autoscalerOptions: {{- toYaml .Values.head.autoscalerOptions | nindent 4 }}
{{- end }}
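A similar conditional block could be used to render serviceAccountName in the head and worker pod templates, so that nothing is emitted when the value is left empty. This is only a sketch of the pattern, not necessarily the exact template used in this PR:
{{- if .Values.head.serviceAccountName }}
serviceAccountName: {{ .Values.head.serviceAccountName }}
{{- end }}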
Thanks for your comments. I updated the PR per your suggestions!
This looks good to me. However, we do not have related tests except the Helm linter, so I need to manually test this PR before I approve it. Would you mind adding details and screenshots about which tests you have run?
Sure, I also did a manual test. Here is the values.yaml I used and the rendered manifest:
# Default values for ray-cluster.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
# The KubeRay community welcomes PRs to expose additional configuration
# in this Helm chart.
image:
repository: rayproject/ray
tag: 2.0.0
pullPolicy: IfNotPresent
nameOverride: "kuberay"
fullnameOverride: ""
imagePullSecrets:
[]
# - name: an-existing-secret
head:
# If enableInTreeAutoscaling is true, the autoscaler sidecar will be added to the Ray head pod.
# Ray autoscaler integration is supported only for Ray versions >= 1.11.0
# Ray autoscaler integration is Beta with KubeRay >= 0.3.0 and Ray >= 2.0.0.
# enableInTreeAutoscaling: true
# autoscalerOptions is an OPTIONAL field specifying configuration overrides for the Ray autoscaler.
# The example configuration shown below represents the DEFAULT values.
# autoscalerOptions:
# upscalingMode: Default
# idleTimeoutSeconds: 60
# securityContext: {}
# env: []
# envFrom: []
# resources specifies optional resource request and limit overrides for the autoscaler container.
# For large Ray clusters, we recommend monitoring container resource usage to determine if overriding the defaults is required.
# resources:
# limits:
# cpu: "500m"
# memory: "512Mi"
# requests:
# cpu: "500m"
# memory: "512Mi"
labels: {}
serviceAccountName: ""
rayStartParams:
dashboard-host: "0.0.0.0"
block: "true"
# containerEnv specifies environment variables for the Ray container,
# Follows standard K8s container env schema.
containerEnv: []
# - name: EXAMPLE_ENV
# value: "1"
envFrom:
[]
# - secretRef:
# name: my-env-secret
# ports optionally allows specifying ports for the Ray container.
# ports: []
# resource requests and limits for the Ray head container.
# Modify as needed for your application.
# Note that the resources in this example are much too small for production;
# we don't recommend allocating less than 8G memory for a Ray pod in production.
# Ray pods should be sized to take up entire K8s nodes when possible.
# Always set CPU and memory limits for Ray pods.
# It is usually best to set requests equal to limits.
# See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources
# for further guidance.
resources:
limits:
cpu: "1"
# To avoid out-of-memory issues, never allocate less than 2G memory for the Ray head.
memory: "2G"
requests:
cpu: "1"
memory: "2G"
annotations: {}
nodeSelector: {}
tolerations: []
affinity: {}
# Ray container security context.
securityContext: {}
volumes:
- name: log-volume
emptyDir: {}
# Ray writes logs to /tmp/ray/session_latest/logs
volumeMounts:
- mountPath: /tmp/ray
name: log-volume
# sidecarContainers specifies additional containers to attach to the Ray pod.
# Follows standard K8s container spec.
sidecarContainers: []
# See docs/guidance/pod-command.md for more details about how to specify
# container command for head Pod.
command: []
args: []
worker:
# If you want to disable the default workergroup
# uncomment the line below
# disabled: true
groupName: workergroup
replicas: 1
labels: {}
serviceAccountName: "workergroup-service-account"
rayStartParams:
block: "true"
initContainerImage: "busybox:1.28" # Enable users to specify the image for init container. Users can pull the busybox image from their private repositories.
# Security context for the init container.
initContainerSecurityContext: {}
# containerEnv specifies environment variables for the Ray container,
# Follows standard K8s container env schema.
containerEnv: []
# - name: EXAMPLE_ENV
# value: "1"
envFrom:
[]
# - secretRef:
# name: my-env-secret
# ports optionally allows specifying ports for the Ray container.
# ports: []
# resource requests and limits for the Ray worker container.
# Modify as needed for your application.
# Note that the resources in this example are much too small for production;
# we don't recommend allocating less than 8G memory for a Ray pod in production.
# Ray pods should be sized to take up entire K8s nodes when possible.
# Always set CPU and memory limits for Ray pods.
# It is usually best to set requests equal to limits.
# See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources
# for further guidance.
resources:
limits:
cpu: "1"
memory: "1G"
requests:
cpu: "1"
memory: "1G"
annotations: {}
nodeSelector: {}
tolerations: []
affinity: {}
# Ray container security context.
securityContext: {}
volumes:
- name: log-volume
emptyDir: {}
# Ray writes logs to /tmp/ray/session_latest/logs
volumeMounts:
- mountPath: /tmp/ray
name: log-volume
# sidecarContainers specifies additional containers to attach to the Ray pod.
# Follows standard K8s container spec.
sidecarContainers: []
# See docs/guidance/pod-command.md for more details about how to specify
# container command for worker Pod.
command: []
args: []
# The map's key is used as the groupName.
# For example, key:small-group in the map below
# will be used as the groupName
additionalWorkerGroups:
smallGroup:
# Disabled by default
disabled: false
replicas: 1
minReplicas: 1
maxReplicas: 3
labels: {}
serviceAccountName: "additional-workergroup-service-account"
rayStartParams:
block: "true"
initContainerImage: "busybox:1.28" # Enable users to specify the image for init container. Users can pull the busybox image from their private repositories.
# Security context for the init container.
initContainerSecurityContext: {}
# containerEnv specifies environment variables for the Ray container,
# Follows standard K8s container env schema.
containerEnv:
[]
# - name: EXAMPLE_ENV
# value: "1"
envFrom:
[]
# - secretRef:
# name: my-env-secret
# ports optionally allows specifying ports for the Ray container.
# ports: []
# resource requests and limits for the Ray worker container.
# Modify as needed for your application.
# Note that the resources in this example are much too small for production;
# we don't recommend allocating less than 8G memory for a Ray pod in production.
# Ray pods should be sized to take up entire K8s nodes when possible.
# Always set CPU and memory limits for Ray pods.
# It is usually best to set requests equal to limits.
# See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources
# for further guidance.
resources:
limits:
cpu: 1
memory: "1G"
requests:
cpu: 1
memory: "1G"
annotations: {}
nodeSelector: {}
tolerations: []
affinity: {}
# Ray container security context.
securityContext: {}
volumes:
- name: log-volume
emptyDir: {}
# Ray writes logs to /tmp/ray/session_latest/logs
volumeMounts:
- mountPath: /tmp/ray
name: log-volume
sidecarContainers: []
# See docs/guidance/pod-command.md for more details about how to specify
# container command for worker Pod.
command: []
args: []
# Configuration for Head's Kubernetes Service
service:
# This is optional, and the default is ClusterIP.
type: ClusterIP
---
# Source: ray-cluster/templates/raycluster-cluster.yaml
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
labels:
app.kubernetes.io/name: kuberay
helm.sh/chart: ray-cluster-0.4.0
app.kubernetes.io/instance: test
app.kubernetes.io/managed-by: Helm
name: test-kuberay
spec:
headGroupSpec:
serviceType: ClusterIP
rayStartParams:
block: "true"
dashboard-host: "0.0.0.0"
template:
spec:
imagePullSecrets:
[]
containers:
- volumeMounts:
- mountPath: /tmp/ray
name: log-volume
name: ray-head
image: rayproject/ray:2.0.0
imagePullPolicy: IfNotPresent
resources:
limits:
cpu: "1"
memory: 2G
requests:
cpu: "1"
memory: 2G
securityContext:
{}
env:
[]
volumes:
- emptyDir: {}
name: log-volume
affinity:
{}
tolerations:
[]
nodeSelector:
{}
metadata:
annotations:
{}
labels:
app.kubernetes.io/name: kuberay
helm.sh/chart: ray-cluster-0.4.0
app.kubernetes.io/instance: test
app.kubernetes.io/managed-by: Helm
workerGroupSpecs:
- rayStartParams:
block: "true"
replicas: 1
minReplicas: 1
maxReplicas: 3
groupName: smallGroup
template:
spec:
imagePullSecrets:
[]
initContainers:
- name: init
image: busybox:1.28
command: ['sh', '-c', "until nslookup $FQ_RAY_IP; do echo waiting for K8s Service $FQ_RAY_IP; sleep 2; done"]
securityContext:
{}
serviceAccountName: additional-workergroup-service-account
containers:
- volumeMounts:
- mountPath: /tmp/ray
name: log-volume
name: ray-worker
image: rayproject/ray:2.0.0
imagePullPolicy: IfNotPresent
resources:
limits:
cpu: 1
memory: 1G
requests:
cpu: 1
memory: 1G
securityContext:
{}
env:
[]
ports:
null
volumes:
- emptyDir: {}
name: log-volume
affinity:
{}
tolerations:
[]
nodeSelector:
{}
metadata:
annotations:
{}
labels:
app.kubernetes.io/name: kuberay
helm.sh/chart: ray-cluster-0.4.0
app.kubernetes.io/instance: test
app.kubernetes.io/managed-by: Helm
- rayStartParams:
block: "true"
replicas: 1
minReplicas: 1
maxReplicas: 2147483647
groupName: workergroup
template:
spec:
imagePullSecrets:
[]
initContainers:
- name: init
image: busybox:1.28
command: ['sh', '-c', "until nslookup $FQ_RAY_IP; do echo waiting for K8s Service $FQ_RAY_IP; sleep 2; done"]
securityContext:
{}
serviceAccountName: workergroup-service-account
containers:
- volumeMounts:
- mountPath: /tmp/ray
name: log-volume
name: ray-worker
image: rayproject/ray:2.0.0
imagePullPolicy: IfNotPresent
resources:
limits:
cpu: "1"
memory: 1G
requests:
cpu: "1"
memory: 1G
securityContext:
{}
env:
[]
ports:
null
volumes:
- emptyDir: {}
name: log-volume
affinity:
{}
tolerations:
[]
nodeSelector:
{}
metadata:
annotations:
{}
labels:
app.kubernetes.io/name: kuberay
helm.sh/chart: ray-cluster-0.4.0
app.kubernetes.io/instance: test
app.kubernetes.io/managed-by: Helm
LGTM! I also tested it:
# create service account
kubectl create serviceaccount test
# create token
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
name: test-secret
annotations:
kubernetes.io/service-account.name: test
type: kubernetes.io/service-account-token
EOF
# create role
kubectl apply -f - <<EOF
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
namespace: default
name: role-test
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list"]
EOF
# create RoleBinding
kubectl apply -f - <<EOF
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: rolebinding-test
namespace: default
subjects:
- kind: ServiceAccount
name: test
namespace: default
roleRef:
kind: Role
name: role-test
apiGroup: rbac.authorization.k8s.io
EOF
# in {kuberay repo}/helm-chart/ray-cluster/values.yaml, set serviceAccountName: "test" only for head pod
# install ray cluster
helm install raycluster .
# in head pod, use the TOKEN to access the API Server and list all the pods
kubectl exec -it $(kubectl get pods -o=name | grep head) -- bash
export CURL_CA_BUNDLE=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
sudo apt update
sudo apt -y install curl
# You will see all the pod info. In a worker pod, the same request to list pods will be forbidden.
curl -H "Authorization: Bearer $TOKEN" https://kubernetes/api/v1/namespaces/default/pods
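As an extra sanity check (not part of the original test), the RBAC setup above could also be verified without exec'ing into the pods by impersonating the service accounts with kubectl auth can-i; a sketch assuming the default namespace and the account names used above:
# Expected "yes": the "test" service account is bound to role-test, which allows listing pods
kubectl auth can-i list pods --as=system:serviceaccount:default:test
# Expected "no": the default service account used by the worker pods has no such binding
kubectl auth can-i list pods --as=system:serviceaccount:default:default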
Thanks @ducviet00 for this contribution and @Yicheng-Lu-llll for the review!
This PR only updates the Helm chart, so it is unrelated to the failing tests. Merged.
Hi, is it possible to provide an end-to-end example, including all required configs? For example, do you need to create those Kubernetes service accounts separately, and how do you mount the secret into the Ray node?
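Not an authoritative answer, but as a rough sketch based on the test above: the ServiceAccount does have to be created separately (the chart only references it by name), and once head.serviceAccountName is set, Kubernetes automounts that account's token at /var/run/secrets/kubernetes.io/serviceaccount inside the Ray containers, so no explicit secret mount is needed. The account name ray-head-sa below is a placeholder:
# Create the ServiceAccount (plus any Role/RoleBinding it needs), then reference it in the chart values
kubectl create serviceaccount ray-head-sa
helm install raycluster ./helm-chart/ray-cluster \
  --set head.serviceAccountName=ray-head-sa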
Add service account section in helm chart
Why are these changes needed?
For computing in the cloud, service accounts are important. For example, we may need to query data from a private database, but workers do not have permission because the service account is missing.
Related issue number
Open #967
Checks