
[Feature] Add service account section in helm chart #969

Merged
kevin85421 merged 2 commits into ray-project:master on Mar 21, 2023

Conversation

ducviet00
Contributor

Why are these changes needed?

For computing in the cloud, a service account is important. For example, we need to query data from a private database, but the workers don't have permission because the service account is missing.

Related issue number

Open #967

Checks

  • [x] I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kevin85421 kevin85421 (Member) left a comment


  1. Use serviceAccountName instead to keep consistency with the Pod spec (Link); serviceAccount is deprecated.

  2. The head and workers may need different service accounts. For example, you may allow only the head Pod to access your private S3 buckets, and not allow this for the workers.

  3. Do not set a default value for serviceAccount.

  4. Consider using the following syntax.

  {{- if .Values.head.autoscalerOptions }}
  autoscalerOptions: {{- toYaml .Values.head.autoscalerOptions | nindent 4 }}
  {{- end }}
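
Applied to the new field, that pattern could look roughly like the sketch below. This is only an illustration of the suggestion, not the exact diff in this PR, and the placement inside the head group's Pod spec template in raycluster-cluster.yaml is an assumption:

  # Hypothetical snippet inside the head Pod spec template (illustration only)
  {{- if .Values.head.serviceAccountName }}
  serviceAccountName: {{ .Values.head.serviceAccountName }}
  {{- end }}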

@ducviet00
Contributor Author

Thanks for your comments. I updated the PR following your suggestions!

@kevin85421 kevin85421 (Member) left a comment


This looks good to me. However, we do not have related tests other than the Helm linter, so I need to test this PR manually before I approve it. Would you mind adding details and screenshots about the tests you ran?

@ducviet00
Contributor Author

ducviet00 commented Mar 17, 2023

This looks good to me. However, we do not have related tests other than the Helm linter, so I need to test this PR manually before I approve it. Would you mind adding details and screenshots about the tests you ran?

Sure, I also did a manual test. This is the values.yaml, with the service account disabled for the head group and enabled for the worker group and the additional worker group.

# Default values for ray-cluster.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# The KubeRay community welcomes PRs to expose additional configuration
# in this Helm chart.

image:
  repository: rayproject/ray
  tag: 2.0.0
  pullPolicy: IfNotPresent

nameOverride: "kuberay"
fullnameOverride: ""

imagePullSecrets:
  []
  # - name: an-existing-secret

head:
  # If enableInTreeAutoscaling is true, the autoscaler sidecar will be added to the Ray head pod.
  # Ray autoscaler integration is supported only for Ray versions >= 1.11.0
  # Ray autoscaler integration is Beta with KubeRay >= 0.3.0 and Ray >= 2.0.0.
  # enableInTreeAutoscaling: true
  # autoscalerOptions is an OPTIONAL field specifying configuration overrides for the Ray autoscaler.
  # The example configuration shown below represents the DEFAULT values.
  # autoscalerOptions:
  # upscalingMode: Default
  # idleTimeoutSeconds: 60
  # securityContext: {}
  # env: []
  # envFrom: []
  # resources specifies optional resource request and limit overrides for the autoscaler container.
  # For large Ray clusters, we recommend monitoring container resource usage to determine if overriding the defaults is required.
  # resources:
  #   limits:
  #     cpu: "500m"
  #     memory: "512Mi"
  #   requests:
  #     cpu: "500m"
  #     memory: "512Mi"
  labels: {}
  serviceAccountName: ""
  rayStartParams:
    dashboard-host: "0.0.0.0"
    block: "true"
  # containerEnv specifies environment variables for the Ray container,
  # Follows standard K8s container env schema.
  containerEnv: []
  # - name: EXAMPLE_ENV
  #   value: "1"
  envFrom:
    []
    # - secretRef:
    #     name: my-env-secret
  # ports optionally allows specifying ports for the Ray container.
  # ports: []
  # resource requests and limits for the Ray head container.
  # Modify as needed for your application.
  # Note that the resources in this example are much too small for production;
  # we don't recommend allocating less than 8G memory for a Ray pod in production.
  # Ray pods should be sized to take up entire K8s nodes when possible.
  # Always set CPU and memory limits for Ray pods.
  # It is usually best to set requests equal to limits.
  # See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources
  # for further guidance.
  resources:
    limits:
      cpu: "1"
      # To avoid out-of-memory issues, never allocate less than 2G memory for the Ray head.
      memory: "2G"
    requests:
      cpu: "1"
      memory: "2G"
  annotations: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
  # Ray container security context.
  securityContext: {}
  volumes:
    - name: log-volume
      emptyDir: {}
  # Ray writes logs to /tmp/ray/session_latest/logs
  volumeMounts:
    - mountPath: /tmp/ray
      name: log-volume
  # sidecarContainers specifies additional containers to attach to the Ray pod.
  # Follows standard K8s container spec.
  sidecarContainers: []
  # See docs/guidance/pod-command.md for more details about how to specify
  # container command for head Pod.
  command: []
  args: []

worker:
  # If you want to disable the default workergroup
  # uncomment the line below
  # disabled: true
  groupName: workergroup
  replicas: 1
  labels: {}
  serviceAccountName: "workergroup-service-account"
  rayStartParams:
    block: "true"
  initContainerImage: "busybox:1.28" # Enable users to specify the image for init container. Users can pull the busybox image from their private repositories.
  # Security context for the init container.
  initContainerSecurityContext: {}
  # containerEnv specifies environment variables for the Ray container,
  # Follows standard K8s container env schema.
  containerEnv: []
  # - name: EXAMPLE_ENV
  #   value: "1"
  envFrom:
    []
    # - secretRef:
    #     name: my-env-secret
  # ports optionally allows specifying ports for the Ray container.
  # ports: []
  # resource requests and limits for the Ray worker container.
  # Modify as needed for your application.
  # Note that the resources in this example are much too small for production;
  # we don't recommend allocating less than 8G memory for a Ray pod in production.
  # Ray pods should be sized to take up entire K8s nodes when possible.
  # Always set CPU and memory limits for Ray pods.
  # It is usually best to set requests equal to limits.
  # See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources
  # for further guidance.
  resources:
    limits:
      cpu: "1"
      memory: "1G"
    requests:
      cpu: "1"
      memory: "1G"
  annotations: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
  # Ray container security context.
  securityContext: {}
  volumes:
    - name: log-volume
      emptyDir: {}
  # Ray writes logs to /tmp/ray/session_latest/logs
  volumeMounts:
    - mountPath: /tmp/ray
      name: log-volume
  # sidecarContainers specifies additional containers to attach to the Ray pod.
  # Follows standard K8s container spec.
  sidecarContainers: []
  # See docs/guidance/pod-command.md for more details about how to specify
  # container command for worker Pod.
  command: []
  args: []

# The map's key is used as the groupName.
# For example, the key smallGroup in the map below
# will be used as the groupName.
additionalWorkerGroups:
  smallGroup:
    # Disabled by default
    disabled: false
    replicas: 1
    minReplicas: 1
    maxReplicas: 3
    labels: {}
    serviceAccountName: "additional-workergroup-service-account"
    rayStartParams:
      block: "true"
    initContainerImage: "busybox:1.28" # Enable users to specify the image for init container. Users can pull the busybox image from their private repositories.
    # Security context for the init container.
    initContainerSecurityContext: {}
    # containerEnv specifies environment variables for the Ray container,
    # Follows standard K8s container env schema.
    containerEnv:
      []
      # - name: EXAMPLE_ENV
      #   value: "1"
    envFrom:
      []
      # - secretRef:
      #     name: my-env-secret
    # ports optionally allows specifying ports for the Ray container.
    # ports: []
    # resource requests and limits for the Ray worker container.
    # Modify as needed for your application.
    # Note that the resources in this example are much too small for production;
    # we don't recommend allocating less than 8G memory for a Ray pod in production.
    # Ray pods should be sized to take up entire K8s nodes when possible.
    # Always set CPU and memory limits for Ray pods.
    # It is usually best to set requests equal to limits.
    # See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources
    # for further guidance.
    resources:
      limits:
        cpu: 1
        memory: "1G"
      requests:
        cpu: 1
        memory: "1G"
    annotations: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}
    # Ray container security context.
    securityContext: {}
    volumes:
      - name: log-volume
        emptyDir: {}
    # Ray writes logs to /tmp/ray/session_latest/logs
    volumeMounts:
      - mountPath: /tmp/ray
        name: log-volume
    sidecarContainers: []
    # See docs/guidance/pod-command.md for more details about how to specify
    # container command for worker Pod.
    command: []
    args: []

# Configuration for Head's Kubernetes Service
service:
  # This is optional, and the default is ClusterIP.
  type: ClusterIP

helm template --debug test . output:

---
# Source: ray-cluster/templates/raycluster-cluster.yaml
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  labels:
    app.kubernetes.io/name: kuberay
    helm.sh/chart: ray-cluster-0.4.0
    app.kubernetes.io/instance: test
    app.kubernetes.io/managed-by: Helm
  name: test-kuberay
  
spec:
  headGroupSpec:
    serviceType: ClusterIP
    rayStartParams:
      block: "true"
      dashboard-host: "0.0.0.0"
    template:
      spec:
        imagePullSecrets:
          []
        containers:
          - volumeMounts:
            - mountPath: /tmp/ray
              name: log-volume
            name: ray-head
            image: rayproject/ray:2.0.0
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: "1"
                memory: 2G
              requests:
                cpu: "1"
                memory: 2G
            securityContext:
              {}
            env:
              []
        volumes:
          - emptyDir: {}
            name: log-volume
        affinity:
          {}
        tolerations:
          []
        nodeSelector:
          {}
      metadata:
        annotations:
          {}
        labels: 
          app.kubernetes.io/name: kuberay
          helm.sh/chart: ray-cluster-0.4.0
          app.kubernetes.io/instance: test
          app.kubernetes.io/managed-by: Helm 

  workerGroupSpecs:
  - rayStartParams:
      block: "true"
    replicas: 1
    minReplicas: 1
    maxReplicas: 3
    groupName: smallGroup
    template:
      spec:
        imagePullSecrets:
          []
        initContainers:
          - name: init
            image: busybox:1.28
            command: ['sh', '-c', "until nslookup $FQ_RAY_IP; do echo waiting for K8s Service $FQ_RAY_IP; sleep 2; done"]
            securityContext:
              {}
        serviceAccountName: additional-workergroup-service-account
        containers:
          - volumeMounts:
            - mountPath: /tmp/ray
              name: log-volume
            name: ray-worker
            image: rayproject/ray:2.0.0
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: 1
                memory: 1G
              requests:
                cpu: 1
                memory: 1G
            securityContext:
              {}
            env:
              []
            ports:
              null
        volumes:
          - emptyDir: {}
            name: log-volume
        affinity:
          {}
        tolerations:
          []
        nodeSelector:
          {}
      metadata:
        annotations:
          {}
        labels: 
          app.kubernetes.io/name: kuberay
          helm.sh/chart: ray-cluster-0.4.0
          app.kubernetes.io/instance: test
          app.kubernetes.io/managed-by: Helm
  - rayStartParams:
      block: "true"
    replicas: 1
    minReplicas: 1
    maxReplicas: 2147483647
    groupName: workergroup
    template:
      spec:
        imagePullSecrets:
          []
        initContainers:
          - name: init
            image: busybox:1.28
            command: ['sh', '-c', "until nslookup $FQ_RAY_IP; do echo waiting for K8s Service $FQ_RAY_IP; sleep 2; done"]
            securityContext:
              {}
        serviceAccountName: workergroup-service-account
        containers:
          - volumeMounts:
            - mountPath: /tmp/ray
              name: log-volume
            name: ray-worker
            image: rayproject/ray:2.0.0
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: "1"
                memory: 1G
              requests:
                cpu: "1"
                memory: 1G
            securityContext:
              {}
            env:
              []
            ports:
              null
        volumes:
          - emptyDir: {}
            name: log-volume
        affinity:
          {}
        tolerations:
          []
        nodeSelector:
          {}
      metadata:
        annotations:
          {}
        labels: 
          app.kubernetes.io/name: kuberay
          helm.sh/chart: ray-cluster-0.4.0
          app.kubernetes.io/instance: test
          app.kubernetes.io/managed-by: Helm

I also tested and deployed this on our staging environment:

[screenshots of the staging deployment]
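
As an extra sanity check (not part of the original test, just a standard kubectl query), the service account attached to each Pod can be listed with:

kubectl get pods -o custom-columns=NAME:.metadata.name,SERVICE_ACCOUNT:.spec.serviceAccountName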

@Yicheng-Lu-llll
Contributor

LGTM!

I also tested it:

# create service account
kubectl create serviceaccount test

# create token
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  annotations:
    kubernetes.io/service-account.name: test
type: kubernetes.io/service-account-token
EOF

# create role
kubectl apply -f - <<EOF
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default                      
  name: role-test
rules:
- apiGroups: [""]
  resources: ["pods"]                        
  verbs: ["get", "list"]            
EOF

# create RoleBinding
kubectl apply -f - <<EOF
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rolebinding-test
  namespace: default
subjects:                                 
- kind: ServiceAccount                  
  name: test
  namespace: default
roleRef:                                  
  kind: Role
  name: role-test
  apiGroup: rbac.authorization.k8s.io      
EOF

# in {kuberay repo}/helm-chart/ray-cluster/values.yaml, set serviceAccountName: "test" only for head pod
# install ray cluster
helm install raycluster .

# in head pod, use the TOKEN to access the API Server and list all the pods
kubectl exec -it $(kubectl get pods -o=name | grep head) -- bash
export CURL_CA_BUNDLE=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
sudo apt update 
sudo apt -y install curl
# You will see all the pod info. From a worker pod, the same request to list all pods will be forbidden (see the sketch after this block).
curl -H "Authorization: Bearer $TOKEN" https://kubernetes/api/v1/namespaces/default/pods
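
For completeness, here is a hedged sketch of the corresponding negative check from a worker Pod (the pod name pattern is an assumption; the workers in this test keep the default service account, so the API server should answer 403 Forbidden):

# repeat the same request from a worker pod; expect 403 Forbidden
kubectl exec -it $(kubectl get pods -o=name | grep worker | head -n 1) -- bash
export CURL_CA_BUNDLE=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
sudo apt update && sudo apt -y install curl
curl -H "Authorization: Bearer $TOKEN" https://kubernetes/api/v1/namespaces/default/pods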

@kevin85421 kevin85421 (Member) left a comment


Thanks to @ducviet00 for this contribution and @Yicheng-Lu-llll for the review!

@kevin85421
Member

This PR only updates the Helm chart, so it is unrelated to the failed tests. Merged.

@kevin85421 kevin85421 merged commit a0ee1c8 into ray-project:master Mar 21, 2023
@yc2984

yc2984 commented Apr 5, 2023

Hi, is it possible to provide an end-to-end example, including all required configs? For example, do you need to create those Kubernetes service accounts separately, and how do you mount the secret into the Ray node?
