
[Feature] Add service account section in helm chart #969

Merged
kevin85421 merged 2 commits into ray-project:master on Mar 21, 2023

Conversation

ducviet00
Contributor

Why are these changes needed?

For computing in the cloud, a service account is important. For example, we need to query data from a private database, but the workers don't have permission because the service account is missing.

Related issue number

Open #967

Checks

  • [x] I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kevin85421 kevin85421 (Member) left a comment


  1. Use serviceAccountName instead to keep consistency with the Pod spec (Link); serviceAccount is deprecated.

  2. The head and workers may need different service accounts. For example, you may allow only the head Pod to access your private S3 buckets, and not allow this for the workers.

  3. Do not set a default value for serviceAccount.

  4. Consider using the following syntax.

  {{- if .Values.head.autoscalerOptions }}
  autoscalerOptions: {{- toYaml .Values.head.autoscalerOptions | nindent 4 }}
  {{- end }}
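
Applied to the new field, that pattern could look roughly like the sketch below. This is only an illustration of the suggestion, not the exact diff in this PR, and the placement inside the head group's Pod spec template in raycluster-cluster.yaml is an assumption:

  # Hypothetical snippet inside the head Pod spec template (illustration only)
  {{- if .Values.head.serviceAccountName }}
  serviceAccountName: {{ .Values.head.serviceAccountName }}
  {{- end }}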

@ducviet00
Contributor Author

Thanks for your comments. I updated the PR following your suggestions!

@kevin85421 kevin85421 (Member) left a comment


This looks good to me. However, we do not have related tests other than the Helm linter, so I need to test this PR manually before I approve it. Would you mind adding details and screenshots about the tests you ran?

@ducviet00
Contributor Author

ducviet00 commented Mar 17, 2023

This looks good to me. However, we do not have related tests other than the Helm linter, so I need to test this PR manually before I approve it. Would you mind adding details and screenshots about the tests you ran?

Sure, I also did a manual test. This is the values.yaml, with the service account disabled for the head group and enabled for the worker group and the additional worker group.

# Default values for ray-cluster.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# The KubeRay community welcomes PRs to expose additional configuration
# in this Helm chart.

image:
  repository: rayproject/ray
  tag: 2.0.0
  pullPolicy: IfNotPresent

nameOverride: "kuberay"
fullnameOverride: ""

imagePullSecrets:
  []
  # - name: an-existing-secret

head:
  # If enableInTreeAutoscaling is true, the autoscaler sidecar will be added to the Ray head pod.
  # Ray autoscaler integration is supported only for Ray versions >= 1.11.0
  # Ray autoscaler integration is Beta with KubeRay >= 0.3.0 and Ray >= 2.0.0.
  # enableInTreeAutoscaling: true
  # autoscalerOptions is an OPTIONAL field specifying configuration overrides for the Ray autoscaler.
  # The example configuration shown below represents the DEFAULT values.
  # autoscalerOptions:
  # upscalingMode: Default
  # idleTimeoutSeconds: 60
  # securityContext: {}
  # env: []
  # envFrom: []
  # resources specifies optional resource request and limit overrides for the autoscaler container.
  # For large Ray clusters, we recommend monitoring container resource usage to determine if overriding the defaults is required.
  # resources:
  #   limits:
  #     cpu: "500m"
  #     memory: "512Mi"
  #   requests:
  #     cpu: "500m"
  #     memory: "512Mi"
  labels: {}
  serviceAccountName: ""
  rayStartParams:
    dashboard-host: "0.0.0.0"
    block: "true"
  # containerEnv specifies environment variables for the Ray container,
  # Follows standard K8s container env schema.
  containerEnv: []
  # - name: EXAMPLE_ENV
  #   value: "1"
  envFrom:
    []
    # - secretRef:
    #     name: my-env-secret
  # ports optionally allows specifying ports for the Ray container.
  # ports: []
  # resource requests and limits for the Ray head container.
  # Modify as needed for your application.
  # Note that the resources in this example are much too small for production;
  # we don't recommend allocating less than 8G memory for a Ray pod in production.
  # Ray pods should be sized to take up entire K8s nodes when possible.
  # Always set CPU and memory limits for Ray pods.
  # It is usually best to set requests equal to limits.
  # See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources
  # for further guidance.
  resources:
    limits:
      cpu: "1"
      # To avoid out-of-memory issues, never allocate less than 2G memory for the Ray head.
      memory: "2G"
    requests:
      cpu: "1"
      memory: "2G"
  annotations: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
  # Ray container security context.
  securityContext: {}
  volumes:
    - name: log-volume
      emptyDir: {}
  # Ray writes logs to /tmp/ray/session_latest/logs
  volumeMounts:
    - mountPath: /tmp/ray
      name: log-volume
  # sidecarContainers specifies additional containers to attach to the Ray pod.
  # Follows standard K8s container spec.
  sidecarContainers: []
  # See docs/guidance/pod-command.md for more details about how to specify
  # container command for head Pod.
  command: []
  args: []

worker:
  # If you want to disable the default workergroup
  # uncomment the line below
  # disabled: true
  groupName: workergroup
  replicas: 1
  labels: {}
  serviceAccountName: "workergroup-service-account"
  rayStartParams:
    block: "true"
  initContainerImage: "busybox:1.28" # Enable users to specify the image for init container. Users can pull the busybox image from their private repositories.
  # Security context for the init container.
  initContainerSecurityContext: {}
  # containerEnv specifies environment variables for the Ray container,
  # Follows standard K8s container env schema.
  containerEnv: []
  # - name: EXAMPLE_ENV
  #   value: "1"
  envFrom:
    []
    # - secretRef:
    #     name: my-env-secret
  # ports optionally allows specifying ports for the Ray container.
  # ports: []
  # resource requests and limits for the Ray worker container.
  # Modify as needed for your application.
  # Note that the resources in this example are much too small for production;
  # we don't recommend allocating less than 8G memory for a Ray pod in production.
  # Ray pods should be sized to take up entire K8s nodes when possible.
  # Always set CPU and memory limits for Ray pods.
  # It is usually best to set requests equal to limits.
  # See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources
  # for further guidance.
  resources:
    limits:
      cpu: "1"
      memory: "1G"
    requests:
      cpu: "1"
      memory: "1G"
  annotations: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
  # Ray container security context.
  securityContext: {}
  volumes:
    - name: log-volume
      emptyDir: {}
  # Ray writes logs to /tmp/ray/session_latest/logs
  volumeMounts:
    - mountPath: /tmp/ray
      name: log-volume
  # sidecarContainers specifies additional containers to attach to the Ray pod.
  # Follows standard K8s container spec.
  sidecarContainers: []
  # See docs/guidance/pod-command.md for more details about how to specify
  # container command for worker Pod.
  command: []
  args: []

# The map's key is used as the groupName.
# For example, the key smallGroup in the map below
# will be used as the groupName.
additionalWorkerGroups:
  smallGroup:
    # Disabled by default
    disabled: false
    replicas: 1
    minReplicas: 1
    maxReplicas: 3
    labels: {}
    serviceAccountName: "additional-workergroup-service-account"
    rayStartParams:
      block: "true"
    initContainerImage: "busybox:1.28" # Enable users to specify the image for init container. Users can pull the busybox image from their private repositories.
    # Security context for the init container.
    initContainerSecurityContext: {}
    # containerEnv specifies environment variables for the Ray container,
    # Follows standard K8s container env schema.
    containerEnv:
      []
      # - name: EXAMPLE_ENV
      #   value: "1"
    envFrom:
      []
      # - secretRef:
      #     name: my-env-secret
    # ports optionally allows specifying ports for the Ray container.
    # ports: []
    # resource requests and limits for the Ray worker container.
    # Modify as needed for your application.
    # Note that the resources in this example are much too small for production;
    # we don't recommend allocating less than 8G memory for a Ray pod in production.
    # Ray pods should be sized to take up entire K8s nodes when possible.
    # Always set CPU and memory limits for Ray pods.
    # It is usually best to set requests equal to limits.
    # See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources
    # for further guidance.
    resources:
      limits:
        cpu: 1
        memory: "1G"
      requests:
        cpu: 1
        memory: "1G"
    annotations: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}
    # Ray container security context.
    securityContext: {}
    volumes:
      - name: log-volume
        emptyDir: {}
    # Ray writes logs to /tmp/ray/session_latest/logs
    volumeMounts:
      - mountPath: /tmp/ray
        name: log-volume
    sidecarContainers: []
    # See docs/guidance/pod-command.md for more details about how to specify
    # container command for worker Pod.
    command: []
    args: []

# Configuration for Head's Kubernetes Service
service:
  # This is optional, and the default is ClusterIP.
  type: ClusterIP

helm template --debug test . output:

---
# Source: ray-cluster/templates/raycluster-cluster.yaml
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  labels:
    app.kubernetes.io/name: kuberay
    helm.sh/chart: ray-cluster-0.4.0
    app.kubernetes.io/instance: test
    app.kubernetes.io/managed-by: Helm
  name: test-kuberay
  
spec:
  headGroupSpec:
    serviceType: ClusterIP
    rayStartParams:
      block: "true"
      dashboard-host: "0.0.0.0"
    template:
      spec:
        imagePullSecrets:
          []
        containers:
          - volumeMounts:
            - mountPath: /tmp/ray
              name: log-volume
            name: ray-head
            image: rayproject/ray:2.0.0
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: "1"
                memory: 2G
              requests:
                cpu: "1"
                memory: 2G
            securityContext:
              {}
            env:
              []
        volumes:
          - emptyDir: {}
            name: log-volume
        affinity:
          {}
        tolerations:
          []
        nodeSelector:
          {}
      metadata:
        annotations:
          {}
        labels: 
          app.kubernetes.io/name: kuberay
          helm.sh/chart: ray-cluster-0.4.0
          app.kubernetes.io/instance: test
          app.kubernetes.io/managed-by: Helm 

  workerGroupSpecs:
  - rayStartParams:
      block: "true"
    replicas: 1
    minReplicas: 1
    maxReplicas: 3
    groupName: smallGroup
    template:
      spec:
        imagePullSecrets:
          []
        initContainers:
          - name: init
            image: busybox:1.28
            command: ['sh', '-c', "until nslookup $FQ_RAY_IP; do echo waiting for K8s Service $FQ_RAY_IP; sleep 2; done"]
            securityContext:
              {}
        serviceAccountName: additional-workergroup-service-account
        containers:
          - volumeMounts:
            - mountPath: /tmp/ray
              name: log-volume
            name: ray-worker
            image: rayproject/ray:2.0.0
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: 1
                memory: 1G
              requests:
                cpu: 1
                memory: 1G
            securityContext:
              {}
            env:
              []
            ports:
              null
        volumes:
          - emptyDir: {}
            name: log-volume
        affinity:
          {}
        tolerations:
          []
        nodeSelector:
          {}
      metadata:
        annotations:
          {}
        labels: 
          app.kubernetes.io/name: kuberay
          helm.sh/chart: ray-cluster-0.4.0
          app.kubernetes.io/instance: test
          app.kubernetes.io/managed-by: Helm
  - rayStartParams:
      block: "true"
    replicas: 1
    minReplicas: 1
    maxReplicas: 2147483647
    groupName: workergroup
    template:
      spec:
        imagePullSecrets:
          []
        initContainers:
          - name: init
            image: busybox:1.28
            command: ['sh', '-c', "until nslookup $FQ_RAY_IP; do echo waiting for K8s Service $FQ_RAY_IP; sleep 2; done"]
            securityContext:
              {}
        serviceAccountName: workergroup-service-account
        containers:
          - volumeMounts:
            - mountPath: /tmp/ray
              name: log-volume
            name: ray-worker
            image: rayproject/ray:2.0.0
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: "1"
                memory: 1G
              requests:
                cpu: "1"
                memory: 1G
            securityContext:
              {}
            env:
              []
            ports:
              null
        volumes:
          - emptyDir: {}
            name: log-volume
        affinity:
          {}
        tolerations:
          []
        nodeSelector:
          {}
      metadata:
        annotations:
          {}
        labels: 
          app.kubernetes.io/name: kuberay
          helm.sh/chart: ray-cluster-0.4.0
          app.kubernetes.io/instance: test
          app.kubernetes.io/managed-by: Helm

I also tested and deployed this on our staging environment:

[screenshots of the staging deployment]
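
As an extra sanity check (not part of the original test, just a standard kubectl query), the service account attached to each Pod can be listed with:

kubectl get pods -o custom-columns=NAME:.metadata.name,SERVICE_ACCOUNT:.spec.serviceAccountName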

@Yicheng-Lu-llll
Contributor

LGTM!

I also tested it:

# create service account
kubectl create serviceaccount test

# create token
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  annotations:
    kubernetes.io/service-account.name: test
type: kubernetes.io/service-account-token
EOF

# create role
kubectl apply -f - <<EOF
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default                      
  name: role-test
rules:
- apiGroups: [""]
  resources: ["pods"]                        
  verbs: ["get", "list"]            
EOF

# create RoleBinding
kubectl apply -f - <<EOF
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rolebinding-test
  namespace: default
subjects:                                 
- kind: ServiceAccount                  
  name: test
  namespace: default
roleRef:                                  
  kind: Role
  name: role-test
  apiGroup: rbac.authorization.k8s.io      
EOF

# in {kuberay repo}/helm-chart/ray-cluster/values.yaml, set serviceAccountName: "test" only for head pod
# install ray cluster
helm install raycluster .

# in head pod, use the TOKEN to access the API Server and list all the pods
kubectl exec -it $(kubectl get pods -o=name | grep head) -- bash
export CURL_CA_BUNDLE=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
sudo apt update 
sudo apt -y install curl
# You will see all the pod info. From a worker pod, the same request to list all pods will be forbidden (see the sketch after this block).
curl -H "Authorization: Bearer $TOKEN" https://kubernetes/api/v1/namespaces/default/pods
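
For completeness, here is a hedged sketch of the corresponding negative check from a worker Pod (the pod name pattern is an assumption; the workers in this test keep the default service account, so the API server should answer 403 Forbidden):

# repeat the same request from a worker pod; expect 403 Forbidden
kubectl exec -it $(kubectl get pods -o=name | grep worker | head -n 1) -- bash
export CURL_CA_BUNDLE=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
sudo apt update && sudo apt -y install curl
curl -H "Authorization: Bearer $TOKEN" https://kubernetes/api/v1/namespaces/default/pods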

@kevin85421 kevin85421 (Member) left a comment


Thanks to @ducviet00 for this contribution and @Yicheng-Lu-llll for the review!

@kevin85421
Member

This PR only updates the Helm chart, so it is unrelated to the failed tests. Merged.

@kevin85421 kevin85421 merged commit a0ee1c8 into ray-project:master Mar 21, 2023
@yc2984

yc2984 commented Apr 5, 2023

Hi, is it possible to provide an end-to-end example, including all required configs? For example, do you need to create those Kubernetes service accounts separately, and how do you mount the secret into the Ray node?
