feat(helm): GCP support #2666

Merged: 8 commits, Sep 27, 2022

89 changes: 19 additions & 70 deletions helm/README.md
@@ -4,76 +4,55 @@ name: Kubernetes Deployment

# OpenNeuro Kubernetes Deployment

This chart is used to deploy a copy of OpenNeuro and all required services excluding any CDN, MongoDB, and ElasticSearch.
This chart is used to deploy a copy of OpenNeuro and all required services excluding MongoDB and ElasticSearch.

On AWS, this chart is deployed using Amazon's managed Kubernetes service (EKS). An ingress creates the load balancer routing to backend services and this is fronted by CloudFront for caching.
On GCP, this chart is designed to support GKE with Autopilot for deployment. Only worker disks and cluster creation are configured outside of this chart.

Written for Helm 3.0.0 or later

## Major components

- API deployment - GraphQL service (@openneuro/server npm package)
- DataLad service deployment - Falcon server for microservice operations on datasets
- Web deployment - SSR and static resources including the React application (@openneuro/app npm package)
- Web deployment - Nginx serving static resources including the React application (@openneuro/app npm package)

## Pre-requisites

Install [Helm](https://helm.sh/), [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/), and (optionally) [eksctl](https://eksctl.io/).
Install [Helm](https://helm.sh/), [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/), and [gcloud](https://cloud.google.com/sdk/docs/install).

Helm manages configuration templates. Kubectl makes API calls to Kubernetes on your behalf. eksctl configures AWS specific EKS resources to simplify control plane setup and is most useful when creating a new cluster or changing node groups.
Helm manages configuration templates. Kubectl makes API calls to Kubernetes on your behalf. gcloud is used to create and authenticate with the cluster.
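
If kubectl ever needs to be pointed back at the cluster (for example, from a second machine), credentials can be refreshed with a command along these lines, using the cluster name and region created below:

```bash
# Refresh kubectl credentials and context for the Autopilot cluster
# ("openneuro-dev" and "us-west1" match the cluster created in this guide)
gcloud container clusters get-credentials openneuro-dev --region=us-west1
```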

## Cluster Setup

### Create a cluster on AWS
### gcloud setup

```bash
eksctl create cluster --name=my-cluster-name --nodegroup-name=general --nodes=2 --instance-type=c5a.xlarge --node-ami-family=Ubuntu1804
```

This should configure the cluster and setup credentials and command context for later kubectl and helm commands. If you encounter errors here, your user likely lacks access to manage EC2, EKS, or CloudFormation resources on the AWS account.

OpenNeuro uses at least two node groups to run. A general node group created as above and a secondary node group assigned to storage resources only.
Set the default project to use for gcloud commands.

```bash
eksctl create nodegroup --cluster=my-cluster-name --nodes=2 --instance-type=m5ad.xlarge --name=storage
gcloud config set project hs-openneuro
```

Example eksctl configurations from the main OpenNeuro instance are available in [staging](eksctl-cluster-prod.yaml) and [production](eksctl-cluster-staging.yaml) configurations.

### Storage setup

OpenEBS is used to manage volume allocation for worker nodes. Your Kubernetes nodes requires OpenZFS configuration. See [OpenEBS for supported versions](https://github.com/openebs/zfs-localpv#prerequisites). This can be built into the AMI on EKS or installed at node creation by eksctl as in the above example cluster configuration files.

Storage pool nodes should be labeled to allow migration of the EBS disks on EKS updates. Label each node like so - this must be done before installing zfs-localpv the first time.
### Create a cluster

```bash
kubectl label node node-1 openebs.io/nodeid=pool-a
kubectl label node node-2 openebs.io/nodeid=pool-b
gcloud container clusters create-auto openneuro-dev --region=us-west1
```

Once the cluster is running, initialize the CSI driver for OpenEBS ZFS LocalPV following the [install instructions](https://github.com/openebs/zfs-localpv#setup).

### Setup and access Kubernetes dashboard
This will configure the cluster and set up credentials and command context for later kubectl and helm commands. This requires IAM permissions for Kubernetes Engine.
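
To confirm the credentials and context landed correctly, a quick sanity check:

```bash
# Show the active kubeconfig context and confirm the control plane responds
kubectl config current-context
kubectl cluster-info
```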

To install:
OpenNeuro runs on GKE Autopilot, which automatically allocates node resources based on each container's `resources.requests` field.
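
The request amounts are exposed as chart values (the names below come from the templates in this change); illustrative `values.yaml` overrides might look like:

```yaml
# Illustrative Autopilot resource requests - tune per environment,
# these are not recommended production values
apiCpuRequests: "0.5"
apiMemoryRequests: 1Gi
workerCpuRequests: "2"
workerMemoryRequests: 4Gi
```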

```bash
helm install dashboard stable/kubernetes-dashboard
```
### Storage setup

To access:
pd-standard disks provide sufficient git operation performance for interactive use of multiple datasets sharing one worker.

```bash
# Setup a port forward to the Dashboard pod
export POD_NAME=$(kubectl get pods -n default -l "app=kubernetes-dashboard,release=dashboard" -o jsonpath="{.items[0].metadata.name}")
kubectl -n default port-forward $POD_NAME 8443:8443
# Obtain an admin token
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')
gcloud compute disks create openneuro-staging-datasets-0 --zone us-west1-b --size 256Gi --type pd-standard
```
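
The chart references each disk by its full resource path (the `projects/<project>/zones/<zone>/disks/<name>` form used for `workerDiskSize.id` below); one way to look it up, assuming the example disk name above:

```bash
# Print the disk's self link; the trailing projects/.../zones/.../disks/...
# portion is the value the chart expects
gcloud compute disks describe openneuro-staging-datasets-0 \
  --zone us-west1-b --format='value(selfLink)'
```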

### Configuration

This chart is AWS specific at the moment, as OpenNeuro requires EC2 EBS and ALB resources to run as configured. Pull requests welcome if you add support for other hosting environments.
This chart is GCP specific at the moment, mainly due to the disk configuration and load balancer ingress setup. Minimal changes are required to run in other Kubernetes environments: override the ingress and allocate moderately performant disks for the dataset worker containers.

To get started, create a `values.yaml` and `secrets.yaml` file. In `values.yaml` you will override any chart settings necessary for your target environment. For a minimal dev environment it may look like this:

@@ -82,41 +61,11 @@ hostname: my.dev.site.domain
url: https://my.dev.site.domain
environment: any-unique-string
googleTrackingIds: ''
storagePools:
stripeSize: 1099511627776 # 1TB EBS disks
pools:
- name: a
size: 2199023255552 # 2TB per pool
- name: b
size: 2199023255552
workerDiskSize:
- id: projects/my-dev-project/zones/us-west1-b/disks/openneuro-dev-datasets-0
size: 256Gi
```

Storage pools are local to a specific node. Generally you should add one pool for each node assigned to the storage node group. It is possible to assign multiple pools to one node but this will prevent even load distribution across volumes.

Disks are automatically allocated by the pool size divided by stripe size. Each "stripe" is one block persistent volume backing the pool. Multiple volumes are sparsely allocated from the pool. The pool can be much smaller than the quota size for the volumes within it as long as the total requested storage is below the pool's real available size.

The pool size can be adjusted automatically when increasing the size. To scale down a pool, the underlying EBS disks need to be removed from the pool first, and then manually removed.

```bash
# Locate the correct node, then run
zpool remove nvme-Amazon_Elastic_Block_Store_vol0123457908104
# Wait until the disk leaves the removing state, takes a while
watch -n 60 zpool list -v
```

Remove the PVCs once the pool is no longer using them.

```bash
# Save the volume ID
kubectl get pvc storage-pool-release-name-storage-pool-a-1
# Delete PVC first
kubectl delete pvc storage-pool-release-name-storage-pool-a-1
# Delete the volume once freed (this will delete the EBS disk, so be sure here!)
kubectl delete pv pvc-3a29528a-7b17-40b6-96a7-6385316fb401
```

Other values which can be overridden are found in the chart version of [values.yaml](charts/values.yaml).
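
Two values introduced by this change are worth calling out: `dataladWorkers` sets the number of worker pods (each generally needs a matching `workerDiskSize` entry and disk), and `workerZone` pins workers to the zone holding the pre-created disks. For example:

```yaml
# Illustrative values matching the example disk created above
workerZone: us-west1-b
dataladWorkers: 1
```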

`secrets.yaml` contains any privileged configuration, like database connection strings or oauth secrets. Start with [secrets.yaml.example](secrets.yaml.example) and fill in each value. Most values are required, but you only need one authentication provider and mail, doi, and flower configuration is optional.

### Installing
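
The install steps are collapsed in this diff view; as a rough sketch (release name and working directory are assumptions, run from the `helm/` directory):

```bash
# Fetch chart dependencies (redis), then install with your overrides
helm dependency update openneuro
helm install openneuro-dev ./openneuro -f values.yaml -f secrets.yaml
```
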
5 changes: 1 addition & 4 deletions helm/openneuro/Chart.yaml
@@ -8,8 +8,5 @@ sources:
appVersion: 4.10.0
dependencies:
- name: redis
version: 10.6.17
version: 17.1.4
repository: https://charts.bitnami.com/bitnami
- name: aws-alb-ingress-controller
version: 1.0.0
repository: https://kubernetes-charts-incubator.storage.googleapis.com/
12 changes: 3 additions & 9 deletions helm/openneuro/requirements.lock
@@ -1,12 +1,6 @@
dependencies:
- name: redis
repository: https://charts.bitnami.com/bitnami
version: 10.6.17
- name: aws-alb-ingress-controller
repository: https://kubernetes-charts-incubator.storage.googleapis.com/
version: 1.0.0
- name: apm-server
repository: https://helm.elastic.co
version: 7.9.0
digest: sha256:f90c303d40dcf002907179598ef0f1fda76b2edbc560f838910051aa9a2c21ab
generated: "2020-08-21T14:22:28.124302759-07:00"
version: 17.1.4
digest: sha256:7e7d63886296a858981054168160d78ed785ea99c89f2f04ad0f22b9447268fb
generated: "2022-09-13T14:04:03.820425553-07:00"
13 changes: 8 additions & 5 deletions helm/openneuro/templates/api-deployment.yaml
@@ -24,19 +24,22 @@ spec:
- name: {{ .Release.Name }}-api
image: 'openneuro/server:v{{ .Chart.AppVersion }}'
resources:
limits:
cpu: "1.2"
memory: "2Gi"
requests:
cpu: ".3"
memory: "768Mi"
cpu: {{ .Values.apiCpuRequests }}
memory: {{ .Values.apiMemoryRequests }}
ports:
- containerPort: 8111
envFrom:
- configMapRef:
name: {{ .Release.Name }}-configmap
- secretRef:
name: {{ .Release.Name }}-secret
readinessProbe:
initialDelaySeconds: 15
periodSeconds: 30
httpGet:
path: '/crn/'
port: 8111
livenessProbe:
initialDelaySeconds: 60
periodSeconds: 30
2 changes: 0 additions & 2 deletions helm/openneuro/templates/api-service.yaml
@@ -7,8 +7,6 @@ metadata:
chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
release: "{{ .Release.Name }}"
heritage: "{{ .Release.Service }}"
annotations:
alb.ingress.kubernetes.io/healthcheck-path: /crn/
spec:
ports:
- port: 8111
2 changes: 0 additions & 2 deletions helm/openneuro/templates/configmap.yaml
@@ -14,7 +14,5 @@ data:
REDIS_PORT: "6379"
GRAPHQL_ENDPOINT: http://{{ .Release.Name }}-api:8111/crn/graphql
DATALAD_SERVICE_URI: {{ .Release.Name }}-dataset-worker
DATALAD_S3_PUBLIC_ON_EXPORT: "yes"
LOCPATH: ""
SENTRY_DSN: {{ .Values.sentryDsn | quote }}
ELASTIC_APM_SERVER_URL: {{ .Values.apmServerUrl }}
2 changes: 0 additions & 2 deletions helm/openneuro/templates/datalad-worker-service.yaml
@@ -4,8 +4,6 @@ apiVersion: v1
kind: Service
metadata:
name: {{ $relname }}-dataset-worker-{{ . }}
annotations:
alb.ingress.kubernetes.io/healthcheck-path: /heartbeat
spec:
selector:
statefulset.kubernetes.io/pod-name: {{ $relname }}-dataset-worker-{{ . }}
22 changes: 22 additions & 0 deletions helm/openneuro/templates/dataset-worker-pv.yaml
@@ -0,0 +1,22 @@
# The Worker PV matches GCP disks to statefulset claims (see dataset-worker-stateful-set)
{{- $relname := .Release.Name -}}
{{- range $index, $config := .Values.workerDiskSize }}
apiVersion: v1
kind: PersistentVolume
metadata:
name: datasets-{{ $relname }}-dataset-worker-{{ $index }}
spec:
storageClassName: {{ $relname }}-datasets
capacity:
storage: {{ $config.size }}
accessModes:
- ReadWriteOnce
claimRef:
namespace: default
name: datasets-{{ $relname }}-dataset-worker-{{ $index }}
csi:
driver: pd.csi.storage.gke.io
volumeHandle: {{ $config.id }}
fsType: ext4
---
{{- end }}
29 changes: 16 additions & 13 deletions helm/openneuro/templates/dataset-worker-stateful-set.yaml
@@ -10,14 +10,13 @@ spec:
replicas: {{ .Values.dataladWorkers }}
volumeClaimTemplates:
- metadata:
name: datasets-ebs
name: datasets
spec:
storageClassName: {{ .Release.Name }}-datasets
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Pi
storage: 10Gi
template:
metadata:
labels:
@@ -26,12 +25,8 @@
checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
spec:
tolerations:
- key: "storage"
operator: "Exists"
effect: "NoSchedule"
nodeSelector:
role: storage
topology.kubernetes.io/zone: {{ .Values.workerZone }}
volumes:
- name: ssh-key
secret:
@@ -45,12 +40,20 @@
image: 'openneuro/datalad-service:v{{ .Chart.AppVersion }}'
command: ["gunicorn", "--bind", "0.0.0.0:80", "--reload", "datalad_service.app:create_app('/datasets')", "--workers", "8", "--worker-class", "gevent", "--timeout", "60", "--keep-alive", "30"]
resources:
limits:
cpu: "8"
memory: "12Gi"
requests:
cpu: {{ .Values.workerCpuRequests }}
memory: "4Gi"
memory: {{ .Values.workerMemoryRequests }}
readinessProbe:
periodSeconds: 5
initialDelaySeconds: 15
httpGet:
path: /heartbeat
port: 80
livenessProbe:
periodSeconds: 60
httpGet:
path: /heartbeat
port: 80
ports:
- containerPort: 80
envFrom:
@@ -59,7 +62,7 @@
- secretRef:
name: {{ .Release.Name }}-secret
volumeMounts:
- name: datasets-ebs
- name: datasets
mountPath: /datasets
- name: ssh-key
mountPath: /datalad-key
10 changes: 10 additions & 0 deletions helm/openneuro/templates/dataset-worker-storage-class.yaml
@@ -0,0 +1,10 @@
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: {{ .Release.Name }}-datasets
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
type: pd-balanced
csi.storage.k8s.io/fstype: ext4
9 changes: 3 additions & 6 deletions helm/openneuro/templates/indexer-job.yaml
@@ -1,4 +1,4 @@
apiVersion: batch/v1beta1
apiVersion: batch/v1
kind: CronJob
metadata:
name: {{ .Release.Name }}-indexer
@@ -17,12 +17,9 @@ spec:
- name: openneuro-indexer
image: 'openneuro/indexer:v{{ .Chart.AppVersion }}'
resources:
limits:
cpu: ".5"
memory: "512Mi"
requests:
cpu: ".1"
memory: "256Mi"
cpu: ".25"
memory: "512Mi"
envFrom:
- configMapRef:
name: {{ .Release.Name }}-configmap