Merge pull request #6 from wandb/agent-v0
Agent v0
bcsherma authored May 24, 2023
2 parents 98768b5 + 4beb78f commit d449291
Showing 13 changed files with 9,719 additions and 1 deletion.
2 changes: 1 addition & 1 deletion .github/workflows/lint-test.yaml
@@ -49,4 +49,4 @@ jobs:
       - name: Run chart-testing (install)
         env:
           LICENSE: ${{ secrets.LICENSE }}
-        run: ct install --config ct.yaml --helm-extra-set-args --set=license=$LICENSE
+        run: ct install --charts ./charts/wandb --config ct.yaml --helm-extra-set-args --set=license=$LICENSE
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.vscode/
23 changes: 23 additions & 0 deletions charts/launch-agent/.helmignore
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
9 changes: 9 additions & 0 deletions charts/launch-agent/Chart.yaml
@@ -0,0 +1,9 @@
apiVersion: v2
name: launch-agent
description: A Helm chart for running the W&B Launch Agent in Kubernetes
type: application
version: 0.1.0
maintainers:
  - name: wandb
    email: [email protected]
    url: https://wandb.com
20 changes: 20 additions & 0 deletions charts/launch-agent/README.md
@@ -0,0 +1,20 @@
# W&B Launch Agent

This chart deploys the W&B Launch Agent to your Kubernetes cluster.

The launch agent is a Kubernetes Deployment that runs a container that connects to the W&B API and watches for new runs in one or more launch queues. When the agent pops a run off the queue(s), it will launch a Kubernetes Job to execute the run on the W&B user's behalf.

To deploy an agent, you will need to specify the following values:

- `agent.apiKey`: Your W&B API key
- `launchConfig`: The literal contents of a launch agent config file that will be used to configure the agent. See the [launch agent docs](https://docs.wandb.ai/guides/launch/run-agent) for more information.

You will likely want to modify `agent.resources.limits.{cpu,memory}`, which default to `1000m` and `1Gi` respectively.

By default, this chart will also install [volcano](https://volcano.sh), but this can be disabled by setting `volcano=false`.

You can provide these values by modifying the contents of [`values.yaml`](values.yaml) or by passing them in as command line arguments to `helm install`, e.g.

```bash
helm install <package-name> <launch-agent-chart-path> --set agent.apiKey=<your-api-key> --set-file launchConfig=<path-to-launch-config.yaml>
```
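
As an alternative to `--set` flags, the same settings could be kept in a separate values file and passed to `helm install` with `-f`. The following is a minimal sketch, assuming the value names used by this chart's templates and README (`agent.apiKey`, `agent.resources`, `launchConfig`, `volcano`); the file name `my-values.yaml` is hypothetical and the queue name is only a placeholder.

```yaml
# my-values.yaml (hypothetical file name)
agent:
  # Required: rendered into the wandb-api-key Secret.
  apiKey: "<your-api-key>"
  # Copied into the agent container's resources block.
  resources:
    limits:
      cpu: 1000m
      memory: 1Gi

# Required: literal contents of the launch agent config file.
launchConfig: |
  queues: ["default"]

# Skip the bundled volcano install, as described above.
volcano: false
```

Keeping the API key in a file means the file itself has to be treated as a secret, so passing `agent.apiKey` on the command line may still be preferable in some setups.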
4 changes: 4 additions & 0 deletions charts/launch-agent/ci/basic-values.yaml
@@ -0,0 +1,4 @@
agent:
  apiKey: "<an-api-key>"
launchConfig: |
  queues: ["default"]
8 changes: 8 additions & 0 deletions charts/launch-agent/templates/configmap.yaml
@@ -0,0 +1,8 @@
apiVersion: v1
data:
  launch-config.yaml: |
    {{ required "Please set launchConfig to the contents of your agent config file" .Values.launchConfig | nindent 4 }}
kind: ConfigMap
metadata:
  name: wandb-launch-configmap
  namespace: wandb
44 changes: 44 additions & 0 deletions charts/launch-agent/templates/deployment.yaml
@@ -0,0 +1,44 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: launch-agent
  namespace: wandb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: launch-agent
  template:
    metadata:
      labels:
        app: launch-agent
    spec:
      serviceAccountName: wandb-launch-serviceaccount
      containers:
        - name: launch-agent
          image: {{ .Values.agent.image }}
          resources:
            {{- toYaml .Values.agent.resources | nindent 12 }}
          imagePullPolicy: {{ .Values.agent.imagePullPolicy }}
          securityContext:
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            runAsUser: 1000
            capabilities:
              drop: ["ALL"]
            seccompProfile:
              type: RuntimeDefault
          env:
            - name: WANDB_API_KEY
              valueFrom:
                secretKeyRef:
                  name: wandb-api-key
                  key: password
          volumeMounts:
            - name: wandb-launch-config
              mountPath: /home/launch_agent/.config/wandb
              readOnly: true
      volumes:
        - name: wandb-launch-config
          configMap:
            name: wandb-launch-configmap
9 changes: 9 additions & 0 deletions charts/launch-agent/templates/namespace.yaml
@@ -0,0 +1,9 @@
apiVersion: v1
kind: Namespace
metadata:
  name: wandb
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/warn-version: latest
61 changes: 61 additions & 0 deletions charts/launch-agent/templates/rbac.yaml
@@ -0,0 +1,61 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: wandb-launch-serviceaccount
  namespace: wandb
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: wandb
  name: wandb-launch-agent
rules:
  - apiGroups: [""]
    resources: ["pods", "configmaps", "secrets", "pods/log"]
    verbs: ["create", "get", "watch", "list", "update", "delete", "patch"]
  - apiGroups: ["batch"]
    resources: ["jobs", "jobs/status"]
    verbs: ["create", "get", "watch", "list", "update", "delete", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: job-creator
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "secrets"]
    verbs: ["create", "get", "watch", "list", "update", "delete", "patch"]
  - apiGroups: ["batch"]
    resources: ["jobs", "jobs/status"]
    verbs: ["create", "get", "watch", "list", "update", "delete", "patch"]
  - apiGroups: ["batch.volcano.sh"]
    resources: ["jobs", "jobs/status"]
    verbs: ["create", "get", "watch", "list", "update", "delete", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: wandb-launch-role-binding
  namespace: wandb
subjects:
  - kind: ServiceAccount
    name: wandb-launch-serviceaccount
    namespace: wandb
roleRef:
  kind: Role
  name: wandb-launch-agent
  apiGroup: rbac.authorization.k8s.io
---
# role binding to create ML jobs in another namespace (could use cluster role binding if we want to launch cluster wide)
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: wandb-launch-cluster-role-binding
subjects:
  - kind: ServiceAccount
    name: wandb-launch-serviceaccount
    namespace: wandb
roleRef:
  kind: ClusterRole
  name: job-creator
  apiGroup: rbac.authorization.k8s.io
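
The comment on the last binding mentions a cluster role binding as the option for launching cluster wide. A minimal sketch of what that variant might look like follows; it is illustrative only and not part of this commit. It reuses the `job-creator` ClusterRole and the `wandb-launch-serviceaccount` ServiceAccount from the file above, while the binding name `wandb-launch-cluster-wide-binding` is hypothetical.

```yaml
# Sketch only: grants the agent's service account the job-creator
# permissions in every namespace instead of a single one.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  # Hypothetical name; ClusterRoleBindings are not namespaced.
  name: wandb-launch-cluster-wide-binding
subjects:
  - kind: ServiceAccount
    name: wandb-launch-serviceaccount
    namespace: wandb
roleRef:
  kind: ClusterRole
  name: job-creator
  apiGroup: rbac.authorization.k8s.io
```

Because `job-creator` includes access to `secrets`, a cluster-wide binding would let the agent read and modify secrets in all namespaces, which is a much broader grant than the namespaced RoleBinding used in the commit.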
8 changes: 8 additions & 0 deletions charts/launch-agent/templates/secret.yaml
@@ -0,0 +1,8 @@
apiVersion: v1
kind: Secret
metadata:
  name: wandb-api-key
  namespace: wandb
type: kubernetes.io/basic-auth
stringData:
  password: {{ required "Please set agent.apiKey to a W&B API key" .Values.agent.apiKey }}