[Breaking] Generalize LRE to arbitrary toolchains (TraceMachina#728)
This refactors the entire remote execution setup.

We now use "base" images to supply toolchains and have wrappers to
create nativelink workers from those base images. This allows us to
"enrich" arbitrary toolchain containers to turn them into nativelink
workers.

In other words, we now have a framework to import non-Nix containers
into our Nix infrastructure, such as "classic" Ubuntu-based toolchain
containers.

Toolchain generation is now arbitrarily fine-grained. In practice, this
means that, for instance, the Java and C++ toolchains are now separate
entities. This has a large impact on the efficiency of multi-toolchain
deployments. The Kubernetes example has been updated accordingly.

As a side effect of the new container structures, the K8s deployment now
works without root permissions in the nativelink containers.

The LRE infrastructure is now treated as a special case of the new
toolchain setup process. The `rbe-configs-gen` logic is now an
implementation detail and the generator logic is no longer carried over
into the final worker images. This brings down the image size for the
LRE containers from ~2.5GB to ~1.7GB for C++ and ~600MB for Java. The
slight overall reduction in container sizes is due to the omission of
the Bazel executable. Bazel is required to generate the Starlark
toolchain configurations but doesn't have to be present in the final
worker images.
aaronmondal authored Mar 19, 2024
1 parent 4095e97 commit 1a43ef9
Showing 38 changed files with 966 additions and 237 deletions.
3 changes: 2 additions & 1 deletion .bazelignore
@@ -11,4 +11,5 @@ bazel-remote-nativelink
 bazel-root
 bazel-testlogs
 bazel-nativelink
-local-remote-execution
+local-remote-execution/generated-cc
+local-remote-execution/generated-java
6 changes: 3 additions & 3 deletions .bazelrc
@@ -91,9 +91,9 @@ build:linux_zig --repo_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1


 # Local Remote Execution.
-build:lre --host_platform=@local-remote-execution//generated/config:platform
-build:lre --extra_toolchains=@local-remote-execution//generated/config:cc-toolchain"
-build:lre --extra_toolchains=@local-remote-execution//generated/java:all"
+build:lre --extra_execution_platforms=@local-remote-execution//generated-cc/config:platform
+build:lre --extra_toolchains=@local-remote-execution//generated-cc/config:cc-toolchain"
+build:lre --extra_toolchains=@local-remote-execution//generated-java/java:all"

 # See: https://github.com/bazelbuild/bazel/issues/19714#issuecomment-1745604978
 build:lre --repo_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
38 changes: 17 additions & 21 deletions deployment-examples/kubernetes/01_operations.sh
@@ -9,24 +9,20 @@ SRC_ROOT=$(git rev-parse --show-toplevel)

 kubectl apply -f ${SRC_ROOT}/deployment-examples/kubernetes/gateway.yaml

-IMAGE_TAG=$(nix eval .#image.imageTag --raw)
-$(nix build .#image --print-build-logs --verbose) \
-    && ./result \
-    | skopeo \
-    copy \
-    --dest-tls-verify=false \
-    docker-archive:/dev/stdin \
-    docker://localhost:5001/nativelink:local
-IMAGE_TAG=$(nix eval .#lre.imageTag --raw)
-echo $IMAGE_TAG
-$(nix build .#lre --print-build-logs --verbose) \
-    && ./result \
-    | skopeo \
-    copy \
-    --dest-tls-verify=false \
-    docker-archive:/dev/stdin \
-    docker://localhost:5001/nativelink-toolchain:local
+# The image for the scheduler and CAS.
+nix run .#image.copyTo \
+    docker://localhost:5001/nativelink:local \
+    -- \
+    --dest-tls-verify=false
+
+# The worker image for C++ actions.
+nix run .#nativelink-worker-lre-cc.copyTo \
+    docker://localhost:5001/nativelink-worker-lre-cc:local \
+    -- \
+    --dest-tls-verify=false
+
+# The worker image for Java actions.
+nix run .#nativelink-worker-lre-java.copyTo \
+    docker://localhost:5001/nativelink-worker-lre-java:local \
+    -- \
+    --dest-tls-verify=false
13 changes: 9 additions & 4 deletions deployment-examples/kubernetes/02_application.sh
@@ -4,12 +4,17 @@

 KUSTOMIZE_DIR=$(git rev-parse --show-toplevel)/deployment-examples/kubernetes

-sed "s/__NATIVELINK_TOOLCHAIN_TAG__/$(nix eval .#lre.imageTag --raw)/g" \
-    "$KUSTOMIZE_DIR/worker.json.template" \
-    > "$KUSTOMIZE_DIR/worker.json"
+sed "s/__LRE_CC_TOOLCHAIN_TAG__/$(nix eval .#lre-cc.imageTag --raw)/g" \
+    "$KUSTOMIZE_DIR/worker-lre-cc.json.template" \
+    > "$KUSTOMIZE_DIR/worker-lre-cc.json" \
+
+sed "s/__LRE_JAVA_TOOLCHAIN_TAG__/$(nix eval .#lre-java.imageTag --raw)/g" \
+    "$KUSTOMIZE_DIR/worker-lre-java.json.template" \
+    > "$KUSTOMIZE_DIR/worker-lre-java.json" \

 kubectl apply -k "$KUSTOMIZE_DIR"

 kubectl rollout status deploy/nativelink-cas
 kubectl rollout status deploy/nativelink-scheduler
-kubectl rollout status deploy/nativelink-worker
+kubectl rollout status deploy/nativelink-worker-lre-cc
+kubectl rollout status deploy/nativelink-worker-lre-java
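
The placeholder substitution performed by this script can be tried in isolation. The following is a minimal sketch, not part of the commit: it assumes a hypothetical tag `abc123` in place of the output of `nix eval .#lre-cc.imageTag --raw`, and a one-line stand-in for `worker-lre-cc.json.template` written to a temporary directory. It shows only the `sed` templating step, not the deployment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical tag; the real script reads it from
# `nix eval .#lre-cc.imageTag --raw`.
tag="abc123"

# Temporary stand-in for the kustomize directory (assumption for this sketch).
workdir="$(mktemp -d)"

# One-line stand-in for worker-lre-cc.json.template.
printf '%s\n' '"docker://lre-cc:__LRE_CC_TOOLCHAIN_TAG__",' \
  > "$workdir/worker-lre-cc.json.template"

# Replace every occurrence of the placeholder with the tag.
sed "s/__LRE_CC_TOOLCHAIN_TAG__/${tag}/g" \
  "$workdir/worker-lre-cc.json.template" \
  > "$workdir/worker-lre-cc.json"

rendered="$(cat "$workdir/worker-lre-cc.json")"
echo "$rendered"
```

The substitution relies on the tag containing no `/` characters, since `/` is the `sed` delimiter here.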
10 changes: 7 additions & 3 deletions deployment-examples/kubernetes/03_delete_application.sh
@@ -4,8 +4,12 @@

 KUSTOMIZE_DIR=$(git rev-parse --show-toplevel)/deployment-examples/kubernetes

-sed "s/__NATIVELINK_TOOLCHAIN_TAG__/$(nix eval .#lre.imageTag --raw)/g" \
-    "$KUSTOMIZE_DIR/worker.json.template" \
-    > "$KUSTOMIZE_DIR/worker.json"
+sed "s/__LRE_CC_TOOLCHAIN_TAG__/$(nix eval .#lre-cc.imageTag --raw)/g" \
+    "$KUSTOMIZE_DIR/worker-lre-cc.json.template" \
+    > "$KUSTOMIZE_DIR/worker-lre-cc.json" \
+
+sed "s/__LRE_JAVA_TOOLCHAIN_TAG__/$(nix eval .#lre-java.imageTag --raw)/g" \
+    "$KUSTOMIZE_DIR/worker-lre-java.json.template" \
+    > "$KUSTOMIZE_DIR/worker-lre-java.json" \

 kubectl delete -k "$KUSTOMIZE_DIR"
4 changes: 2 additions & 2 deletions deployment-examples/kubernetes/README.md
@@ -1,6 +1,6 @@
 # Kubernetes example

-This deployment sets up a 3-container deployment with separate CAS, scheduler
+This deployment sets up a 4-container deployment with separate CAS, scheduler
 and worker. Don't use this example deployment in production. It's insecure.

 In this example we're using `kind` to set up the cluster `cilium` to provide a
@@ -19,7 +19,7 @@ execution containers and makes them available to the cluster:
 ./01_operations.sh
 ```

-Finally deploy NativeLink:
+Finally, deploy NativeLink:

 ```bash
 ./02_application.sh
10 changes: 7 additions & 3 deletions deployment-examples/kubernetes/kustomization.yaml
@@ -2,7 +2,8 @@
 resources:
   - cas.yaml
   - scheduler.yaml
-  - worker.yaml
+  - worker-lre-cc.yaml
+  - worker-lre-java.yaml
   - routes.yaml

 configMapGenerator:
@@ -12,9 +13,12 @@ configMapGenerator:
   - name: scheduler
     files:
       - scheduler.json
-  - name: worker
+  - name: worker-lre-cc
     files:
-      - worker.json
+      - worker-lre-cc.json
+  - name: worker-lre-java
+    files:
+      - worker-lre-java.json

 secretGenerator:
   - name: tls-secret
deployment-examples/kubernetes/worker-lre-cc.json.template
@@ -24,8 +24,8 @@
       "fast_slow": {
         "fast": {
           "filesystem": {
-            "content_path": "/root/.cache/nativelink/data-worker-test/content_path-cas",
-            "temp_path": "/root/.cache/nativelink/data-worker-test/tmp_path-cas",
+            "content_path": "~/.cache/nativelink/data-worker-test/content_path-cas",
+            "temp_path": "~/.cache/nativelink/data-worker-test/tmp_path-cas",
             "eviction_policy": {
               // 10gb.
               "max_bytes": 10000000000,
@@ -49,7 +49,7 @@
       "upload_action_result": {
         "ac_store": "GRPC_LOCAL_AC_STORE",
       },
-      "work_directory": "/root/.cache/nativelink/work",
+      "work_directory": "~/.cache/nativelink/work",
       "platform_properties": {
         "cpu_count": {
           "query_cmd": "nproc"
@@ -58,7 +58,20 @@
           "values": ["Linux"]
         },
         "container-image": {
-          "values": ["docker://nativelink-toolchain:__NATIVELINK_TOOLCHAIN_TAG__"]
+          "values": [
+            // WARNING: This is *not* the container that is actually deployed
+            // here. The generator container in this example was
+            // `rbe-autogen-lre-cc:<sometag>` and the platform was modified
+            // after the fact to be `lre-cc:<sometag>`. The deployed container
+            // we use as worker is `nativelink-worker-lre-cc:<sometag>` which is
+            // a completely separate extension of the `lre-cc` base image.
+            //
+            // Treat the `docker://...` string below as nothing more than a raw
+            // string that is matched by the scheduler against the value
+            // specified in the `exec_properties` of the corresponding platform
+            // at `local-remote-execution/generated-cc/config/BUILD`.
+            "docker://lre-cc:__LRE_CC_TOOLCHAIN_TAG__",
+          ]
         }
       }
     }
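
The warning in the comment above is easier to see with a toy comparison. The sketch below assumes the scheduler matches `container-image` by plain string equality, which is a simplification for illustration and not NativeLink's actual matching code. The helper `matches_container_image` and the tag `abc123` are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds only if both values are the same string.
# No image is pulled or inspected; this is an assumed simplification of
# how the scheduler compares platform properties.
matches_container_image() {
  local worker_value="$1"
  local platform_value="$2"
  [ "$worker_value" = "$platform_value" ]
}

# Value advertised in worker-lre-cc.json (hypothetical tag).
worker_value="docker://lre-cc:abc123"
# Value from the platform's exec_properties (hypothetical tag).
platform_value="docker://lre-cc:abc123"
# Image that the Kubernetes Deployment actually runs.
deployed_image="localhost:5001/nativelink-worker-lre-cc:local"

if matches_container_image "$worker_value" "$platform_value"; then
  result_advertised="match"
else
  result_advertised="no match"
fi

if matches_container_image "$deployed_image" "$platform_value"; then
  result_deployed="match"
else
  result_deployed="no match"
fi

echo "advertised value vs platform: $result_advertised"
echo "deployed image name vs platform: $result_deployed"
```

Under this assumption, the name of the deployed image plays no role in scheduling. Only the advertised string has to agree with the platform definition.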
35 changes: 35 additions & 0 deletions deployment-examples/kubernetes/worker-lre-cc.yaml
@@ -0,0 +1,35 @@
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: nativelink-worker-lre-cc
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: nativelink-worker-lre-cc
+  template:
+    metadata:
+      labels:
+        app: nativelink-worker-lre-cc
+    spec:
+      containers:
+        - name: nativelink-worker-lre-cc
+          image: "localhost:5001/nativelink-worker-lre-cc:local"
+          env:
+            - name: RUST_LOG
+              value: warn
+            - name: CAS_ENDPOINT
+              value: nativelink-cas
+            - name: SCHEDULER_ENDPOINT
+              value: nativelink-scheduler
+          volumeMounts:
+            - name: worker-lre-cc-config
+              mountPath: /worker-lre-cc.json
+              subPath: worker-lre-cc.json
+          command: ["/bin/nativelink"]
+          args: ["/worker-lre-cc.json"]
+      volumes:
+        - name: worker-lre-cc-config
+          configMap:
+            name: worker-lre-cc
80 changes: 80 additions & 0 deletions deployment-examples/kubernetes/worker-lre-java.json.template
@@ -0,0 +1,80 @@
+{
+  "stores": {
+    "GRPC_LOCAL_STORE": {
+      // Note: This file is used to test GRPC store.
+      "grpc": {
+        "instance_name": "main",
+        "endpoints": [
+          {"address": "grpc://${CAS_ENDPOINT:-127.0.0.1}:50051"}
+        ],
+        "store_type": "cas"
+      }
+    },
+    "GRPC_LOCAL_AC_STORE": {
+      // Note: This file is used to test GRPC store.
+      "grpc": {
+        "instance_name": "main",
+        "endpoints": [
+          {"address": "grpc://${CAS_ENDPOINT:-127.0.0.1}:50051"}
+        ],
+        "store_type": "ac"
+      }
+    },
+    "WORKER_FAST_SLOW_STORE": {
+      "fast_slow": {
+        "fast": {
+          "filesystem": {
+            "content_path": "~/.cache/nativelink/data-worker-test/content_path-cas",
+            "temp_path": "~/.cache/nativelink/data-worker-test/tmp_path-cas",
+            "eviction_policy": {
+              // 10gb.
+              "max_bytes": 10000000000,
+            }
+          }
+        },
+        "slow": {
+          "ref_store": {
+            "name": "GRPC_LOCAL_STORE",
+          }
+        }
+      }
+    }
+  },
+  "workers": [{
+    "local": {
+      "worker_api_endpoint": {
+        "uri": "grpc://${SCHEDULER_ENDPOINT:-127.0.0.1}:50061",
+      },
+      "cas_fast_slow_store": "WORKER_FAST_SLOW_STORE",
+      "upload_action_result": {
+        "ac_store": "GRPC_LOCAL_AC_STORE",
+      },
+      "work_directory": "~/.cache/nativelink/work",
+      "platform_properties": {
+        "cpu_count": {
+          "query_cmd": "nproc"
+        },
+        "OSFamily": {
+          "values": ["Linux"]
+        },
+        "container-image": {
+          "values": [
+            // WARNING: This is *not* the container that is actually deployed
+            // here. The generator container in this example was
+            // `rbe-autogen-lre-java:<sometag>` and the platform was modified
+            // after the fact to be `lre-java:<sometag>`. The deployed container
+            // we use as worker is `nativelink-worker-lre-java:<sometag>` which
+            // is a completely separate extension of the `lre-java` base image.
+            //
+            // Treat the `docker://...` string below as nothing more than a raw
+            // string that is matched by the scheduler against the value
+            // specified in the `exec_properties` of the corresponding platform
+            // at `local-remote-execution/generated-java/config/BUILD`.
+            "docker://lre-java:__LRE_JAVA_TOOLCHAIN_TAG__",
+          ]
+        }
+      }
+    }
+  }],
+  "servers": []
+}
35 changes: 35 additions & 0 deletions deployment-examples/kubernetes/worker-lre-java.yaml
@@ -0,0 +1,35 @@
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: nativelink-worker-lre-java
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: nativelink-worker-lre-java
+  template:
+    metadata:
+      labels:
+        app: nativelink-worker-lre-java
+    spec:
+      containers:
+        - name: nativelink-worker-lre-java
+          image: "localhost:5001/nativelink-worker-lre-java:local"
+          env:
+            - name: RUST_LOG
+              value: warn
+            - name: CAS_ENDPOINT
+              value: nativelink-cas
+            - name: SCHEDULER_ENDPOINT
+              value: nativelink-scheduler
+          volumeMounts:
+            - name: worker-lre-java-config
+              mountPath: /worker-lre-java.json
+              subPath: worker-lre-java.json
+          command: ["/bin/nativelink"]
+          args: ["/worker-lre-java.json"]
+      volumes:
+        - name: worker-lre-java-config
+          configMap:
+            name: worker-lre-java
35 changes: 0 additions & 35 deletions deployment-examples/kubernetes/worker.yaml

This file was deleted.
