This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Integrate Dshuttle into PAI #4584

Merged on Oct 15, 2020
68 commits
fd63e4d
update pai version (#4174)
Binyang2014 Feb 10, 2020
5b2caae
[runtime] Only disable job ssh when gangAllocation is set (#4208)
Binyang2014 Feb 17, 2020
491c549
[Runtime] Fix lint and UT (#4214)
Binyang2014 Feb 18, 2020
0ffa794
dshuttle
Binyang2014 Dec 6, 2019
5f1ddb2
deploy change
Binyang2014 Dec 9, 2019
2fdea3d
dshuttle log
Binyang2014 Dec 12, 2019
f51166e
rest server change
Binyang2014 Dec 17, 2019
84e7de2
another update
Binyang2014 Dec 23, 2019
9bdd41f
change to office image
Binyang2014 Dec 27, 2019
b36c7ed
change deploy script
Binyang2014 Apr 10, 2020
5f56a76
add node
Binyang2014 Apr 10, 2020
0849ace
change deploy
Binyang2014 Apr 11, 2020
3964cd6
more change
Binyang2014 Apr 11, 2020
579153c
more changes
Binyang2014 Apr 12, 2020
cab6688
dshuttle udpate
Binyang2014 Apr 13, 2020
e2a6981
dshuttle fix
Binyang2014 Apr 14, 2020
0f5388f
more change
Binyang2014 Apr 22, 2020
5aaa1d3
temp change
Binyang2014 May 6, 2020
3852e1e
config update
Binyang2014 May 9, 2020
95421fc
add fuse sidecar
Binyang2014 May 12, 2020
7b9f55a
add pv/pvc support
Binyang2014 May 26, 2020
6c5e6cb
clean deploy script
Binyang2014 May 27, 2020
ce99e25
rebase master
Binyang2014 May 27, 2020
a76e1e8
orgnize deploy code
Binyang2014 May 27, 2020
da2113c
deploy change
Binyang2014 May 28, 2020
5d962c4
update
Binyang2014 May 28, 2020
36b2656
update image
Binyang2014 May 29, 2020
8ca663d
udpate
Binyang2014 May 29, 2020
c742a55
add delete script
Binyang2014 May 29, 2020
631b485
update
Binyang2014 Jun 3, 2020
f03fcb4
add doc
Binyang2014 Jun 5, 2020
7ec2766
UI change
Binyang2014 Jun 5, 2020
5c2f3ba
fix bug
Binyang2014 Jun 15, 2020
626e441
fix updatedb make disk pressure issue
Binyang2014 Jun 18, 2020
76bb6ef
more changes
Binyang2014 Jul 7, 2020
449f461
more change
Binyang2014 Jul 18, 2020
fb09e4b
merge master
Binyang2014 Jul 30, 2020
8f04d5b
revert change
Binyang2014 Jul 30, 2020
1c070c6
change config
Binyang2014 Jul 30, 2020
21904a9
Merge branch 'master' into binyli/dshuttle
Binyang2014 Aug 5, 2020
c984b71
update config
Binyang2014 Aug 5, 2020
f82ef1c
update config
Binyang2014 Aug 5, 2020
f45fa81
add log4j
Binyang2014 Aug 5, 2020
4d882da
chnage updatedb.conf
Binyang2014 Aug 5, 2020
a2cc20c
add change updatedb command
Binyang2014 Aug 5, 2020
1e5e989
change pylon
Binyang2014 Aug 5, 2020
a8104b5
change pylon config
Binyang2014 Aug 6, 2020
ddfc63b
add dshuttle storage type
Binyang2014 Aug 6, 2020
2d647ff
add doc for deshuttle deploy
Binyang2014 Aug 6, 2020
c39db29
Merge branch 'master' into binyli/dshuttle
Binyang2014 Aug 25, 2020
b9a5453
Merge branch 'master' into binyli/dshuttle
Binyang2014 Sep 8, 2020
0d2b50c
Merge branch 'master' into binyli/dshuttle
Binyang2014 Sep 14, 2020
d70c58a
fix
Binyang2014 Sep 18, 2020
1bf0a90
fix pylon
Binyang2014 Sep 21, 2020
e6a137e
move to acr
Binyang2014 Sep 23, 2020
a0b0b78
fix
Binyang2014 Sep 23, 2020
cd25cab
revert some change
Binyang2014 Sep 23, 2020
5bd73e0
Merge branch 'binyli/dshuttle' of github.com:Microsoft/pai into binyl…
Binyang2014 Sep 23, 2020
9e0ba65
change docker tag
Binyang2014 Sep 23, 2020
fbb1829
fix
Binyang2014 Sep 23, 2020
56d4ab6
fix
Binyang2014 Sep 23, 2020
7711041
update
Binyang2014 Sep 23, 2020
3b62126
fix
Binyang2014 Sep 23, 2020
7c7d547
enhance doc
Binyang2014 Sep 23, 2020
a5d9b89
Add more docs
Binyang2014 Oct 13, 2020
d5b0cec
fix review comments
Binyang2014 Oct 14, 2020
e7c9723
change image version
Binyang2014 Oct 15, 2020
faf9a4e
fix
Binyang2014 Oct 15, 2020
26 changes: 26 additions & 0 deletions src/dshuttle-csi/deploy/delete.sh
@@ -0,0 +1,26 @@
#!/bin/bash

# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

pushd "$(dirname "$0")" > /dev/null

echo "Call stop.sh to stop all services first"
/bin/bash stop.sh || exit $?


popd > /dev/null
143 changes: 143 additions & 0 deletions src/dshuttle-csi/deploy/dshuttle-csi-daemon.yaml.template
@@ -0,0 +1,143 @@
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: dshuttle-csi-daemon
spec:
  selector:
    matchLabels:
      app: dshuttle-csi-daemon
  template:
    metadata:
      labels:
        app: dshuttle-csi-daemon
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      initContainers:
      # Add fuse.alluxio-fuse to PRUNEFS in the host's updatedb.conf so that updatedb does not index
      # the alluxio-fuse file system. Indexing it can consume a lot of disk space.
      # `sed -i` replaces a file by renaming a temporary copy over it, which fails on a subPath bind mount,
      # so the file is rewritten through a redirection. $CONTENT is quoted to keep the original line breaks.
      - name: change-updatedb-conf
        image: dshuttle.azurecr.io/dshuttle/dshuttle-csi:25037dc
        imagePullPolicy: Always
        securityContext:
          runAsUser: 0
        command: ["/bin/bash", "-c"]
        args:
        - FILE=/host-config/updatedb.conf && grep -q 'PRUNEFS=".*fuse.alluxio-fuse.*"' "$FILE" || { CONTENT=$(sed '/PRUNEFS=/s/"$/ fuse.alluxio-fuse"/' "$FILE") && printf '%s\n' "$CONTENT" > "$FILE"; }
        volumeMounts:
        - name: etc
          mountPath: /host-config/updatedb.conf
          subPath: updatedb.conf
      containers:
      - name: node-driver-registrar
        image: quay.io/k8scsi/csi-node-driver-registrar:v1.0.2
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "rm -rf /registration/dshuttle-reg.sock /var/lib/kubelet/plugins/csi-dshuttle-plugin"]
        args:
        - --v=5
        - --csi-address=/plugin/csi.sock
        - --kubelet-registration-path=/var/lib/kubelet/plugins/csi-dshuttle-plugin/csi.sock
        env:
        - name: KUBE_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - name: plugin-dir
          mountPath: /plugin
        - name: registration-dir
          mountPath: /registration
      - name: dshuttle-csi-daemon
        securityContext:
          privileged: true
          runAsUser: 0
        image: dshuttle.azurecr.io/dshuttle/dshuttle-csi:25037dc
        command: ["/usr/local/bin/dshuttle-csi"]
        args:
        - "--v=4"
        - "--nodeid=$(NODE_ID)"
        - "--endpoint=$(CSI_ENDPOINT)"
        env:
        - name: ALLUXIO_CLIENT_HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: ALLUXIO_CLIENT_JAVA_OPTS
          value: " -Dalluxio.user.hostname=$(ALLUXIO_CLIENT_HOSTNAME) -Dalluxio.worker.hostname=$(ALLUXIO_CLIENT_HOSTNAME) "
        - name: NODE_ID
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: CSI_ENDPOINT
          value: unix://plugin/csi.sock
        envFrom:
        - configMapRef:
            name: dshuttle-config
        {%- if cluster_cfg['cluster']['common']['qos-switch'] == "true" %}
        resources:
          limits:
            memory: "{{ cluster_cfg['dshuttle']['csi_daemon_limit_mem'] }}"
          requests:
            memory: "{{ cluster_cfg['dshuttle']['csi_daemon_request_mem'] }}"
        {%- endif %}
        imagePullPolicy: "Always"
        volumeMounts:
        - name: plugin-dir
          mountPath: /plugin
        - name: pods-mount-dir
          mountPath: /var/lib/kubelet/pods
          mountPropagation: "Bidirectional"
        - name: dshuttle-domain
          mountPath: /opt/domain
        - name: fuse-logs
          mountPath: /opt/alluxio/logs
        - name: dshuttle-log-config
          mountPath: /opt/alluxio/conf/log4j.properties
          subPath: log4j.properties
      volumes:
      - name: plugin-dir
        hostPath:
          path: /var/lib/kubelet/plugins/csi-dshuttle-plugin
          type: DirectoryOrCreate
      - name: pods-mount-dir
        hostPath:
          path: /var/lib/kubelet/pods
          type: Directory
      - hostPath:
          path: /var/lib/kubelet/plugins_registry
          type: Directory
        name: registration-dir
      - name: dshuttle-domain
        hostPath:
          path: /tmp/alluxio-domain
          type: "Directory"
      - name: fuse-logs
        hostPath:
          path: /var/log/dshuttle
          type: DirectoryOrCreate
      - name: dshuttle-log-config
        configMap:
          name: dshuttle-log-config
      - name: etc
        hostPath:
          path: /etc
      imagePullSecrets:
      - name: dshuttle-regcred
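The `change-updatedb-conf` init container makes an idempotent, in-place edit to the host's `updatedb.conf`. As a minimal sketch, the same edit can be wrapped in a standalone bash function and tried on a scratch copy of the file before it runs against a real host. The function name `add_fuse_to_prunefs` and its optional second argument are hypothetical and not part of this PR. Like the container command, it writes back through a redirection rather than `sed -i`, and it quotes the captured content so the file keeps one setting per line.

```shell
#!/bin/bash
# add_fuse_to_prunefs FILE [FSTYPE]
# Hypothetical helper: appends FSTYPE (default: fuse.alluxio-fuse) to the
# PRUNEFS="..." line of an updatedb.conf-style file unless it is already listed.
add_fuse_to_prunefs() {
  local file="$1"
  local fstype="${2:-fuse.alluxio-fuse}"
  local content
  # Already listed: do nothing, so repeated runs leave the file unchanged.
  if grep -q "PRUNEFS=\".*${fstype}.*\"" "$file"; then
    return 0
  fi
  # Insert the type just before the closing quote of the PRUNEFS line.
  content=$(sed "/PRUNEFS=/s/\"\$/ ${fstype}\"/" "$file") || return 1
  # Rewrite through a redirection (same inode, works on a bind-mounted file);
  # quoting "$content" preserves the line breaks.
  printf '%s\n' "$content" > "$file"
}
```

For example, running `add_fuse_to_prunefs /tmp/updatedb.conf.copy` and then `grep PRUNEFS /tmp/updatedb.conf.copy` should show the appended type. A second call leaves the file as it is.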
24 changes: 24 additions & 0 deletions src/dshuttle-csi/deploy/dshuttle-csi-driver.yaml
@@ -0,0 +1,24 @@
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: dshuttle
spec:
  attachRequired: false
  podInfoOnMount: true
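With `attachRequired: false`, Kubernetes skips the attach step for this driver. `podInfoOnMount: true` asks the kubelet to pass pod information (name, namespace, UID, service account) to the node plugin at mount time. A workload refers to the driver by the name `dshuttle`. The sketch below renders a statically provisioned PersistentVolume that points at this driver. It rests on several assumptions: the helper name `render_dshuttle_pv` is hypothetical, and the volume handle, capacity, and access mode are illustrative placeholders. This PR does not show any driver-specific `volumeAttributes` the dshuttle plugin may expect, so the sketch omits them.

```shell
#!/bin/bash
# render_dshuttle_pv NAME [CAPACITY]
# Hypothetical helper: prints a PersistentVolume manifest that uses the
# "dshuttle" CSI driver. Capacity and access mode are placeholders.
render_dshuttle_pv() {
  local name="$1"
  local capacity="${2:-100Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${name}
spec:
  capacity:
    storage: ${capacity}
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    # Must match the name the CSI plugin registers (the CSIDriver object name).
    driver: dshuttle
    # Any string unique to this volume; here simply the PV name.
    volumeHandle: ${name}
EOF
}
```

For example, `render_dshuttle_pv dshuttle-pv 50Gi | kubectl apply -f -` would create the volume. A PVC with `volumeName: dshuttle-pv` could then bind to it.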
36 changes: 36 additions & 0 deletions src/dshuttle-csi/deploy/service.yaml
@@ -0,0 +1,36 @@
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

cluster-type:
- k8s

prerequisite:
- cluster-configuration
- dshuttle-master
- dshuttle-worker

template-list:
- dshuttle-csi-daemon.yaml
- start.sh
- stop.sh

start-script: start.sh
stop-script: stop.sh
delete-script: delete.sh

deploy-rules:
- in: pai-worker
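The `prerequisite` list names the services that must be deployed before `dshuttle-csi`. PAI's deployment tooling does its own ordering, and this PR does not show how. Purely as an illustration of what such lists imply, the sketch below turns `service:prereq1,prereq2` specs into a start order with coreutils `tsort`. The function name `deps_to_order` and the spec format are assumptions.

```shell
#!/bin/bash
# deps_to_order SPEC...
# Each SPEC is "service:prereq1,prereq2" (an empty list after ":" is allowed).
# Prints one service per line so that every prerequisite precedes its dependents.
# tsort exits non-zero and reports a loop if the dependencies are cyclic.
deps_to_order() {
  local spec svc prereqs p
  local -a list
  for spec in "$@"; do
    svc=${spec%%:*}
    prereqs=${spec#*:}
    # A pair of identical items registers the service without adding an edge.
    echo "$svc $svc"
    IFS=',' read -ra list <<< "$prereqs"
    for p in "${list[@]}"; do
      if [ -n "$p" ]; then
        # "A B" means A must come before B.
        echo "$p $svc"
      fi
    done
  done | tsort
}
```

For example, `deps_to_order "dshuttle-csi:cluster-configuration,dshuttle-master,dshuttle-worker"` prints the three prerequisites before `dshuttle-csi`. With only that spec, the relative order of the prerequisites themselves is unconstrained.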
31 changes: 31 additions & 0 deletions src/dshuttle-csi/deploy/start.sh.template
@@ -0,0 +1,31 @@
#!/bin/bash

# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

{%- if cluster_cfg['cluster']['common']['dshuttle'] == 'true' %}
pushd "$(dirname "$0")" > /dev/null

kubectl apply --overwrite=true -f dshuttle-csi-driver.yaml || exit $?
kubectl apply --overwrite=true -f dshuttle-csi-daemon.yaml || exit $?

sleep 10
# Wait until the service is ready.
PYTHONPATH="../../../deployment" python -m k8sPaiLibrary.monitorTool.check_pod_ready_status -w -k app -v dshuttle-csi-daemon || exit $?

popd > /dev/null
{%- endif %}
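`start.sh` waits for the DaemonSet through PAI's `k8sPaiLibrary.monitorTool.check_pod_ready_status` module. Where that module is not available, a rough kubectl-only equivalent compares the DaemonSet's `status.numberReady` with `status.desiredNumberScheduled`. The sketch below is an alternative under stated assumptions, not what the PR runs. The function name, the default timeout, and the polling interval are hypothetical. `kubectl rollout status daemonset/dshuttle-csi-daemon` is another option.

```shell
#!/bin/bash
# wait_daemonset_ready NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Hypothetical helper: polls until every scheduled pod of the DaemonSet is ready.
wait_daemonset_ready() {
  local name="$1" timeout="${2:-600}" interval="${3:-5}"
  local waited=0 ready desired
  while [ "$waited" -lt "$timeout" ]; do
    # Prints e.g. "3 3"; read leaves both variables empty if kubectl fails.
    read -r ready desired < <(kubectl get daemonset "$name" \
      -o jsonpath='{.status.numberReady} {.status.desiredNumberScheduled}' 2>/dev/null)
    # Require desired > 0 so a DaemonSet the controller has not processed yet
    # is not reported as ready.
    if [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$ready" = "$desired" ]; then
      echo "daemonset/$name: $ready/$desired pods ready"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for daemonset/$name" >&2
  return 1
}
```

Because the helper requires at least one scheduled pod, it times out if no `pai-worker` node matches the DaemonSet. That is a deliberate trade-off against reporting success too early.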
32 changes: 32 additions & 0 deletions src/dshuttle-csi/deploy/stop.sh.template
@@ -0,0 +1,32 @@
#!/bin/bash

# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

{%- if cluster_cfg['cluster']['common']['dshuttle'] == 'true' %}
pushd "$(dirname "$0")" > /dev/null

if kubectl get daemonset | grep -q "dshuttle-csi-daemon"; then
  kubectl delete daemonset dshuttle-csi-daemon || exit $?
fi

if kubectl get csidriver | grep -q "dshuttle"; then
  kubectl delete csidriver dshuttle || exit $?
fi

popd > /dev/null
{%- endif %}
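The `get | grep -q` guards make `stop.sh` safe to run when the objects are already gone. However, they match substrings: `grep -q "dshuttle"` would also match any other CSI driver whose name contains `dshuttle`. The sketch below performs the same teardown with kubectl's `--ignore-not-found` flag, which targets the exact object name and succeeds when the object does not exist. The function names are hypothetical.

```shell
#!/bin/bash
# delete_if_exists KIND NAME
# Hypothetical helper: deletes the named object, succeeding if it is already absent.
delete_if_exists() {
  kubectl delete "$1" "$2" --ignore-not-found
}

# stop_dshuttle_csi: same teardown order as stop.sh (DaemonSet first, then CSIDriver).
stop_dshuttle_csi() {
  delete_if_exists daemonset dshuttle-csi-daemon || return $?
  delete_if_exists csidriver dshuttle || return $?
}
```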
45 changes: 45 additions & 0 deletions src/dshuttle-master/deploy/delete.sh
@@ -0,0 +1,45 @@
#!/bin/bash

# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

pushd "$(dirname "$0")" > /dev/null

echo "Call stop.sh to stop all dshuttle-master pods first"
/bin/bash stop.sh || exit $?

echo "Create dshuttle-master-delete configmap for deleting data on the host"
kubectl create configmap dshuttle-master-delete --from-file=dshuttle-master-delete/ --dry-run -o yaml | kubectl apply --overwrite=true -f - || exit $?

echo "Create cleaner daemon"
kubectl apply --overwrite=true -f delete.yaml || exit $?
sleep 5

PYTHONPATH="../../../deployment" python -m k8sPaiLibrary.monitorTool.check_pod_ready_status -w -k app -v delete-batch-job-dshuttle-master || exit $?

echo "Dshuttle master clean job is done"
echo "Delete dshuttle master cleaner daemon and configmap"
if kubectl get daemonset | grep -q "delete-batch-job-dshuttle-master"; then
  kubectl delete ds delete-batch-job-dshuttle-master || exit $?
fi

if kubectl get configmap | grep -q "dshuttle-master-delete"; then
  kubectl delete configmap dshuttle-master-delete || exit $?
fi
sleep 5

popd > /dev/null
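If the readiness check in this `delete.sh` fails, the script exits before it reaches the cleanup block. That leaves the `delete-batch-job-dshuttle-master` DaemonSet and its ConfigMap behind. One way to guarantee cleanup is an `EXIT` trap inside a subshell. The sketch below is a hypothetical restructuring: the function names are assumptions, and the wait command is passed in as arguments. It uses `--dry-run=client`, which needs kubectl 1.18 or later. Like the original script, it expects to run from the script's own directory.

```shell
#!/bin/bash
# cleanup_master_cleaner: remove the cleaner DaemonSet and ConfigMap, ignoring absent ones.
cleanup_master_cleaner() {
  kubectl delete daemonset delete-batch-job-dshuttle-master --ignore-not-found
  kubectl delete configmap dshuttle-master-delete --ignore-not-found
}

# run_master_cleaner WAIT_CMD [ARGS...]
# Hypothetical wrapper: creates the ConfigMap and cleaner DaemonSet, runs the
# given wait command, and removes both objects however the subshell exits.
# Returns the status of the first failing step, or of the wait command.
run_master_cleaner() {
  (
    trap cleanup_master_cleaner EXIT
    kubectl create configmap dshuttle-master-delete \
      --from-file=dshuttle-master-delete/ --dry-run=client -o yaml \
      | kubectl apply --overwrite=true -f - || exit $?
    kubectl apply --overwrite=true -f delete.yaml || exit $?
    "$@"
  )
}
```

For example, `run_master_cleaner env PYTHONPATH="../../../deployment" python -m k8sPaiLibrary.monitorTool.check_pod_ready_status -w -k app -v delete-batch-job-dshuttle-master` would reuse the existing readiness check as the wait command.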