This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

fix add/remove node doc issue (#5269)
suiguoxin authored Feb 2, 2021
1 parent c26313e commit 06eb934
Showing 2 changed files with 21 additions and 23 deletions.
40 changes: 20 additions & 20 deletions docs/manual/cluster-admin/how-to-add-and-remove-nodes.md
@@ -14,7 +14,7 @@ Log in to your dev box machine, find [the pre-kept folder `~/pai-deploy`](./inst

Find the file `~/pai-deploy/kubespray/inventory/pai/hosts.yml`, and follow the steps below to modify it.

Supposing you want to add 2 worker nodes into your cluster and their hostnames are `a` and `b`. Add these 2 nodes into the `hosts.yml`. An example:
Supposing you want to add 2 worker nodes into your cluster and their hostnames are `new-worker-node-0` and `new-worker-node-1`. Add these 2 nodes into the `hosts.yml`. An example:

```yaml
all:
@@ -35,15 +35,15 @@ all:
...

############# Example start ###################
a:
new-worker-node-0:
ip: x.x.x.x
access_ip: x.x.x.x
ansible_host: x.x.x.x
ansible_ssh_user: "username"
ansible_ssh_pass: "your-password-here"
ansible_become_pass: "your-password-here"
ansible_ssh_extra_args: '-o StrictHostKeyChecking=no'
b:
new-worker-node-1:
ip: x.x.x.x
access_ip: x.x.x.x
ansible_host: x.x.x.x
@@ -65,8 +65,8 @@ all:
origin4:

############# Example start ###################
a:
b:
new-worker-node-0:
new-worker-node-1:
############## Example end ####################

gpu:
@@ -75,8 +75,8 @@ all:

############# Example start ###################
#### If the worker doesn't have GPU, please don't add them here.
a:
b:
new-worker-node-0:
new-worker-node-1:
############## Example end ####################

etcd:
@@ -95,38 +95,38 @@ all:
Go into folder `~/pai-deploy/kubespray/`, run:

```bash
ansible-playbook -i inventory/pai/hosts.yml scale.yml -b --become-user=root -e "node=a,b" -e "@inventory/pai/openpai.yml"
ansible-playbook -i inventory/pai/hosts.yml cluster.yml -b --become-user=root --limit=new-worker-node-0,new-worker-node-1 -e "@inventory/pai/openpai.yml"
```

The nodes to add are specified with `-e` flag.
The nodes to add are specified with the `--limit` flag.
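
As a minimal sketch of how the `--limit` value can be assembled, the snippet below joins a list of hostnames with commas and prints the resulting command instead of running it. The helper name `join_hosts` and the `new_nodes` array are hypothetical and introduced only for illustration; the playbook arguments are copied from the command above.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of OpenPAI or kubespray):
# joins its arguments with commas, the format --limit expects.
join_hosts() {
  local IFS=','
  printf '%s\n' "$*"
}

# Assumption: these hostnames match the entries added to hosts.yml.
new_nodes=(new-worker-node-0 new-worker-node-1)
limit="$(join_hosts "${new_nodes[@]}")"

# Print the command for review rather than executing it.
echo "ansible-playbook -i inventory/pai/hosts.yml cluster.yml -b --become-user=root --limit=${limit} -e \"@inventory/pai/openpai.yml\""
```

Printing the command first makes it easy to check that every hostname in the list also appears in `hosts.yml` before running the playbook from `~/pai-deploy/kubespray/`.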

### Update OpenPAI Service Configuration

Find your [service configuration file `layout.yaml` and `services-configuration.yaml`](./basic-management-operations.md#pai-service-management-and-paictl) in `~/pai-deploy/cluster-cfg`.

- Add the new node into `machine-list` field in `layout.yaml`
- Add the new node into `machine-list` field in `layout.yaml`, create a new `machine-sku` if necessary. Refer to [layout.yaml](./installation-guide.md#layoutyaml-format) for schema requirements.

```yaml
machine-list:
- hostname: a
- hostname: new-worker-node-0
hostip: x.x.x.x
machine-type: xxx-sku
pai-worker: "true"
- hostname: b
- hostname: new-worker-node-1
hostip: x.x.x.x
machine-type: xxx-sku
pai-worker: "true"
```

- If you are using hived scheduler, you should modify its setting in `services-configuration.yaml` properly. Please refer to [how to set up virtual clusters](./how-to-set-up-virtual-clusters.md) and the [hived scheduler doc](https://github.com/microsoft/hivedscheduler/blob/master/doc/user-manual.md) for details. If you are using Kubernetes default scheduler, you can skip this step.

- Stop the service, push the latest configuration, and then start services:
- Stop the service, push the latest configuration, and then start related services:

```bash
./paictl.py service stop -n cluster-configuration hivedscheduler rest-server
./paictl.py service stop -n cluster-configuration hivedscheduler rest-server job-exporter
./paictl.py config push -p <config-folder> -m service
./paictl.py service start -n cluster-configuration hivedscheduler rest-server
./paictl.py service start -n cluster-configuration hivedscheduler rest-server job-exporter
```

If you have configured any PV/PVC storage, please confirm the added worker node meets the PV's requirements. See [Confirm Worker Nodes Environment](./how-to-set-up-storage.md#confirm-environment-on-worker-nodes) for details.
@@ -139,17 +139,17 @@ To remove nodes from the cluster, there is no need to modify `hosts.yml`.
Go into `~/pai-deploy/kubespray/`, run

```bash
ansible-playbook -i inventory/pai/hosts.yml remove-node.yml -b --become-user=root -e "node=a,b" -e "@inventory/pai/openpai.yml"
ansible-playbook -i inventory/pai/hosts.yml remove-node.yml -b --become-user=root -e "node=worker-node-to-remove-0,worker-node-to-remove-1" -e "@inventory/pai/openpai.yml"
```

The nodes to remove are specified with `-e` flag.
The nodes to remove are specified with the `-e` flag.

Modify the `layout.yaml` and `services-configuration.yaml`.

Stop the service, push the latest configuration, and then start services:
Stop the service, push the latest configuration, and then start related services:

```bash
./paictl.py service stop -n cluster-configuration hivedscheduler rest-server
./paictl.py service stop -n cluster-configuration hivedscheduler rest-server job-exporter
./paictl.py config push -p <config-folder> -m service
./paictl.py service start -n cluster-configuration hivedscheduler rest-server
./paictl.py service start -n cluster-configuration hivedscheduler rest-server job-exporter
```
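
The stop, push, and start sequence above can be sketched as a small dry-run script. This is an illustration under stated assumptions, not an official OpenPAI tool: the `run` and `restart_with_new_config` functions and the `DRY_RUN` and `CONFIG_FOLDER` variables are hypothetical, and the default config folder is taken from the `~/pai-deploy/cluster-cfg` path mentioned in this document. By default it only prints the commands.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Services restarted in the steps above.
services=(cluster-configuration hivedscheduler rest-server job-exporter)

# Assumption: configuration lives in ~/pai-deploy/cluster-cfg unless overridden.
config_folder="${CONFIG_FOLDER:-$HOME/pai-deploy/cluster-cfg}"

# Hypothetical wrapper: prints each command; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop the services, push the updated configuration, then start them again.
restart_with_new_config() {
  run ./paictl.py service stop -n "${services[@]}"
  run ./paictl.py config push -p "$config_folder" -m service
  run ./paictl.py service start -n "${services[@]}"
}

restart_with_new_config
```

Because `set -e` is enabled, running with `DRY_RUN=0` stops at the first failing step, so services are not started against a configuration that failed to push.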
4 changes: 1 addition & 3 deletions docs/manual/cluster-admin/installation-guide.md
@@ -170,9 +170,7 @@ Please edit `layout.yaml` and a `config.yaml` file under `<pai-code-dir>/contrib
These two files specify the cluster layout and the customized configuration, respectively.
The following is the format and example of these 2 files.

#### Tips for China Users

If you are a China user, before you edit these files, please refer to [here](./configuration-for-china.md) first.
**Tips for China Users**: If you are a China user, before you edit these files, please refer to [here](./configuration-for-china.md) first.

#### `layout.yaml` format
