This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Merge branch 'master' into zimiao/clean_job
mzmssg authored Sep 7, 2018
2 parents 68b45d6 + 19f68a0 commit 8ba7209
Showing 1 changed file with 20 additions and 17 deletions.
37 changes: 20 additions & 17 deletions pai-management/doc/cluster-bootup.md
@@ -30,6 +30,7 @@ With the cluster being set up, the steps to bring PAI up on it are as follows:
## Customized deploy <a name="customizeddeploy"></a>

### Steps:

- [Step 0. Prepare the dev-box](#c-step-0)
- [Step 1. Prepare the quick-start.yaml file](#c-step-1)
- [Step 2. Generate OpenPAI configuration files](#c-step-2)
@@ -49,6 +50,8 @@ Please refer to this [section](./how-to-setup-dev-box.md) for the customize sett

##### (1) Run your dev-box

Note that `dev-box` should run on a machine outside the PAI cluster; it should not run on any PAI cluster node.

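One way to sanity-check this before starting the container is to compare the dev-box host's addresses with the cluster's machine list. The following is a minimal sketch only: `is_outside_cluster` and the sample addresses are hypothetical and are not part of OpenPAI.

```python
import ipaddress


def is_outside_cluster(host_addrs, cluster_addrs):
    """Return True if none of the host's addresses belongs to a cluster node.

    Both arguments are iterables of IP address strings. This helper is a
    hypothetical illustration, not an OpenPAI API.
    """
    host = {ipaddress.ip_address(a) for a in host_addrs}
    cluster = {ipaddress.ip_address(a) for a in cluster_addrs}
    return host.isdisjoint(cluster)


# Made-up addresses: the dev-box host is not one of the cluster nodes.
print(is_outside_cluster(["10.0.0.5"], ["10.0.1.10", "10.0.1.11"]))
```

Parsing the strings with `ipaddress` rather than comparing them directly means differently formatted spellings of the same address still compare equal.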
```bash

# Pull the dev-box image from Docker Hub
@@ -119,11 +122,11 @@ sudo docker ps

### Step 1. Prepare the quick-start.yaml file <a name="c-step-1"></a>

Prepare the file under dev-box folder: /pai/pai-management/quick-start

There is an example file at: /pai/pai-management/quick-start/quick-start-example.yaml

An example yaml file is shown below. Note that you should change the IP address of the machine and ssh information accordingly.

```yaml
# quick-start.yaml
@@ -170,7 +173,7 @@ python paictl.py cluster generate-configuration -i /pai/pai-management/quick-sta

```

##### (2) Update the docker tag to the release version

```bash
vi ~/pai-config/services-configuration.yaml
@@ -198,7 +201,7 @@ Please refer to this [section](./how-to-write-pai-configuration.md) for the deta

### Step 3 (Optional). Customize the OpenPAI configuration <a name="c-step-3"></a>

This method is for advanced users.

The description of each field in these configuration files can be found in [A Guide For Cluster Configuration](how-to-write-pai-configuration.md).

@@ -213,9 +216,9 @@ If user want to customize configuration, please see the table below
- [configure customize docker repository](./how-to-write-pai-configuration.md#docker_repo)
- [configure OpenPAI admin user account](./how-to-write-pai-configuration.md#configure_user_acc)
- port / data folder etc.
- [configure service entry](./how-to-write-pai-configuration.md#configure_service_entry)
- [configure HDFS data / OpenPAI temp data folder](./how-to-write-pai-configuration.md#data_folder)
- component version
- [configure K8s component version](./how-to-write-pai-configuration.md#k8s_component)
- [configure docker version](./how-to-write-pai-configuration.md#docker_repo)
- [configure nvidia gpu driver version](./how-to-write-pai-configuration.md#driver_version)
@@ -240,7 +243,7 @@ If user want to customize configuration, please see the table below
- [YARN / HDFS](./how-to-write-pai-service-configuration.md#hadoop)
- [Zookeeper](./how-to-write-pai-service-configuration.md#zookeeper)
- Monitor
- [Prometheus / Exporter](./how-to-write-pai-service-configuration.md#prometheus)
- [Grafana](./how-to-write-pai-service-configuration.md#grafana)
- [Appendix: Default values in auto-generated configuration files](./how-to-write-pai-configuration.md#appendix)

@@ -311,17 +314,17 @@ http://<master>:9090/#!/pod?namespace=default

Where `<master>` is the same as in the previous [section](#step-2).
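As a small illustration, the dashboard URL above can be assembled from the master address. This is a sketch under stated assumptions: the function name `dashboard_pods_url` and the sample address are hypothetical, while the port and path are copied from the URL in this guide.

```python
def dashboard_pods_url(master, namespace="default"):
    """Build the Kubernetes dashboard pod-list URL used in this guide.

    `master` is the master node's address. Port 9090 and the path come from
    the URL shown in the guide; the function itself is hypothetical.
    """
    return "http://{}:9090/#!/pod?namespace={}".format(master, namespace)


# Made-up master address, for illustration only.
print(dashboard_pods_url("10.0.1.10"))
```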

## Singlebox deploy <a name="singlebox"></a>

### Steps:

- [Step 0. Prepare the dev-box](#c-step-0)

- Step 1. Prepare the quick-start.yaml file

Prepare the file under dev-box folder: /pai/pai-management/quick-start

There is an example file at: /pai/pai-management/quick-start/quick-start-example.yaml

An example yaml file is shown below. Note that you should change the IP address of the machine and ssh information accordingly.

@@ -370,7 +373,7 @@ ssh-password: pai-password
- [3 Getting help](#troubleshooting_3)

### 1 Troubleshooting OpenPAI services <a name="troubleshooting_1"></a>

#### 1.1 Diagnosing the problem <a name="troubleshooting_1.1"></a>

- Monitor
@@ -415,7 +418,7 @@ As OpenPAI services are deployed on kubernetes, please refer [debug kubernetes p

#### 1.2 Fix problem <a name="troubleshooting_1.2"></a>
- Update OpenPAI Configuration

Check and refine 4 yaml files:

```
@@ -425,15 +428,15 @@ Check and refine 4 yaml files:
- services-configuration.yaml
```

- Customize config for specific service

To customize a single service, you can find its service config file at [pai-management/bootstrap](../bootstrap) and its image dockerfile at [pai-management/src](../src).

- Update Code & Image

- Customize image dockerfile or code

You can find each service's image dockerfile at [pai-management/src](#pai-management/src) and customize it.

- Rebuild image

@@ -463,7 +466,7 @@ python paictl.py service stop \
[ -n service-name ]
```

If the -n parameter is specified, only the given service, e.g. rest-server, webportal, watchdog, etc., will be stopped. If not, all PAI services will be stopped.
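The optional `-n` behaviour can be sketched as a small helper that assembles the argument list. This is an illustration only: `build_stop_command` and `extra_args` are hypothetical names, and only the parts of the command visible in this guide (the `service stop` subcommand and the optional `-n service-name`) are modeled. Any other options the command requires would have to be supplied through `extra_args`.

```python
def build_stop_command(service_name=None, extra_args=()):
    """Assemble the argument list for `paictl.py service stop`.

    Only the subcommand and the optional `-n service-name` shown in the
    guide are modeled. Other options must be passed via `extra_args`.
    This helper is hypothetical and not part of OpenPAI.
    """
    cmd = ["python", "paictl.py", "service", "stop"]
    cmd.extend(extra_args)
    if service_name:
        # With -n, only the named service is stopped.
        cmd.extend(["-n", service_name])
    return cmd


print(build_stop_command("rest-server"))  # stop a single service
print(build_stop_command())               # no -n: stop all PAI services
```

Returning a list rather than a single string keeps the result usable with `subprocess.run` without shell quoting concerns.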

2. ```Boot up single or all OpenPAI services.```

@@ -476,7 +479,7 @@ Please refer [Kubernetes Troubleshoot Clusters](https://kubernetes.io/docs/tasks
### 3 Getting help <a name="troubleshooting_3"></a>

- [StackOverflow:](../../docs/stackoverflow.md) If you have questions about OpenPAI, please post them on Stack Overflow under the tag: openpai
- [Report an issue:](https://github.com/Microsoft/pai/wiki/Issue-tracking) If you have an issue, a bug, or a feature request, please submit it on GitHub

## Maintenance <a name="maintenance"></a>
#### [Service Upgrading](./machine-maintenance.md#service-maintain.md)