This repository has been archived by the owner on Jun 6, 2024. It is now read-only.
Commit
Doc refactoring and update hello-world sample (#2445)
1 parent a7525dc, commit 8944849
Showing 2 changed files with 19 additions and 19 deletions.
@@ -91,17 +91,17 @@ For a large size cluster, this section is still needed to generate default confi

  #### Customize deployment

- As various hardware environments and different use scenarios, default configuration of OpenPAI may need to be updated. Following [Customize deployment](docs/pai-management/doc/how-to-generate-cluster-config.md#Optional-Step-3.-Customize-configure-OpenPAI) part to learn more details.
+ As various hardware environments and different use scenarios, default configuration of OpenPAI may need to be optimized. Following [Customize deployment](docs/pai-management/doc/how-to-generate-cluster-config.md#Optional-Step-3.-Customize-configure-OpenPAI) part to learn more details.

  ### Validate deployment

  After deployment, it's recommended to [validate key components of OpenPAI](docs/pai-management/doc/validate-deployment.md) in health status. After validation is success, [submit a hello-world job](docs/user/training.md) and check if it works end-to-end.

  ### Train users before "train models"

- The common practice on OpenPAI is to submit job requests, and wait jobs got computing resource and executed. It's different experience with assigning dedicated servers to each one. People may feel computing resource is not in control and the learning curve may be higher than run job on dedicated servers. But shared resource on OpenPAI can improve productivity significantly and save time on maintaining environments.
+ The common practice on OpenPAI is to submit job requests, and wait jobs got computing resource and executed. It's different experience with assigning dedicated servers to each one. People may feel computing resource is not in control and the learning curve may be higher than run job on dedicated servers. But shared resource on OpenPAI can improve utilization of resources and save time on maintaining environments.

- For administrators of OpenPAI, a successful deployment is first step, the second step is to let users of OpenPAI understand benefits and know how to use it. Users of OpenPAI can learn from [Train models](#train-models). But below content is for various scenarios and may be too much to specific users. So, a simplified document based on below content is easier to learn.
+ For administrators of OpenPAI, a successful deployment is first step, the second step is to let users of OpenPAI understand benefits and know how to use it. Users can learn from [Train models](#train-models). But below part of training models is for various scenarios and maybe users doesn't need all of them. So, administrators can create simplified documents as users' actual scenarios.

  ### FAQ

@@ -111,23 +111,23 @@ If FAQ doesn't resolve it, refer to [here](#get-involved) to ask question or sub

  ## Train models

- Like all machine learning platforms, OpenPAI is a productive tool. To maximize utilization, it's recommended to submit training jobs and let OpenPAI to allocate resource and run it. If there are too many jobs, some jobs may be queued until enough resource available, and OpenPAI choose some server(s) to run a job. This is different with run code on dedicated servers, and it needs a bit more knowledge about how to submit/manage training jobs on OpenPAI.
+ Like all machine learning platforms, OpenPAI is a productive tool. To maximize utilization of resources, it's recommended to submit training jobs and let OpenPAI to allocate resource and run it. If there are too many jobs, some jobs may be queued until enough resource available. This is different with run code on dedicated servers, and it needs a bit more knowledge about how to submit/manage training jobs on OpenPAI.

- Note, OpenPAI also supports to allocate on demand resource besides queuing jobs. Users can use SSH or Jupyter to connect like on a physical server, refer to [here](examples/jupyter/README.md) about how to use OpenPAI like this way. Though it's not efficient to resources, but it also saves cost on setup and managing environments on physical servers.
+ Note, OpenPAI also supports to allocate dedicated resource besides queuing jobs. Users can use SSH or Jupyter to connect and use like on a physical server, refer to [here](examples/jupyter/README.md) for details. Though it's not efficient to resources, but it also saves cost on setup and managing environments on physical servers.

  ### Submit training jobs

- Follow [submitting a hello-world job](docs/user/training.md), and learn more about training models on OpenPAI. It's a very simple job and used to understand OpenPAI job definition and familiar with Web portal.
+ Follow [submitting a hello-world job](docs/user/training.md), and learn more about training models on OpenPAI. It's a very simple job and used to understand OpenPAI job configuration and familiar with Web UI.

  ### OpenPAI VS Code Client

  [OpenPAI VS Code Client](contrib/pai_vscode/VSCodeExt.md) is a friendly, GUI based client tool of OpenPAI. It's an extension of Visual Studio Code. It can submit job, simulate job running locally, manage multiple OpenPAI environments, and so on.

  ### Troubleshooting job failure

- Web portal and job log are helpful to analyze job failure, and OpenPAI supports SSH into environment for debugging.
+ Web UI and job log are helpful to analyze job failure, and OpenPAI supports SSH into environment for debugging.

- Refer to [here](docs/user/troubleshooting_job.md) for more information about troubleshooting job failure. It's recommended to get code succeeded locally, then submit to OpenPAI. It reduces posibility to troubleshoot remotely.
+ Refer to [here](docs/user/troubleshooting_job.md) for more information about troubleshooting job failure.

  ## Administration

@@ -137,7 +137,7 @@ Refer to [here](docs/user/troubleshooting_job.md) for more information about tro

  ## Reference

- * [Job definition](docs/job_tutorial.md)
+ * [Job configuration](docs/job_tutorial.md)
  * [RESTful API](docs/rest-server/API.md)
  * Design documents could be found [here](docs).

@@ -167,8 +167,8 @@ contact [[email protected]](mailto:[email protected]) with any additio

  We are working on a set of major features improvement and refactor, anyone who is familiar with the features is encouraged to join the design review and discussion in the corresponding issue ticket.

- * PAI virtual cluster design. [Issue 1754](https://github.com/Microsoft/pai/issues/1754)
- * PAI protocol design. [Issue 2007](https://github.com/Microsoft/pai/issues/2007)
+ * OpenPAI virtual cluster design. [Issue 1754](https://github.com/Microsoft/pai/issues/1754)
+ * OpenPAI protocol design. [Issue 2007](https://github.com/Microsoft/pai/issues/2007)

  ### Who should consider contributing to OpenPAI
