change jp contents #1090

Merged
merged 26 commits into from
Sep 23, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
- ff64ed2 Revert "delete --privileged,test=document_fix (#1052)" (mmglove, Jul 21, 2021)
- 9232457 Merge branch 'master' of https://github.com/PaddlePaddle/benchmark (mmglove, Jul 23, 2021)
- 837b5f1 Merge branch 'master' of https://github.com/PaddlePaddle/benchmark (mmglove, Jul 28, 2021)
- d04ab74 Merge branch 'master' of https://github.com/PaddlePaddle/benchmark (mmglove, Aug 4, 2021)
- 8e769a3 Merge branch 'master' of https://github.com/PaddlePaddle/benchmark (mmglove, Aug 11, 2021)
- 0cda15a Merge branch 'master' of https://github.com/PaddlePaddle/benchmark (mmglove, Aug 11, 2021)
- a533b9a Merge branch 'master' of https://github.com/PaddlePaddle/benchmark (mmglove, Aug 16, 2021)
- 6ba2751 Merge branch 'master' of https://github.com/PaddlePaddle/benchmark (mmglove, Aug 20, 2021)
- 9342af6 Merge branch 'master' of https://github.com/PaddlePaddle/benchmark (mmglove, Aug 27, 2021)
- 93c42dd Merge branch 'master' of https://github.com/PaddlePaddle/benchmark (mmglove, Aug 30, 2021)
- 5c11099 Merge branch 'master' of https://github.com/PaddlePaddle/benchmark (mmglove, Sep 7, 2021)
- f7f1ee9 add OtherFrame demo (mmglove, Sep 8, 2021)
- d7a77d1 add OtherFrame demo (mmglove, Sep 8, 2021)
- 6154a1e add OtherFrame demo (mmglove, Sep 8, 2021)
- 9b7ffea add OtherFrame demo,test=document_fix (mmglove, Sep 8, 2021)
- 7f2d57d add OtherFrame demo,test=document_fix (mmglove, Sep 8, 2021)
- adf1ab5 add OtherFrame demo,test=document_fix (mmglove, Sep 8, 2021)
- 4978b89 add OtherFrame demo,test=document_fix (mmglove, Sep 8, 2021)
- b91b7e6 add OtherFrame demo,test=document_fix (mmglove, Sep 8, 2021)
- 16c61fb add jp contents,test=document_fix (mmglove, Sep 17, 2021)
- a48ae22 add jp contents,test=document_fix (mmglove, Sep 17, 2021)
- b89f869 change jp contents,test=document_fix (mmglove, Sep 23, 2021)
- 99a430b change jp contents,test=document_fix (mmglove, Sep 23, 2021)
- 7c796a0 change jp contents,test=document_fix (mmglove, Sep 23, 2021)
- 432ffc8 change jp contents,test=document_fix (mmglove, Sep 23, 2021)
- 2d2eaa0 change jp contents,test=document_fix (mmglove, Sep 23, 2021)
90 changes: 90 additions & 0 deletions OtherFrame/Speech/PyTorch/README_demo.md
@@ -0,0 +1,90 @@
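# NGC PyTorch Performance Reproduction
## This README is only an example; do not update it with real content. NLP_demo is likewise only an example.
## Directory layout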

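├── PrepareEnv.sh # sets up the runtime environment for the competing framework (PyTorch)
├── README.md # run documentation
├── models # modified models for the competing PyTorch framework; pull official models directly in the scripts. Models in the same domain should use the same commit; if they do not, state the commit in that model's run script
├── run_PyTorch.sh # runs all models on the competing PyTorch framework
└── scripts # scripts that reproduce each model's performance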
## Environment
### 1. Physical machine
- Single node (1 GPU and 8 GPUs)
- OS: CentOS release 7.5 (Final)
- GPU: Tesla V100-SXM2-32GB * 8
- CPU: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz * 80
- Driver Version: 460.27.04
- Memory: 629 GB
- CUDA/cuDNN versions: cuda10.1-cudnn7, cuda11.2-cudnn8-gcc82
- Multi-node (32 GPUs): TODO
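### 2. Docker image, for example:

The NGC PyTorch repository provides a [Dockerfile](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Translation/Transformer/Dockerfile) that builds the Docker image automatically.

- **Image version**: `nvcr.io/nvidia/pytorch:20.06-py3` # competing-framework image; use the same image within each domain
- **PyTorch version**: `1.6.0a0+9907a3e` # competing-framework version: the latest stable release; if a specific version is required, note why
- **CUDA version**: `11.2`
- **cuDNN version**: `8.0.1`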
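To confirm that a container actually matches the versions listed above before benchmarking, a check like the following could be run inside it. This is a minimal sketch, not part of the benchmark scripts: the function names are hypothetical, it assumes `python` with `torch` is available inside the image, and the integer-to-string conversion assumes the cuDNN 7/8 encoding (`major*1000 + minor*100 + patch`) returned by `torch.backends.cudnn.version()`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare the container's PyTorch/cuDNN versions with this README.
# Assumes python and torch are available inside the container.

EXPECTED_TORCH="1.6.0a0+9907a3e"   # PyTorch version listed above
EXPECTED_CUDNN="8.0.1"             # cuDNN version listed above

# Convert torch's integer cuDNN version (e.g. 8001) to "8.0.1".
# Assumes the cuDNN 7/8 encoding: major*1000 + minor*100 + patch.
cudnn_int_to_str() {
  local v=$1
  echo "$((v / 1000)).$(( (v % 1000) / 100 )).$((v % 100))"
}

# Print OK/FAIL for one component; return 1 on mismatch.
check_version() {
  local name=$1 expected=$2 actual=$3
  if [[ "$actual" == "$expected" ]]; then
    echo "OK   ${name}: ${actual}"
  else
    echo "FAIL ${name}: got ${actual}, expected ${expected}"
    return 1
  fi
}

# Query torch inside the container and compare both versions; return 1 if either differs.
check_torch_env() {
  local torch_ver cudnn_int rc=0
  torch_ver=$(python -c 'import torch; print(torch.__version__)') || return 1
  cudnn_int=$(python -c 'import torch; print(torch.backends.cudnn.version())') || return 1
  check_version "PyTorch" "$EXPECTED_TORCH" "$torch_ver" || rc=1
  check_version "cuDNN" "$EXPECTED_CUDNN" "$(cudnn_int_to_str "$cudnn_int")" || rc=1
  return "$rc"
}
```

Inside the container, the file could be `source`d and `check_torch_env` called; a non-zero return means the image differs from the documented environment.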

## Test procedure
```bash
bash run_PyTorch.sh  # create the container and test the models in this standard environment
```
Example script contents:
```bash
#!/usr/bin/env bash
# Pull the image (the competing-framework image listed in the Docker section)
ImageName="nvcr.io/nvidia/pytorch:20.06-py3"
docker pull "${ImageName}"
# Commands run inside the container to test a single model
run_cmd="bash PrepareEnv.sh;
cd /workspace/models/NLP/nlp_modelName/;
cp /workspace/scripts/NLP/nlp_modelName/preData.sh ./;
cp /workspace/scripts/NLP/nlp_modelName/run_benchmark.sh ./;
cp /workspace/scripts/NLP/nlp_modelName/analysis_log.py ./;
CUDA_VISIBLE_DEVICES=0 bash run_benchmark.sh sp 32 fp32 500;  # 1 GPU: run_mode sp, bs32, fp32
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash run_benchmark.sh mp 64 fp16 500;  # 8 GPUs: run_mode mp, bs64, fp16
"
# Start the container and run the commands above
nvidia-docker run --name test_torch -it \
    --net=host \
    --shm-size=1g \
    -v "$PWD":/workspace \
    "${ImageName}" /bin/bash -c "${run_cmd}"
```
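`run_PyTorch.sh` is described as running every model, while the example script hardcodes one. One way to generalize it, sketched under the assumption that each model's scripts live in `scripts/<direction>/<model>/` (the layout used by the example paths), is to discover models by their `run_benchmark.sh` and generate the in-container command per model. The function names are hypothetical, and the 8-GPU run uses `mp`, following the `run_mode` convention (`sp` single GPU, `mp` multi-GPU) in the output section.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for generating one in-container command per model.

# Print "<direction>/<model>" for every scripts/<direction>/<model>/run_benchmark.sh under ROOT.
list_models() {
  local root=$1 f
  for f in "$root"/scripts/*/*/run_benchmark.sh; do
    [[ -e "$f" ]] || continue          # no match: the glob stays literal, skip it
    f=${f#"$root"/scripts/}            # -> <direction>/<model>/run_benchmark.sh
    echo "${f%/run_benchmark.sh}"      # -> <direction>/<model>
  done
}

# Build the command string the container runs for one model (mirrors run_cmd in the example).
build_run_cmd() {
  local model_path=$1                  # e.g. NLP/nlp_modelName
  local f
  printf 'cd /workspace/models/%s/;' "$model_path"
  for f in preData.sh run_benchmark.sh analysis_log.py; do
    printf ' cp /workspace/scripts/%s/%s ./;' "$model_path" "$f"
  done
  printf ' CUDA_VISIBLE_DEVICES=0 bash run_benchmark.sh sp 32 fp32 500;'
  printf ' CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash run_benchmark.sh mp 64 fp16 500;\n'
}
```

Each generated string could be prefixed with `bash PrepareEnv.sh;` and passed to `/bin/bash -c` in the `nvidia-docker run` call, one container run per model.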
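## Per-model script directory

└── nlp_modelName # model name
├── README.md # run documentation
├── analysis_log.py # log-parsing script; keep it as consistent as possible across frameworks; see [Paddle's analysis.py](https://github.com/mmglove/benchmark/blob/jp_0907/scripts/analysis.py)
├── logs # training logs; note: logs must not contain sensitive information such as machine IPs
│   ├── index # JSON files with the parsed log data, ready to be stored in the database
│   │   ├── nlp_modelName_sp_bs32_fp32_1_speed # single-GPU data
│   │   └── nlp_modelName_mp_bs32_fp32_8_speed # 8-GPU data
│   └── train_log # raw training logs
├── preData.sh # data preparation
└── run_benchmark.sh # run script (covers performance and convergence)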
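The per-model layout above can be scaffolded, and the index file names derived, with a small helper. This is a hypothetical sketch: the name pattern `<model>_<run_mode>_bs<bs>_<precision>_<gpu_num>_speed` is inferred from the two example files in the tree, and the created files are empty placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the per-model directory layout.

# Create <dir>/ with empty placeholder files and the logs/ subdirectories.
make_model_skeleton() {
  local dir=$1
  mkdir -p "$dir/logs/index" "$dir/logs/train_log"
  touch "$dir/README.md" "$dir/analysis_log.py" "$dir/preData.sh" "$dir/run_benchmark.sh"
}

# Build an index file name, e.g. nlp_modelName_sp_bs32_fp32_1_speed.
# Pattern inferred from the example tree: <model>_<run_mode>_bs<bs>_<fp>_<gpu_num>_speed
index_file_name() {
  local model=$1 run_mode=$2 bs=$3 fp=$4 gpu_num=$5
  echo "${model}_${run_mode}_bs${bs}_${fp}_${gpu_num}_speed"
}
```

For example, `index_file_name nlp_modelName mp 32 fp32 8` prints `nlp_modelName_mp_bs32_fp32_8_speed`, the 8-GPU file in the tree.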

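## Output

Each model case must output a JSON file containing the parsed log data to be stored in the database, for example:

```jsonc
{
    "log_file": "/logs/2021.0906.211134.post107/train_log/ResNet101_bs32_1_1_sp", // log path; see PrepareEnv.sh for the naming convention
    "model_name": "clas_MobileNetv1_bs32_fp32", // model case name, format repoName_modelName_bs${bs_item}_${fp_item}, e.g. clas_MobileNetv1_bs32_fp32
    "mission_name": "图像分类", // task the model case belongs to ("image classification" here); see scripts/config.ini
    "direction_id": 0, // domain of the model case: 0 = CV, 1 = NLP, 2 = Rec; see benchmark/scripts/config.ini
    "run_mode": "sp", // sp = single GPU, mp = multi-GPU
    "index": 1, // 1 (default) for speed measurements
    "gpu_num": 1, // 1 or 8
    "FINAL_RESULT": 197.514, // average speed, excluding the unstable first few steps
    "JOB_FAIL_FLAG": 0, // run status of the model case: 0 = success, 1 = failure
    "UNIT": "images/s" // unit of the speed metric
}
```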
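`FINAL_RESULT` is an average that skips the unstable first steps, and `model_name` follows `repoName_modelName_bs${bs_item}_${fp_item}`. A minimal sketch of producing such a record follows. It is hypothetical in several ways: it expects the per-step speeds already extracted one number per line (the real log format, and how many warm-up steps to skip, are left to each model's `analysis_log.py`), the function names are illustrative, and it does no JSON escaping, so field values must not contain quotes or backslashes. It prints a single-line JSON object without the explanatory comments.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the speed record described above.

# model_name convention: repoName_modelName_bs${bs_item}_${fp_item}
model_case_name() {
  local repo=$1 model=$2 bs_item=$3 fp_item=$4
  echo "${repo}_${model}_bs${bs_item}_${fp_item}"
}

# direction_id mapping: 0 = CV, 1 = NLP, 2 = Rec
direction_id() {
  case "$1" in
    CV) echo 0 ;;
    NLP) echo 1 ;;
    Rec) echo 2 ;;
    *) return 1 ;;
  esac
}

# Read one speed value per line from stdin, drop the first SKIP values,
# and print the mean of the rest; fail if nothing is left.
average_after_skip() {
  local skip=$1
  awk -v skip="$skip" '
    NR > skip + 0 { sum += $1; n++ }
    END { if (n == 0) exit 1; printf "%.3f\n", sum / n }'
}

# Print the record as one line of JSON (values are not escaped).
write_speed_json() {
  local log_file=$1 model_name=$2 mission_name=$3 dir_id=$4 run_mode=$5
  local gpu_num=$6 final_result=$7 job_fail_flag=$8 unit=$9
  printf '{"log_file": "%s", "model_name": "%s", "mission_name": "%s", ' "$log_file" "$model_name" "$mission_name"
  printf '"direction_id": %d, "run_mode": "%s", "index": 1, "gpu_num": %d, ' "$dir_id" "$run_mode" "$gpu_num"
  printf '"FINAL_RESULT": %s, "JOB_FAIL_FLAG": %d, "UNIT": "%s"}\n' "$final_result" "$job_fail_flag" "$unit"
}
```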



3 changes: 3 additions & 0 deletions OtherFrame/Speech/PyTorch/models/README_demo.md
@@ -0,0 +1,3 @@
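# NGC PyTorch Performance Reproduction
## This README is only an example; do not update it with real content.
## The models directory holds model code, merged in as submodules.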
39 changes: 39 additions & 0 deletions OtherFrame/Speech/PyTorch/scripts/README_demo.md
@@ -0,0 +1,39 @@
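# NGC PyTorch Performance Reproduction
## This README is only an example; do not update it with real content. NLP_demo is likewise only an example.
## Directory layout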

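## Per-model script directory

└── nlp_modelName # model name
├── README.md # run documentation
├── analysis_log.py # log-parsing script; keep it as consistent as possible across frameworks; see [Paddle's analysis.py](https://github.com/mmglove/benchmark/blob/jp_0907/scripts/analysis.py)
├── logs # training logs; note: logs must not contain sensitive information such as machine IPs
│   ├── index # JSON files with the parsed log data, ready to be stored in the database
│   │   ├── nlp_modelName_sp_bs32_fp32_1_speed # single-GPU data
│   │   └── nlp_modelName_mp_bs32_fp32_8_speed # 8-GPU data
│   └── train_log # raw training logs
├── preData.sh # data preparation
└── run_benchmark.sh # run script (covers performance and convergence)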

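## Output

Each model case must output a JSON file containing the parsed log data to be stored in the database, for example:

```jsonc
{
    "log_file": "/logs/2021.0906.211134.post107/train_log/ResNet101_bs32_1_1_sp", // log path; see PrepareEnv.sh for the naming convention
    "model_name": "clas_MobileNetv1_bs32_fp32", // model case name, format repoName_modelName_bs${bs_item}_${fp_item}, e.g. clas_MobileNetv1_bs32_fp32
    "mission_name": "图像分类", // task the model case belongs to ("image classification" here); see scripts/config.ini
    "direction_id": 0, // domain of the model case: 0 = CV, 1 = NLP, 2 = Rec; see benchmark/scripts/config.ini
    "run_mode": "sp", // sp = single GPU, mp = multi-GPU
    "index": 1, // 1 (default) for speed measurements
    "gpu_num": 1, // 1 or 8
    "FINAL_RESULT": 197.514, // average speed, excluding the unstable first few steps
    "JOB_FAIL_FLAG": 0, // run status of the model case: 0 = success, 1 = failure
    "UNIT": "images/s" // unit of the speed metric
}
```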



90 changes: 90 additions & 0 deletions OtherFrame/clas/PyTorch/README_demo.md
(Identical to `OtherFrame/Speech/PyTorch/README_demo.md` above.)



3 changes: 3 additions & 0 deletions OtherFrame/clas/PyTorch/models/README_demo.md
(Identical to `OtherFrame/Speech/PyTorch/models/README_demo.md` above.)
39 changes: 39 additions & 0 deletions OtherFrame/clas/PyTorch/scripts/README_demo.md
(Identical to `OtherFrame/Speech/PyTorch/scripts/README_demo.md` above.)



90 changes: 90 additions & 0 deletions OtherFrame/clas/mxnet/README_demo.md
(Identical to `OtherFrame/Speech/PyTorch/README_demo.md` above, including its PyTorch-specific wording.)



3 changes: 3 additions & 0 deletions OtherFrame/clas/mxnet/models/README_demo.md
(Identical to `OtherFrame/Speech/PyTorch/models/README_demo.md` above.)