Topkv2op optimize #30403

Merged
merged 8 commits into PaddlePaddle:develop from the topkv2op-optimize branch
Mar 1, 2021

@thisjiang thisjiang commented Jan 13, 2021

PR types

Performance optimization

PR changes

OPs

Describe

Cause of the problem:
In the maskrcnn model, topk is extremely expensive: the cub library kernel DeviceSegmentedRadixSortKernel accounts for more than 35.8% of the profiled time.

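Analysis:
At top_k_v2_op.cu#L112 there is a check: when axis is not the last dimension, the matrix is transposed so that the kernel's reads from global memory stay coalesced. This causes a problem, however. When input_shape = (20, 242991) and axis = 0, the transposed matrix has size trans_dim = (242991, 24). At top_k_v2_op.cu#L153 there is a second check: when input_width <= 1024, the cub-based SortTopk function is used. Unfortunately, SortTopk handles matrices with a very large number of rows poorly, so this case runs very slowly.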
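The interaction between the two checks can be illustrated with a minimal sketch. The names `MatrixView`, `ViewAfterTranspose`, and `TakesSortPathWidthOnly` are hypothetical and do not come from top_k_v2_op.cu; the sketch only assumes that the transpose moves `axis` to the last position and that every other dimension is flattened into rows.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type: the 2-D (rows, columns) view the kernel works on.
struct MatrixView {
  int64_t input_height;  // number of rows, i.e. independent top-k problems
  int64_t input_width;   // number of columns, i.e. elements searched per row
};

// Assumption: transposing moves `axis` to the last position, and all other
// dimensions are flattened into the row count.
MatrixView ViewAfterTranspose(const std::vector<int64_t>& dims, int axis) {
  assert(!dims.empty());
  assert(axis >= 0 && axis < static_cast<int>(dims.size()));
  int64_t rows = 1;
  for (int i = 0; i < static_cast<int>(dims.size()); ++i) {
    if (i != axis) rows *= dims[i];
  }
  return MatrixView{rows, dims[axis]};
}

// The width-only check described above: it never looks at the row count,
// so a very tall, narrow matrix still takes the sort-based path.
bool TakesSortPathWidthOnly(const MatrixView& v) {
  return v.input_width <= 1024;
}
```

With the input shape quoted above and axis = 0, the row count becomes 242991 while the width stays tiny, so the width-only check still selects the sort-based path.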

Optimization:
Modify the condition at top_k_v2_op.cu#L153 to strictly limit when SortTopk is used:

Tighten the original condition input_width <= 1024 to (input_width <= 1024 && input_height <= 2048).
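As a rough sketch of the change, the old and new conditions can be written as two predicates. The function and constant names here are illustrative assumptions, not identifiers from the PR; only the thresholds 1024 and 2048 and the variable names `input_width` and `input_height` come from the description.

```cpp
#include <cstdint>

// Thresholds taken from the description; the constant names are hypothetical.
constexpr int64_t kMaxSortWidth = 1024;
constexpr int64_t kMaxSortHeight = 2048;

// Original condition: only the width is checked, the row count is ignored.
constexpr bool UseSortTopkOld(int64_t /*input_height*/, int64_t input_width) {
  return input_width <= kMaxSortWidth;
}

// Tightened condition: the sort-based path also requires a bounded row count.
constexpr bool UseSortTopkNew(int64_t input_height, int64_t input_width) {
  return input_width <= kMaxSortWidth && input_height <= kMaxSortHeight;
}

// A very tall, narrow matrix took the sort-based path before the change
// and is routed to the other path afterwards.
static_assert(UseSortTopkOld(242991, 20), "old check accepts tall matrices");
static_assert(!UseSortTopkNew(242991, 20), "new check rejects tall matrices");
```

Small matrices that fit both bounds behave exactly as before; only inputs with more than 2048 rows change path.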

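Results:
Tested with the mask_rcnn_r50_fpn_1x_coco model on the coco17 dataset, averaging the first 18 ips values:

| Change | ips |
| --- | --- |
| Original version | 4.847311765 |
| SortTopk logic removed | 6.308770588 |
| (input_width <= 1024 && input_height <= 2048) | 6.1157 |

| Change | Share of profile time |
| --- | --- |
| Original version | 35.8% |
| (input_width <= 1024 && input_height <= 2048) | 5.8% |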
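To put the measurements in relative terms, a small sketch can compute the throughput gain and the reduction in profile share. The helper names are hypothetical; the inputs are the values reported above.

```cpp
// Relative throughput gain of `after_ips` over `before_ips`;
// a result of 0.25 would mean 25% more iterations per second.
double RelativeGain(double before_ips, double after_ips) {
  return after_ips / before_ips - 1.0;
}

// Factor by which a share of profile time shrank.
double ShareReduction(double before_pct, double after_pct) {
  return before_pct / after_pct;
}
```

Applied to the reported values, `RelativeGain(4.847311765, 6.1157)` is roughly 0.26 for the tightened condition and `RelativeGain(4.847311765, 6.308770588)` is roughly 0.30 with SortTopk removed entirely, while `ShareReduction(35.8, 5.8)` is a little over 6.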

@paddle-bot-old
Thanks for your contribution!
Please wait for the CI result first. See Paddle CI Manual for details.

@Xreki Xreki left a comment

LGTM for op benchmark CI.

The op benchmark CI error is:

2021-02-26 20:08:43 [/workspace/Paddle/tools/test_op_benchmark.sh:126] [INFO] Load op: "top_k_v2".
2021-02-26 20:08:43 [/workspace/Paddle/tools/test_op_benchmark.sh:261] [ERROR] Missing test script of "top_k_v2"(paddle/fluid/operators/top_k_v2_op.cu) in benchmark.
2021-02-26 20:08:43 [/workspace/Paddle/tools/test_op_benchmark.sh:265] [INFO] See https://github.com/PaddlePaddle/Paddle/wiki/PR-CI-OP-benchmark-Manual for details.

The mapping rule from top_k_v2 to the topk test script failed to match. @Avin0323, please follow up on this and resolve it.

@wzzju wzzju merged commit 8f4ac6b into PaddlePaddle:develop Mar 1, 2021
@thisjiang thisjiang deleted the topkv2op-optimize branch March 1, 2021 08:54
3 participants