
[MKL] IR Pass error when running ernie on inference #55477

Closed
engineer1109 opened this issue Jul 17, 2023 · 8 comments

Labels: Intel, PFCC (Paddle Framework Contributor Club, https://github.com/PaddlePaddle/community/tree/master/pfcc), status/close, type/bug-report

@engineer1109
Contributor

Describe the Bug

Reproduction code
Link: https://pan.baidu.com/s/1J0PkcTu1ngMK-DkSh79j_Q?pwd=yvqy (extraction code: yvqy)

Paddle version: develop 27fd2bc

The archive contains a demo: ernie text classification with pure C++ Paddle Inference.
The inference results differ depending on whether EnableMKLDNN is enabled or disabled.

==inferNoMKL==
news_finance
news_house
news_house
news_edu
news_finance
==inferMKL==
news_entertainment
news_travel
news_entertainment
news_car
news_entertainment

inferNoMKL (EnableMKLDNN disabled) produces the correct results.
inferMKL (EnableMKLDNN enabled) produces wrong results.

This bug appeared within roughly the last 1-2 months; it definitely did not exist before.
It is most likely related to the recent changes to the MKL IR passes.
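
For context, a minimal sketch of the two configurations being compared (an assumption for illustration, not copied from the attached demo; the model paths and helper name are placeholders):

#include <memory>

#include "paddle_inference_api.h"

// Minimal sketch: the only difference between inferNoMKL and inferMKL
// is whether EnableMKLDNN() is called on the config.
std::shared_ptr<paddle_infer::Predictor> CreateErniePredictor(bool use_mkldnn) {
  paddle_infer::Config config;
  // Placeholder paths; substitute the ernie model from the linked archive.
  config.SetModel("ernie/inference.pdmodel", "ernie/inference.pdiparams");
  config.SwitchIrOptim(true);  // run the inference IR passes
  if (use_mkldnn) {
    config.EnableMKLDNN();  // inferMKL path: adds the oneDNN placement/fuse passes
  }
  return paddle_infer::CreatePredictor(config);
}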

Additional Supplementary Information

No response

@engineer1109
Contributor Author

@jzhang533 This issue is similar to the previous one, but this time it occurs in the CPU MKL IR passes.
Almost all NLP models are affected.
It appeared within the last 1-2 months.

@carryyu carryyu assigned jiahy0825 and unassigned jiahy0825 Jul 17, 2023
@paddle-bot paddle-bot bot added the PFCC Paddle Framework Contributor Club,https://github.com/PaddlePaddle/community/tree/master/pfcc label Jul 17, 2023
@xinyu-intel xinyu-intel self-assigned this Jul 18, 2023
@xinyu-intel
Contributor

@engineer1109 Hi, thank you for the issue report. Since it is related to oneDNN, I tried to reproduce it locally.

It seems the results are equal after enabling oneDNN in my local environment (Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz):

paddle version: db1f2c4

LOG:

(pp) ../install_x86_64/main 
--- Running analysis [ir_graph_build_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0718 09:50:07.891781 85502 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0718 09:50:07.969004 85502 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0718 09:50:07.970268 85502 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0718 09:50:07.979451 85502 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
I0718 09:50:07.989873 85502 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running analysis [save_optimized_model_pass]
W0718 09:50:07.990353 85502 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0718 09:50:07.990731 85502 memory_optimize_pass.cc:118] The persistable params in main graph are : 63.8833MB
I0718 09:50:07.993149 85502 memory_optimize_pass.cc:246] Cluster name : unsqueeze2_0.tmp_0  size: 4
I0718 09:50:07.993156 85502 memory_optimize_pass.cc:246] Cluster name : token_type_ids  size: 8
I0718 09:50:07.993160 85502 memory_optimize_pass.cc:246] Cluster name : linear_74.tmp_1  size: 3744
I0718 09:50:07.993162 85502 memory_optimize_pass.cc:246] Cluster name : gelu_4.tmp_0  size: 3744
I0718 09:50:07.993165 85502 memory_optimize_pass.cc:246] Cluster name : linear_61.tmp_1  size: 1248
I0718 09:50:07.993167 85502 memory_optimize_pass.cc:246] Cluster name : tmp_20  size: 1248
I0718 09:50:07.993170 85502 memory_optimize_pass.cc:246] Cluster name : transpose_14.tmp_0  size: 936
I0718 09:50:07.993171 85502 memory_optimize_pass.cc:246] Cluster name : reshape2_14.tmp_0  size: 936
--- Running analysis [ir_graph_to_program_pass]
I0718 09:50:08.036355 85502 analysis_predictor.cc:1676] ======= optimize end =======
I0718 09:50:08.041451 85502 naive_executor.cc:171] ---  skip [feed], feed -> token_type_ids
I0718 09:50:08.041465 85502 naive_executor.cc:171] ---  skip [feed], feed -> input_ids
I0718 09:50:08.042903 85502 naive_executor.cc:171] ---  skip [linear_77.tmp_1], fetch -> fetch
I0718 09:50:08.182106 85502 analysis_predictor.cc:1470] MKLDNN is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [mkldnn_placement_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0718 09:50:08.263229 85502 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0718 09:50:08.264391 85502 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0718 09:50:08.273350 85502 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
I0718 09:50:08.283744 85502 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running IR pass [squeeze2_transpose2_onednn_fuse_pass]
--- fused 0 squeeze2 with transpose2
--- Running IR pass [depthwise_conv_mkldnn_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_affine_channel_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_elementwise_add_mkldnn_fuse_pass]
--- Running IR pass [conv_activation_mkldnn_fuse_pass]
--- Running IR pass [scale_matmul_fuse_pass]
I0718 09:50:08.293412 85502 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 scale with matmul
--- Running IR pass [reshape_transpose_matmul_mkldnn_fuse_pass]
I0718 09:50:08.300585 85502 fuse_pass_base.cc:59] ---  detected 12 subgraphs
---    fused 12 reshape + transpose + matmul with reshape's xshape with transpose's xshape
--- Running IR pass [matmul_transpose_reshape_mkldnn_fuse_pass]
I0718 09:50:08.304795 85502 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fused_matmul + transpose + reshape patterns
--- Running IR pass [matmul_elementwise_add_mkldnn_fuse_pass]
I0718 09:50:08.305992 85502 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fused_matmul (as x) with elementwise_add
--- Running IR pass [matmul_activation_mkldnn_fuse_pass]
--- Running IR pass [fc_mkldnn_pass]
I0718 09:50:08.312490 85502 fuse_pass_base.cc:59] ---  detected 26 subgraphs
---    enabled FC MKL-DNN for 26 fc ops 
--- Running IR pass [fc_act_mkldnn_fuse_pass]
I0718 09:50:08.313565 85502 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fc with gelu activation
I0718 09:50:08.315255 85502 fuse_pass_base.cc:59] ---  detected 1 subgraphs
---    fused 1 fc with tanh activation
--- Running IR pass [fc_elementwise_add_mkldnn_fuse_pass]
---    Fused 8 fc (as y) + elementwise_add patterns
I0718 09:50:08.320371 85502 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [self_attention_fuse_pass]
---    fused 0 self attention (of scaled_dp_attention) with self_attention_fuse
--- Running IR pass [batch_norm_act_fuse_pass]
--- Running IR pass [softplus_activation_onednn_fuse_pass]
--- Running IR pass [shuffle_channel_mkldnn_detect_pass]
--- Running IR pass [elementwise_act_onednn_fuse_pass]
--- Running IR pass [operator_scale_onednn_fuse_pass]
--- Running IR pass [operator_unsqueeze2_onednn_fuse_pass]
--- Running IR pass [operator_reshape2_onednn_fuse_pass]
--- Running analysis [save_optimized_model_pass]
W0718 09:50:08.326776 85502 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0718 09:50:08.327189 85502 memory_optimize_pass.cc:118] The persistable params in main graph are : 63.8833MB
I0718 09:50:08.328198 85502 memory_optimize_pass.cc:246] Cluster name : unsqueeze2_0.tmp_0  size: 4
I0718 09:50:08.328207 85502 memory_optimize_pass.cc:246] Cluster name : tmp_16  size: 1248
I0718 09:50:08.328208 85502 memory_optimize_pass.cc:246] Cluster name : reshape2_15.tmp_0  size: 936
I0718 09:50:08.328212 85502 memory_optimize_pass.cc:246] Cluster name : tmp_14  size: 36
I0718 09:50:08.328213 85502 memory_optimize_pass.cc:246] Cluster name : gelu_3.tmp_0  size: 3744
I0718 09:50:08.328217 85502 memory_optimize_pass.cc:246] Cluster name : layer_norm_21.tmp_2  size: 1248
I0718 09:50:08.328218 85502 memory_optimize_pass.cc:246] Cluster name : token_type_ids  size: 8
--- Running analysis [ir_graph_to_program_pass]
I0718 09:50:08.361274 85502 analysis_predictor.cc:1676] ======= optimize end =======
I0718 09:50:08.361474 85502 naive_executor.cc:171] ---  skip [feed], feed -> token_type_ids
I0718 09:50:08.361481 85502 naive_executor.cc:171] ---  skip [feed], feed -> input_ids
I0718 09:50:08.362283 85502 naive_executor.cc:171] ---  skip [linear_77.tmp_1], fetch -> fetch
I0718 09:50:08.362777 85502 onednn_context.cc:81] oneDNN v3.1.1
==inferNoMKL==
news_finance
news_house
news_house
news_edu
news_finance
==inferMKL==
news_finance
news_house
news_house
news_edu
news_finance

Which platform are you using for testing? Could you compare this log with yours?

@engineer1109
Contributor Author

@xinyu-intel
This is my log.

--- Running analysis [ir_graph_build_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0718 10:11:44.220561 2153053 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0718 10:11:44.254137 2153053 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0718 10:11:44.254751 2153053 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0718 10:11:44.259657 2153053 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
I0718 10:11:44.264782 2153053 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running analysis [save_optimized_model_pass]
W0718 10:11:44.265024 2153053 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0718 10:11:44.265228 2153053 memory_optimize_pass.cc:118] The persistable params in main graph are : 63.8833MB
I0718 10:11:44.266413 2153053 memory_optimize_pass.cc:246] Cluster name : token_type_ids  size: 8
I0718 10:11:44.266417 2153053 memory_optimize_pass.cc:246] Cluster name : transpose_14.tmp_0  size: 936
I0718 10:11:44.266417 2153053 memory_optimize_pass.cc:246] Cluster name : transpose_1.tmp_0  size: 936
I0718 10:11:44.266419 2153053 memory_optimize_pass.cc:246] Cluster name : tmp_5  size: 1248
I0718 10:11:44.266422 2153053 memory_optimize_pass.cc:246] Cluster name : layer_norm_21.tmp_2  size: 1248
I0718 10:11:44.266422 2153053 memory_optimize_pass.cc:246] Cluster name : linear_74.tmp_1  size: 3744
I0718 10:11:44.266424 2153053 memory_optimize_pass.cc:246] Cluster name : unsqueeze2_0.tmp_0  size: 4
I0718 10:11:44.266425 2153053 memory_optimize_pass.cc:246] Cluster name : linear_68.tmp_1  size: 3744
--- Running analysis [ir_graph_to_program_pass]
I0718 10:11:44.290699 2153053 analysis_predictor.cc:1676] ======= optimize end =======
I0718 10:11:44.294454 2153053 naive_executor.cc:171] ---  skip [feed], feed -> token_type_ids
I0718 10:11:44.294461 2153053 naive_executor.cc:171] ---  skip [feed], feed -> input_ids
I0718 10:11:44.295307 2153053 naive_executor.cc:171] ---  skip [linear_77.tmp_1], fetch -> fetch
I0718 10:11:44.359247 2153053 analysis_predictor.cc:1470] MKLDNN is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [mkldnn_placement_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0718 10:11:44.395046 2153053 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0718 10:11:44.395629 2153053 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0718 10:11:44.400547 2153053 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
I0718 10:11:44.405717 2153053 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running IR pass [squeeze2_transpose2_onednn_fuse_pass]
--- fused 0 squeeze2 with transpose2
--- Running IR pass [depthwise_conv_mkldnn_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_affine_channel_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_elementwise_add_mkldnn_fuse_pass]
--- Running IR pass [conv_activation_mkldnn_fuse_pass]
--- Running IR pass [scale_matmul_fuse_pass]
I0718 10:11:44.410969 2153053 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 scale with matmul
--- Running IR pass [reshape_transpose_matmul_mkldnn_fuse_pass]
I0718 10:11:44.414777 2153053 fuse_pass_base.cc:59] ---  detected 12 subgraphs
---    fused 12 reshape + transpose + matmul with reshape's xshape with transpose's xshape
--- Running IR pass [matmul_transpose_reshape_mkldnn_fuse_pass]
I0718 10:11:44.417080 2153053 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fused_matmul + transpose + reshape patterns
--- Running IR pass [matmul_elementwise_add_mkldnn_fuse_pass]
I0718 10:11:44.417721 2153053 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fused_matmul (as x) with elementwise_add
--- Running IR pass [matmul_activation_mkldnn_fuse_pass]
--- Running IR pass [fc_mkldnn_pass]
I0718 10:11:44.421105 2153053 fuse_pass_base.cc:59] ---  detected 26 subgraphs
---    enabled FC MKL-DNN for 26 fc ops 
--- Running IR pass [fc_act_mkldnn_fuse_pass]
I0718 10:11:44.421675 2153053 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fc with gelu activation
I0718 10:11:44.422614 2153053 fuse_pass_base.cc:59] ---  detected 1 subgraphs
---    fused 1 fc with tanh activation
--- Running IR pass [fc_elementwise_add_mkldnn_fuse_pass]
---    Fused 8 fc (as y) + elementwise_add patterns
I0718 10:11:44.425307 2153053 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [self_attention_fuse_pass]
W0718 10:11:44.425390 2153053 self_attention_fuse_pass.cc:53] No-avx512 or MKL supported!
--- Running IR pass [batch_norm_act_fuse_pass]
--- Running IR pass [softplus_activation_onednn_fuse_pass]
--- Running IR pass [shuffle_channel_mkldnn_detect_pass]
--- Running IR pass [elementwise_act_onednn_fuse_pass]
--- Running IR pass [operator_scale_onednn_fuse_pass]
--- Running IR pass [operator_unsqueeze2_onednn_fuse_pass]
--- Running IR pass [operator_reshape2_onednn_fuse_pass]
--- Running analysis [save_optimized_model_pass]
W0718 10:11:44.428572 2153053 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0718 10:11:44.428772 2153053 memory_optimize_pass.cc:118] The persistable params in main graph are : 63.8833MB
I0718 10:11:44.429283 2153053 memory_optimize_pass.cc:246] Cluster name : unsqueeze2_0.tmp_0  size: 4
I0718 10:11:44.429286 2153053 memory_optimize_pass.cc:246] Cluster name : softmax_3.tmp_0  size: 36
I0718 10:11:44.429287 2153053 memory_optimize_pass.cc:246] Cluster name : tmp_22  size: 36
I0718 10:11:44.429288 2153053 memory_optimize_pass.cc:246] Cluster name : tmp_5  size: 1248
I0718 10:11:44.429291 2153053 memory_optimize_pass.cc:246] Cluster name : embedding_10.tmp_0  size: 1248
I0718 10:11:44.429291 2153053 memory_optimize_pass.cc:246] Cluster name : token_type_ids  size: 8
I0718 10:11:44.429293 2153053 memory_optimize_pass.cc:246] Cluster name : gelu_1.tmp_0  size: 3744
--- Running analysis [ir_graph_to_program_pass]
I0718 10:11:44.443881 2153053 analysis_predictor.cc:1676] ======= optimize end =======
I0718 10:11:44.445695 2153053 naive_executor.cc:171] ---  skip [feed], feed -> token_type_ids
I0718 10:11:44.445701 2153053 naive_executor.cc:171] ---  skip [feed], feed -> input_ids
I0718 10:11:44.446098 2153053 naive_executor.cc:171] ---  skip [linear_77.tmp_1], fetch -> fetch
I0718 10:11:44.446327 2153053 onednn_context.cc:81] oneDNN v3.1.1
==inferNoMKL==
news_finance
news_house
news_house
news_edu
news_finance
==inferMKL==
news_entertainment
news_travel
news_entertainment
news_car
news_entertainment

My environment is Ubuntu 20.04.5, GCC 9.4.0, kernel 5.15.0-76-generic.

My ldd output is:

	linux-vdso.so.1 (0x00007ffe70bc9000)
	libcuBERT_tokenization.so (0x00007f1f90a7e000)
	libpaddle_inference.so => /media/wjl/D2/test/ernie_test/./third_party/paddle/lib/x86_64/libpaddle_inference.so (0x00007f1f87dc9000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1f87bc7000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1f87bac000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1f879ba000)
	libutf8proc.so (0x00007f1f87961000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1f8793e000)
	libdnnl.so.3 (0x00007f1f84b52000)
	libiomp5.so (0x00007f1f8475d000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1f8460e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f1f90abe000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1f84606000)
	libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f1f845c4000)

I have used

export LD_LIBRARY_PATH=:./

to make sure the .so files under install_x86_64 are used.

@engineer1109
Contributor Author

The CPU is a 12th Gen Intel(R) Core(TM) i7-12700, with no AVX-512 support.

@xinyu-intel
Contributor

CPU is 12th Gen Intel(R) Core(TM) i7-12700 No avx512 support

Okay. It seems the difference is that your paddle_inference.so is built on a non-AVX-512 platform. I'll try to find another environment...

@xinyu-intel
Contributor

xinyu-intel commented Jul 18, 2023

@engineer1109 It looks like the results come back to normal after the fc_elementwise_add_mkldnn_fuse_pass is disabled on my Core i7 environment. However, I'm still root-causing some details inside the pass. I can submit a PR to temporarily disable the pass if your workload is urgent.
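
(For anyone needing an interim workaround before a fix is merged, a hedged sketch: keep MKLDNN enabled but delete the suspect pass from the config's pass builder. This is only an assumed user-side workaround, not the content of the upcoming PR; the model paths are placeholders.)

#include "paddle_inference_api.h"

int main() {
  paddle_infer::Config config;
  // Placeholder model paths; use the ernie model from the issue's archive.
  config.SetModel("ernie/inference.pdmodel", "ernie/inference.pdiparams");
  config.EnableMKLDNN();
  // Assumed workaround: drop the pass suspected of producing wrong results.
  config.pass_builder()->DeletePass("fc_elementwise_add_mkldnn_fuse_pass");
  auto predictor = paddle_infer::CreatePredictor(config);
  return 0;
}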

@xinyu-intel
Contributor

@engineer1109 Please try whether #55504 fixes your case.

@engineer1109
Contributor Author

@xinyu-intel Fix confirmed, it works.
It also fixes the CPU MKL part of bug #54569.

@paddle-bot paddle-bot bot added the status/close label Jul 18, 2023
@paddle-bot paddle-bot bot closed this as completed Jul 18, 2023
@paddle-bot paddle-bot bot removed the status/new-issue label Jul 18, 2023