[MKL]IR Pass Error when run ernie on inference #55477
Comments
@jzhang533 This issue is similar to the one from last time.
@engineer1109 Hi, thank you for the issue report. Since it is related to oneDNN, I tried to reproduce it locally. The results are equal after enabling oneDNN in my local environment (Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz). Paddle version: db1f2c4. Log:
(pp) ../install_x86_64/main
--- Running analysis [ir_graph_build_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0718 09:50:07.891781 85502 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0718 09:50:07.969004 85502 fuse_pass_base.cc:59] --- detected 26 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0718 09:50:07.970268 85502 fuse_pass_base.cc:59] --- detected 8 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0718 09:50:07.979451 85502 fuse_pass_base.cc:59] --- detected 26 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
I0718 09:50:07.989873 85502 fuse_pass_base.cc:59] --- detected 1 subgraphs
--- Running analysis [save_optimized_model_pass]
W0718 09:50:07.990353 85502 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0718 09:50:07.990731 85502 memory_optimize_pass.cc:118] The persistable params in main graph are : 63.8833MB
I0718 09:50:07.993149 85502 memory_optimize_pass.cc:246] Cluster name : unsqueeze2_0.tmp_0 size: 4
I0718 09:50:07.993156 85502 memory_optimize_pass.cc:246] Cluster name : token_type_ids size: 8
I0718 09:50:07.993160 85502 memory_optimize_pass.cc:246] Cluster name : linear_74.tmp_1 size: 3744
I0718 09:50:07.993162 85502 memory_optimize_pass.cc:246] Cluster name : gelu_4.tmp_0 size: 3744
I0718 09:50:07.993165 85502 memory_optimize_pass.cc:246] Cluster name : linear_61.tmp_1 size: 1248
I0718 09:50:07.993167 85502 memory_optimize_pass.cc:246] Cluster name : tmp_20 size: 1248
I0718 09:50:07.993170 85502 memory_optimize_pass.cc:246] Cluster name : transpose_14.tmp_0 size: 936
I0718 09:50:07.993171 85502 memory_optimize_pass.cc:246] Cluster name : reshape2_14.tmp_0 size: 936
--- Running analysis [ir_graph_to_program_pass]
I0718 09:50:08.036355 85502 analysis_predictor.cc:1676] ======= optimize end =======
I0718 09:50:08.041451 85502 naive_executor.cc:171] --- skip [feed], feed -> token_type_ids
I0718 09:50:08.041465 85502 naive_executor.cc:171] --- skip [feed], feed -> input_ids
I0718 09:50:08.042903 85502 naive_executor.cc:171] --- skip [linear_77.tmp_1], fetch -> fetch
I0718 09:50:08.182106 85502 analysis_predictor.cc:1470] MKLDNN is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [mkldnn_placement_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0718 09:50:08.263229 85502 fuse_pass_base.cc:59] --- detected 26 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0718 09:50:08.264391 85502 fuse_pass_base.cc:59] --- detected 8 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0718 09:50:08.273350 85502 fuse_pass_base.cc:59] --- detected 26 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
I0718 09:50:08.283744 85502 fuse_pass_base.cc:59] --- detected 1 subgraphs
--- Running IR pass [squeeze2_transpose2_onednn_fuse_pass]
--- fused 0 squeeze2 with transpose2
--- Running IR pass [depthwise_conv_mkldnn_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_affine_channel_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_elementwise_add_mkldnn_fuse_pass]
--- Running IR pass [conv_activation_mkldnn_fuse_pass]
--- Running IR pass [scale_matmul_fuse_pass]
I0718 09:50:08.293412 85502 fuse_pass_base.cc:59] --- detected 4 subgraphs
--- fused 4 scale with matmul
--- Running IR pass [reshape_transpose_matmul_mkldnn_fuse_pass]
I0718 09:50:08.300585 85502 fuse_pass_base.cc:59] --- detected 12 subgraphs
--- fused 12 reshape + transpose + matmul with reshape's xshape with transpose's xshape
--- Running IR pass [matmul_transpose_reshape_mkldnn_fuse_pass]
I0718 09:50:08.304795 85502 fuse_pass_base.cc:59] --- detected 4 subgraphs
--- fused 4 fused_matmul + transpose + reshape patterns
--- Running IR pass [matmul_elementwise_add_mkldnn_fuse_pass]
I0718 09:50:08.305992 85502 fuse_pass_base.cc:59] --- detected 4 subgraphs
--- fused 4 fused_matmul (as x) with elementwise_add
--- Running IR pass [matmul_activation_mkldnn_fuse_pass]
--- Running IR pass [fc_mkldnn_pass]
I0718 09:50:08.312490 85502 fuse_pass_base.cc:59] --- detected 26 subgraphs
--- enabled FC MKL-DNN for 26 fc ops
--- Running IR pass [fc_act_mkldnn_fuse_pass]
I0718 09:50:08.313565 85502 fuse_pass_base.cc:59] --- detected 4 subgraphs
--- fused 4 fc with gelu activation
I0718 09:50:08.315255 85502 fuse_pass_base.cc:59] --- detected 1 subgraphs
--- fused 1 fc with tanh activation
--- Running IR pass [fc_elementwise_add_mkldnn_fuse_pass]
--- Fused 8 fc (as y) + elementwise_add patterns
I0718 09:50:08.320371 85502 fuse_pass_base.cc:59] --- detected 8 subgraphs
--- Running IR pass [self_attention_fuse_pass]
--- fused 0 self attention (of scaled_dp_attention) with self_attention_fuse
--- Running IR pass [batch_norm_act_fuse_pass]
--- Running IR pass [softplus_activation_onednn_fuse_pass]
--- Running IR pass [shuffle_channel_mkldnn_detect_pass]
--- Running IR pass [elementwise_act_onednn_fuse_pass]
--- Running IR pass [operator_scale_onednn_fuse_pass]
--- Running IR pass [operator_unsqueeze2_onednn_fuse_pass]
--- Running IR pass [operator_reshape2_onednn_fuse_pass]
--- Running analysis [save_optimized_model_pass]
W0718 09:50:08.326776 85502 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0718 09:50:08.327189 85502 memory_optimize_pass.cc:118] The persistable params in main graph are : 63.8833MB
I0718 09:50:08.328198 85502 memory_optimize_pass.cc:246] Cluster name : unsqueeze2_0.tmp_0 size: 4
I0718 09:50:08.328207 85502 memory_optimize_pass.cc:246] Cluster name : tmp_16 size: 1248
I0718 09:50:08.328208 85502 memory_optimize_pass.cc:246] Cluster name : reshape2_15.tmp_0 size: 936
I0718 09:50:08.328212 85502 memory_optimize_pass.cc:246] Cluster name : tmp_14 size: 36
I0718 09:50:08.328213 85502 memory_optimize_pass.cc:246] Cluster name : gelu_3.tmp_0 size: 3744
I0718 09:50:08.328217 85502 memory_optimize_pass.cc:246] Cluster name : layer_norm_21.tmp_2 size: 1248
I0718 09:50:08.328218 85502 memory_optimize_pass.cc:246] Cluster name : token_type_ids size: 8
--- Running analysis [ir_graph_to_program_pass]
I0718 09:50:08.361274 85502 analysis_predictor.cc:1676] ======= optimize end =======
I0718 09:50:08.361474 85502 naive_executor.cc:171] --- skip [feed], feed -> token_type_ids
I0718 09:50:08.361481 85502 naive_executor.cc:171] --- skip [feed], feed -> input_ids
I0718 09:50:08.362283 85502 naive_executor.cc:171] --- skip [linear_77.tmp_1], fetch -> fetch
I0718 09:50:08.362777 85502 onednn_context.cc:81] oneDNN v3.1.1
==inferNoMKL==
news_finance
news_house
news_house
news_edu
news_finance
==inferMKL==
news_finance
news_house
news_house
news_edu
news_finance
Which platform are you using for testing? Could you compare this log with yours?
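For readers who want to run the same side-by-side check, here is a minimal sketch of how the two CPU predictors (oneDNN on vs. off) could be built and queried with the Paddle Inference C++ API. This is not the code from the attached demo: the model paths and sample data are placeholders, while the feed names input_ids and token_type_ids are taken from the log above.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

#include "paddle_inference_api.h"  // Paddle Inference C++ API (paddle_infer)

// Build a CPU predictor for the ERNIE model; `use_mkldnn` toggles oneDNN kernels.
// The model paths are placeholders for the files shipped with the demo.
std::shared_ptr<paddle_infer::Predictor> MakePredictor(bool use_mkldnn) {
  paddle_infer::Config config;
  config.SetModel("ernie/inference.pdmodel", "ernie/inference.pdiparams");
  config.SwitchIrOptim(true);  // run the IR passes shown in the log
  if (use_mkldnn) {
    config.EnableMKLDNN();     // the switch whose effect is being compared
  }
  return paddle_infer::CreatePredictor(config);
}

// Feed one tokenized sample (feed names taken from the log above) and
// return the raw logits from the fetch tensor.
std::vector<float> RunOnce(paddle_infer::Predictor* pred,
                           const std::vector<int64_t>& input_ids,
                           const std::vector<int64_t>& token_type_ids,
                           int seq_len) {
  auto ids = pred->GetInputHandle("input_ids");
  ids->Reshape({1, seq_len});
  ids->CopyFromCpu(input_ids.data());

  auto types = pred->GetInputHandle("token_type_ids");
  types->Reshape({1, seq_len});
  types->CopyFromCpu(token_type_ids.data());

  pred->Run();

  auto out = pred->GetOutputHandle(pred->GetOutputNames()[0]);
  int numel = 1;
  for (int d : out->shape()) numel *= d;
  std::vector<float> logits(numel);
  out->CopyToCpu(logits.data());
  return logits;
}
```

Running the same sample through `MakePredictor(false)` and `MakePredictor(true)` and comparing the returned logits is the comparison described in this thread.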
@xinyu-intel
My environment is Ubuntu 20.04.5, GCC 9.4.0, kernel 5.15.0-76-generic. My ldd output is:
I have set export LD_LIBRARY_PATH=:./ to make sure the .so files under install_x86_64 are used.
The CPU is a 12th Gen Intel(R) Core(TM) i7-12700.
Okay. It seems the difference is that paddle_inference.so was built on a non-AVX512 platform. I'll try to find another environment...
@engineer1109 It looks like the result comes back to normal after the fc_elementwise_add_mkldnn_fuse_pass is disabled on my Core i7 environment. However, I'm still root-causing some details inside the pass. I can submit a PR to temporarily disable the pass if your workload is urgent.
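Until the root cause is fixed, a possible user-side workaround is to drop the suspect pass before creating the predictor. This is only a sketch, assuming the usual pass_builder() API of paddle_infer::Config with placeholder model paths; the PR referenced in the next comment is the proper fix.

```cpp
#include <memory>

#include "paddle_inference_api.h"

std::shared_ptr<paddle_infer::Predictor> MakeWorkaroundPredictor() {
  paddle_infer::Config config;
  config.SetModel("ernie/inference.pdmodel", "ernie/inference.pdiparams");  // placeholder paths
  config.EnableMKLDNN();
  // Temporary workaround: drop the suspect pass so the fc + elementwise_add
  // fusion reported above is never applied.
  config.pass_builder()->DeletePass("fc_elementwise_add_mkldnn_fuse_pass");
  return paddle_infer::CreatePredictor(config);
}
```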
@engineer1109 Please try whether #55504 fixes your case.
@xinyu-intel Fixed successfully.
Describe the Bug
Reproduction code
Link: https://pan.baidu.com/s/1J0PkcTu1ngMK-DkSh79j_Q?pwd=yvqy (extraction code: yvqy)
Paddle version: develop 27fd2bc
The archive contains a demo: ERNIE text classification with pure C++ Paddle Inference.
The inference results differ depending on whether EnableMKLDNN is on or off (a sketch of this comparison appears below).
inferNoMKL gives the correct result, with EnableMKLDNN disabled.
inferMKL gives the wrong result, with EnableMKLDNN enabled.
This bug appeared within roughly the last 1-2 months; it definitely did not exist before.
The bug is most likely related to the recent changes to the MKL IR passes.
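As a rough illustration of the comparison the demo performs, the sketch below takes the argmax over the fetched logits, maps it to a label, and prints the result for both predictors. The label table is hypothetical (the real demo defines its own mapping), and MakePredictor/RunOnce are the helper sketches from the earlier comment, not functions from the demo.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical label table for the classification head; the real demo defines
// its own mapping (the output above shows labels such as news_finance, news_house).
static const std::vector<std::string> kLabels = {
    "news_finance", "news_house", "news_edu", "news_tech"};

// Map raw logits to a label name via argmax.
std::string ToLabel(const std::vector<float>& logits) {
  auto it = std::max_element(logits.begin(), logits.end());
  size_t idx = static_cast<size_t>(std::distance(logits.begin(), it));
  return idx < kLabels.size() ? kLabels[idx] : "label_" + std::to_string(idx);
}

int main() {
  // Placeholder sample; the demo reads real tokenized text instead.
  std::vector<int64_t> input_ids = {1, 75, 329, 4, 2};
  std::vector<int64_t> token_type_ids(input_ids.size(), 0);
  int seq_len = static_cast<int>(input_ids.size());

  auto plain = MakePredictor(/*use_mkldnn=*/false);  // helpers sketched earlier
  auto dnn = MakePredictor(/*use_mkldnn=*/true);

  std::cout << "==inferNoMKL== "
            << ToLabel(RunOnce(plain.get(), input_ids, token_type_ids, seq_len))
            << "\n==inferMKL== "
            << ToLabel(RunOnce(dnn.get(), input_ids, token_type_ids, seq_len))
            << std::endl;
  return 0;
}
```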
Additional Supplementary Information
No response