
[MKL] IR Pass error when running ernie on inference #55477

Closed
engineer1109 opened this issue Jul 17, 2023 · 8 comments

Labels: Intel, PFCC (Paddle Framework Contributor Club, https://github.com/PaddlePaddle/community/tree/master/pfcc), status/close, type/bug-report

@engineer1109
Contributor

Describe the Bug

Reproduction code
Link: https://pan.baidu.com/s/1J0PkcTu1ngMK-DkSh79j_Q?pwd=yvqy (extraction code: yvqy)

Paddle version: develop 27fd2bc

The archive contains a demo: ernie text classification with pure C++ Paddle Inference.
The inference results differ depending on whether EnableMKLDNN is enabled or disabled.

==inferNoMKL==
news_finance
news_house
news_house
news_edu
news_finance
==inferMKL==
news_entertainment
news_travel
news_entertainment
news_car
news_entertainment

inferNoMKL (EnableMKLDNN disabled) produces the correct results.
inferMKL (EnableMKLDNN enabled) produces wrong results.

This bug appeared within roughly the last 1-2 months; it definitely did not exist before.
It is most likely related to the recent changes to the MKL IR passes.
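
For context, a minimal sketch of the two configurations being compared (an assumption for illustration, not copied from the attached demo; the model paths and helper name are placeholders):

#include <memory>

#include "paddle_inference_api.h"

// Minimal sketch: the only difference between inferNoMKL and inferMKL
// is whether EnableMKLDNN() is called on the config.
std::shared_ptr<paddle_infer::Predictor> CreateErniePredictor(bool use_mkldnn) {
  paddle_infer::Config config;
  // Placeholder paths; substitute the ernie model from the linked archive.
  config.SetModel("ernie/inference.pdmodel", "ernie/inference.pdiparams");
  config.SwitchIrOptim(true);  // run the inference IR passes
  if (use_mkldnn) {
    config.EnableMKLDNN();  // inferMKL path: adds the oneDNN placement/fuse passes
  }
  return paddle_infer::CreatePredictor(config);
}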

Additional Supplementary Information

No response

@engineer1109
Contributor Author

@jzhang533 This issue is similar to the previous one, but this time it occurs in the CPU MKL IR passes.
Almost all NLP models are affected.
It appeared within the last 1-2 months.

@carryyu carryyu assigned jiahy0825 and unassigned jiahy0825 Jul 17, 2023
@paddle-bot paddle-bot bot added the PFCC Paddle Framework Contributor Club,https://github.com/PaddlePaddle/community/tree/master/pfcc label Jul 17, 2023
@xinyu-intel xinyu-intel self-assigned this Jul 18, 2023
@xinyu-intel
Contributor

@engineer1109 Hi, thank you for the issue report. Since it is related to oneDNN, I tried to reproduce it locally.

It seems the results are equal after enabling oneDNN in my local environment (Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz):

paddle version: db1f2c4

LOG:

(pp) ../install_x86_64/main 
--- Running analysis [ir_graph_build_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0718 09:50:07.891781 85502 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0718 09:50:07.969004 85502 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0718 09:50:07.970268 85502 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0718 09:50:07.979451 85502 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
I0718 09:50:07.989873 85502 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running analysis [save_optimized_model_pass]
W0718 09:50:07.990353 85502 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0718 09:50:07.990731 85502 memory_optimize_pass.cc:118] The persistable params in main graph are : 63.8833MB
I0718 09:50:07.993149 85502 memory_optimize_pass.cc:246] Cluster name : unsqueeze2_0.tmp_0  size: 4
I0718 09:50:07.993156 85502 memory_optimize_pass.cc:246] Cluster name : token_type_ids  size: 8
I0718 09:50:07.993160 85502 memory_optimize_pass.cc:246] Cluster name : linear_74.tmp_1  size: 3744
I0718 09:50:07.993162 85502 memory_optimize_pass.cc:246] Cluster name : gelu_4.tmp_0  size: 3744
I0718 09:50:07.993165 85502 memory_optimize_pass.cc:246] Cluster name : linear_61.tmp_1  size: 1248
I0718 09:50:07.993167 85502 memory_optimize_pass.cc:246] Cluster name : tmp_20  size: 1248
I0718 09:50:07.993170 85502 memory_optimize_pass.cc:246] Cluster name : transpose_14.tmp_0  size: 936
I0718 09:50:07.993171 85502 memory_optimize_pass.cc:246] Cluster name : reshape2_14.tmp_0  size: 936
--- Running analysis [ir_graph_to_program_pass]
I0718 09:50:08.036355 85502 analysis_predictor.cc:1676] ======= optimize end =======
I0718 09:50:08.041451 85502 naive_executor.cc:171] ---  skip [feed], feed -> token_type_ids
I0718 09:50:08.041465 85502 naive_executor.cc:171] ---  skip [feed], feed -> input_ids
I0718 09:50:08.042903 85502 naive_executor.cc:171] ---  skip [linear_77.tmp_1], fetch -> fetch
I0718 09:50:08.182106 85502 analysis_predictor.cc:1470] MKLDNN is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [mkldnn_placement_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0718 09:50:08.263229 85502 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0718 09:50:08.264391 85502 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0718 09:50:08.273350 85502 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
I0718 09:50:08.283744 85502 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running IR pass [squeeze2_transpose2_onednn_fuse_pass]
--- fused 0 squeeze2 with transpose2
--- Running IR pass [depthwise_conv_mkldnn_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_affine_channel_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_elementwise_add_mkldnn_fuse_pass]
--- Running IR pass [conv_activation_mkldnn_fuse_pass]
--- Running IR pass [scale_matmul_fuse_pass]
I0718 09:50:08.293412 85502 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 scale with matmul
--- Running IR pass [reshape_transpose_matmul_mkldnn_fuse_pass]
I0718 09:50:08.300585 85502 fuse_pass_base.cc:59] ---  detected 12 subgraphs
---    fused 12 reshape + transpose + matmul with reshape's xshape with transpose's xshape
--- Running IR pass [matmul_transpose_reshape_mkldnn_fuse_pass]
I0718 09:50:08.304795 85502 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fused_matmul + transpose + reshape patterns
--- Running IR pass [matmul_elementwise_add_mkldnn_fuse_pass]
I0718 09:50:08.305992 85502 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fused_matmul (as x) with elementwise_add
--- Running IR pass [matmul_activation_mkldnn_fuse_pass]
--- Running IR pass [fc_mkldnn_pass]
I0718 09:50:08.312490 85502 fuse_pass_base.cc:59] ---  detected 26 subgraphs
---    enabled FC MKL-DNN for 26 fc ops 
--- Running IR pass [fc_act_mkldnn_fuse_pass]
I0718 09:50:08.313565 85502 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fc with gelu activation
I0718 09:50:08.315255 85502 fuse_pass_base.cc:59] ---  detected 1 subgraphs
---    fused 1 fc with tanh activation
--- Running IR pass [fc_elementwise_add_mkldnn_fuse_pass]
---    Fused 8 fc (as y) + elementwise_add patterns
I0718 09:50:08.320371 85502 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [self_attention_fuse_pass]
---    fused 0 self attention (of scaled_dp_attention) with self_attention_fuse
--- Running IR pass [batch_norm_act_fuse_pass]
--- Running IR pass [softplus_activation_onednn_fuse_pass]
--- Running IR pass [shuffle_channel_mkldnn_detect_pass]
--- Running IR pass [elementwise_act_onednn_fuse_pass]
--- Running IR pass [operator_scale_onednn_fuse_pass]
--- Running IR pass [operator_unsqueeze2_onednn_fuse_pass]
--- Running IR pass [operator_reshape2_onednn_fuse_pass]
--- Running analysis [save_optimized_model_pass]
W0718 09:50:08.326776 85502 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0718 09:50:08.327189 85502 memory_optimize_pass.cc:118] The persistable params in main graph are : 63.8833MB
I0718 09:50:08.328198 85502 memory_optimize_pass.cc:246] Cluster name : unsqueeze2_0.tmp_0  size: 4
I0718 09:50:08.328207 85502 memory_optimize_pass.cc:246] Cluster name : tmp_16  size: 1248
I0718 09:50:08.328208 85502 memory_optimize_pass.cc:246] Cluster name : reshape2_15.tmp_0  size: 936
I0718 09:50:08.328212 85502 memory_optimize_pass.cc:246] Cluster name : tmp_14  size: 36
I0718 09:50:08.328213 85502 memory_optimize_pass.cc:246] Cluster name : gelu_3.tmp_0  size: 3744
I0718 09:50:08.328217 85502 memory_optimize_pass.cc:246] Cluster name : layer_norm_21.tmp_2  size: 1248
I0718 09:50:08.328218 85502 memory_optimize_pass.cc:246] Cluster name : token_type_ids  size: 8
--- Running analysis [ir_graph_to_program_pass]
I0718 09:50:08.361274 85502 analysis_predictor.cc:1676] ======= optimize end =======
I0718 09:50:08.361474 85502 naive_executor.cc:171] ---  skip [feed], feed -> token_type_ids
I0718 09:50:08.361481 85502 naive_executor.cc:171] ---  skip [feed], feed -> input_ids
I0718 09:50:08.362283 85502 naive_executor.cc:171] ---  skip [linear_77.tmp_1], fetch -> fetch
I0718 09:50:08.362777 85502 onednn_context.cc:81] oneDNN v3.1.1
==inferNoMKL==
news_finance
news_house
news_house
news_edu
news_finance
==inferMKL==
news_finance
news_house
news_house
news_edu
news_finance

Which platform are you using for testing? Could you compare this log with yours?

@engineer1109
Contributor Author

@xinyu-intel
This is my log.

--- Running analysis [ir_graph_build_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0718 10:11:44.220561 2153053 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0718 10:11:44.254137 2153053 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0718 10:11:44.254751 2153053 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0718 10:11:44.259657 2153053 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
I0718 10:11:44.264782 2153053 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running analysis [save_optimized_model_pass]
W0718 10:11:44.265024 2153053 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0718 10:11:44.265228 2153053 memory_optimize_pass.cc:118] The persistable params in main graph are : 63.8833MB
I0718 10:11:44.266413 2153053 memory_optimize_pass.cc:246] Cluster name : token_type_ids  size: 8
I0718 10:11:44.266417 2153053 memory_optimize_pass.cc:246] Cluster name : transpose_14.tmp_0  size: 936
I0718 10:11:44.266417 2153053 memory_optimize_pass.cc:246] Cluster name : transpose_1.tmp_0  size: 936
I0718 10:11:44.266419 2153053 memory_optimize_pass.cc:246] Cluster name : tmp_5  size: 1248
I0718 10:11:44.266422 2153053 memory_optimize_pass.cc:246] Cluster name : layer_norm_21.tmp_2  size: 1248
I0718 10:11:44.266422 2153053 memory_optimize_pass.cc:246] Cluster name : linear_74.tmp_1  size: 3744
I0718 10:11:44.266424 2153053 memory_optimize_pass.cc:246] Cluster name : unsqueeze2_0.tmp_0  size: 4
I0718 10:11:44.266425 2153053 memory_optimize_pass.cc:246] Cluster name : linear_68.tmp_1  size: 3744
--- Running analysis [ir_graph_to_program_pass]
I0718 10:11:44.290699 2153053 analysis_predictor.cc:1676] ======= optimize end =======
I0718 10:11:44.294454 2153053 naive_executor.cc:171] ---  skip [feed], feed -> token_type_ids
I0718 10:11:44.294461 2153053 naive_executor.cc:171] ---  skip [feed], feed -> input_ids
I0718 10:11:44.295307 2153053 naive_executor.cc:171] ---  skip [linear_77.tmp_1], fetch -> fetch
I0718 10:11:44.359247 2153053 analysis_predictor.cc:1470] MKLDNN is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [mkldnn_placement_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0718 10:11:44.395046 2153053 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0718 10:11:44.395629 2153053 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0718 10:11:44.400547 2153053 fuse_pass_base.cc:59] ---  detected 26 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
I0718 10:11:44.405717 2153053 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running IR pass [squeeze2_transpose2_onednn_fuse_pass]
--- fused 0 squeeze2 with transpose2
--- Running IR pass [depthwise_conv_mkldnn_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_affine_channel_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_elementwise_add_mkldnn_fuse_pass]
--- Running IR pass [conv_activation_mkldnn_fuse_pass]
--- Running IR pass [scale_matmul_fuse_pass]
I0718 10:11:44.410969 2153053 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 scale with matmul
--- Running IR pass [reshape_transpose_matmul_mkldnn_fuse_pass]
I0718 10:11:44.414777 2153053 fuse_pass_base.cc:59] ---  detected 12 subgraphs
---    fused 12 reshape + transpose + matmul with reshape's xshape with transpose's xshape
--- Running IR pass [matmul_transpose_reshape_mkldnn_fuse_pass]
I0718 10:11:44.417080 2153053 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fused_matmul + transpose + reshape patterns
--- Running IR pass [matmul_elementwise_add_mkldnn_fuse_pass]
I0718 10:11:44.417721 2153053 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fused_matmul (as x) with elementwise_add
--- Running IR pass [matmul_activation_mkldnn_fuse_pass]
--- Running IR pass [fc_mkldnn_pass]
I0718 10:11:44.421105 2153053 fuse_pass_base.cc:59] ---  detected 26 subgraphs
---    enabled FC MKL-DNN for 26 fc ops 
--- Running IR pass [fc_act_mkldnn_fuse_pass]
I0718 10:11:44.421675 2153053 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fc with gelu activation
I0718 10:11:44.422614 2153053 fuse_pass_base.cc:59] ---  detected 1 subgraphs
---    fused 1 fc with tanh activation
--- Running IR pass [fc_elementwise_add_mkldnn_fuse_pass]
---    Fused 8 fc (as y) + elementwise_add patterns
I0718 10:11:44.425307 2153053 fuse_pass_base.cc:59] ---  detected 8 subgraphs
--- Running IR pass [self_attention_fuse_pass]
W0718 10:11:44.425390 2153053 self_attention_fuse_pass.cc:53] No-avx512 or MKL supported!
--- Running IR pass [batch_norm_act_fuse_pass]
--- Running IR pass [softplus_activation_onednn_fuse_pass]
--- Running IR pass [shuffle_channel_mkldnn_detect_pass]
--- Running IR pass [elementwise_act_onednn_fuse_pass]
--- Running IR pass [operator_scale_onednn_fuse_pass]
--- Running IR pass [operator_unsqueeze2_onednn_fuse_pass]
--- Running IR pass [operator_reshape2_onednn_fuse_pass]
--- Running analysis [save_optimized_model_pass]
W0718 10:11:44.428572 2153053 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0718 10:11:44.428772 2153053 memory_optimize_pass.cc:118] The persistable params in main graph are : 63.8833MB
I0718 10:11:44.429283 2153053 memory_optimize_pass.cc:246] Cluster name : unsqueeze2_0.tmp_0  size: 4
I0718 10:11:44.429286 2153053 memory_optimize_pass.cc:246] Cluster name : softmax_3.tmp_0  size: 36
I0718 10:11:44.429287 2153053 memory_optimize_pass.cc:246] Cluster name : tmp_22  size: 36
I0718 10:11:44.429288 2153053 memory_optimize_pass.cc:246] Cluster name : tmp_5  size: 1248
I0718 10:11:44.429291 2153053 memory_optimize_pass.cc:246] Cluster name : embedding_10.tmp_0  size: 1248
I0718 10:11:44.429291 2153053 memory_optimize_pass.cc:246] Cluster name : token_type_ids  size: 8
I0718 10:11:44.429293 2153053 memory_optimize_pass.cc:246] Cluster name : gelu_1.tmp_0  size: 3744
--- Running analysis [ir_graph_to_program_pass]
I0718 10:11:44.443881 2153053 analysis_predictor.cc:1676] ======= optimize end =======
I0718 10:11:44.445695 2153053 naive_executor.cc:171] ---  skip [feed], feed -> token_type_ids
I0718 10:11:44.445701 2153053 naive_executor.cc:171] ---  skip [feed], feed -> input_ids
I0718 10:11:44.446098 2153053 naive_executor.cc:171] ---  skip [linear_77.tmp_1], fetch -> fetch
I0718 10:11:44.446327 2153053 onednn_context.cc:81] oneDNN v3.1.1
==inferNoMKL==
news_finance
news_house
news_house
news_edu
news_finance
==inferMKL==
news_entertainment
news_travel
news_entertainment
news_car
news_entertainment

My environment is Ubuntu 20.04.5, GCC 9.4.0, kernel 5.15.0-76-generic.

My ldd output is:

	linux-vdso.so.1 (0x00007ffe70bc9000)
	libcuBERT_tokenization.so (0x00007f1f90a7e000)
	libpaddle_inference.so => /media/wjl/D2/test/ernie_test/./third_party/paddle/lib/x86_64/libpaddle_inference.so (0x00007f1f87dc9000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1f87bc7000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1f87bac000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1f879ba000)
	libutf8proc.so (0x00007f1f87961000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1f8793e000)
	libdnnl.so.3 (0x00007f1f84b52000)
	libiomp5.so (0x00007f1f8475d000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1f8460e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f1f90abe000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1f84606000)
	libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f1f845c4000)

I have used

export LD_LIBRARY_PATH=:./

to make sure the .so files under install_x86_64 are used.

@engineer1109
Contributor Author

The CPU is a 12th Gen Intel(R) Core(TM) i7-12700, with no AVX-512 support.

@xinyu-intel
Contributor

CPU is 12th Gen Intel(R) Core(TM) i7-12700 No avx512 support

Okay. It seems the difference is that your paddle_inference.so is built on a non-AVX-512 platform. I'll try to find another environment...

@xinyu-intel
Contributor

xinyu-intel commented Jul 18, 2023

@engineer1109 It looks like the results come back to normal after the fc_elementwise_add_mkldnn_fuse_pass is disabled on my Core i7 environment. However, I'm still root-causing some details inside the pass. I can submit a PR to temporarily disable the pass if your workload is urgent.
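
(For anyone needing an interim workaround before a fix is merged, a hedged sketch: keep MKLDNN enabled but delete the suspect pass from the config's pass builder. This is only an assumed user-side workaround, not the content of the upcoming PR; the model paths are placeholders.)

#include "paddle_inference_api.h"

int main() {
  paddle_infer::Config config;
  // Placeholder model paths; use the ernie model from the issue's archive.
  config.SetModel("ernie/inference.pdmodel", "ernie/inference.pdiparams");
  config.EnableMKLDNN();
  // Assumed workaround: drop the pass suspected of producing wrong results.
  config.pass_builder()->DeletePass("fc_elementwise_add_mkldnn_fuse_pass");
  auto predictor = paddle_infer::CreatePredictor(config);
  return 0;
}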

@xinyu-intel
Contributor

@engineer1109 Please try whether #55504 fixes your case.

@engineer1109
Contributor Author

@xinyu-intel Fix confirmed, it works.
It also fixes the CPU MKL part of bug #54569.

@paddle-bot paddle-bot bot added the status/close label Jul 18, 2023
@paddle-bot paddle-bot bot closed this as completed Jul 18, 2023
@paddle-bot paddle-bot bot removed the status/new-issue label Jul 18, 2023