Training PP-OCRv4 fails with ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. #11200

Open
wangyang581 opened this issue Nov 6, 2023 · 4 comments

wangyang581 commented Nov 6, 2023

Please provide the following information to quickly locate the problem

  • System Environment: Ubuntu 20.04, running in the official Paddle Docker image
  • Version: paddlepaddle/paddle:2.5.2-gpu-cuda11.2-cudnn8.2-trt8.0
  • Paddle: paddlepaddle-gpu:2.5.2.post112
  • PaddleOCR: release/2.7
  • Related components: tools/train.py
  • Command Code: python tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_distill.yml
  • Complete Error Message:
    Traceback (most recent call last):
      File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
        cli.main()
      File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
        run()
      File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
        runpy.run_path(target, run_name="__main__")
      File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
        pkg_name=pkg_name, script_name=fname)
      File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
        mod_name, mod_spec, pkg_name, script_name)
      File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
        exec(code, run_globals)
      File "/paddle/tools/train.py", line 227, in <module>
        main(config, device, logger, vdl_writer)
      File "/paddle/tools/train.py", line 202, in main
        amp_dtype)
      File "/paddle/tools/program.py", line 301, in train
        preds = model(images, data=batch[1:])
      File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
        return self.forward(*inputs, **kwargs)
      File "/paddle/ppocr/modeling/architectures/distillation_model.py", line 59, in forward
        result_dict[model_name] = self.model_list[idx](x, data)
      File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
        return self.forward(*inputs, **kwargs)
      File "/paddle/ppocr/modeling/architectures/base_model.py", line 100, in forward
        x = self.head(x, targets=data)
      File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
        return self.forward(*inputs, **kwargs)
      File "/paddle/ppocr/modeling/heads/rec_multi_head.py", line 92, in forward
        ctc_encoder = self.ctc_encoder(x)
      File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
        return self.forward(*inputs, **kwargs)
      File "/paddle/ppocr/modeling/necks/rnn.py", line 261, in forward
        x = self.encoder(x)
      File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
        return self.forward(*inputs, **kwargs)
      File "/paddle/ppocr/modeling/necks/rnn.py", line 208, in forward
        z = self.conv1(z)
      File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
        return self.forward(*inputs, **kwargs)
      File "/paddle/ppocr/modeling/backbones/rec_svtrnet.py", line 68, in forward
        out = self.conv(inputs)
      File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
        return self.forward(*inputs, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/conv.py", line 722, in forward
        use_cudnn=self._use_cudnn,
      File "/usr/local/lib/python3.7/dist-packages/paddle/nn/functional/conv.py", line 141, in _conv_nd
        data_format,
    ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. But received: input's dimension is 3, input's shape is [8, 240, 256].
      [Hint: Expected in_dims.size() == 4 || in_dims.size() == 5 == true, but received in_dims.size() == 4 || in_dims.size() == 5:0 != true:1.] (at ../paddle/phi/infermeta/binary.cc:475)
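
A possible reading of the shape in this error, offered as a rough sketch and not a confirmed diagnosis: [8, 240, 256] looks like a flattened token sequence [batch, H*W, channels], which is 3-D, while the convolution in the svtr neck needs a 4-D [batch, channels, H, W] input. The snippet below only does shape arithmetic and is not PaddleOCR code. The helper name `svtr_token_shapes` is hypothetical, and the downsampling factors (a patch embedding that divides height and width by 4, then two Conv patch-merging stages that each halve the height) are assumptions. The numeric inputs come from the config posted in this issue (Teacher Backbone `img_size` [48, 320], last `embed_dim` 256, `batch_size_per_card` 8).

```python
# Shape arithmetic only; this is not PaddleOCR code.
# ASSUMPTIONS (not stated in this thread): the patch embedding downsamples
# height and width by 4, and each of the two Conv patch-merging stages halves
# the height only. `svtr_token_shapes` is a hypothetical helper.


def svtr_token_shapes(batch, img_h, img_w, channels,
                      patch_stride=4, height_merges=2):
    """Return the (3-D token, 4-D feature map) shapes for the assumed layout."""
    grid_h = img_h // patch_stride // (2 ** height_merges)
    grid_w = img_w // patch_stride
    tokens = (batch, grid_h * grid_w, channels)      # [B, H*W, C]
    feature_map = (batch, channels, grid_h, grid_w)  # [B, C, H, W]
    return tokens, feature_map


# Values from the posted config: img_size [48, 320], last embed_dim 256,
# batch_size_per_card 8.
tokens, feature_map = svtr_token_shapes(batch=8, img_h=48, img_w=320, channels=256)
print(tokens)       # (8, 240, 256)
print(feature_map)  # (8, 256, 3, 80)
```

If those assumptions hold, the 240 in the error is 3 × 80 tokens, which would mean the Teacher's SVTRNet backbone (with `last_stage: false`) hands a 3-D sequence to the svtr neck inside MultiHead. Nobody in this thread has confirmed this, so treat it as a starting point for debugging only.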

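In my image, the same data trains fine with the ch_PP-OCRv4_rec_hgnet.yml config file and with the v3 config files; only the ch_PP-OCRv4_rec_distill.yml config file raises this error.

The contents of the ch_PP-OCRv4_rec_distill.yml config file I am using are as follows: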

Global:
  debug: false
  use_gpu: true
  epoch_num: 200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_dkd_400w_svtr_ctc_lcnet_blank_dkd0.1/
  save_epoch_step: 40
  eval_batch_step:
  - 0
  - 2000
  cal_metric_during_train: true
  pretrained_model: ./pre_train/rec/ch_PP-OCRv4_rec_train/student.pdparams
  checkpoints:
  save_inference_dir: doc/imgs_words/ch/
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  name: DistillationModel
  algorithm: Distillation
  Models:
    Teacher:
      pretrained:
      freeze_params: true
      return_all_feats: true
      model_type: rec
      algorithm: SVTR
      Transform: null
      Backbone:
        name: SVTRNet
        img_size:
        - 48
        - 320
        out_char_num: 40
        out_channels: 192
        patch_merging: Conv
        embed_dim:
        - 64
        - 128
        - 256
        depth:
        - 3
        - 6
        - 3
        num_heads:
        - 2
        - 4
        - 8
        mixer:
        - Conv
        - Conv
        - Conv
        - Conv
        - Conv
        - Conv
        - Global
        - Global
        - Global
        - Global
        - Global
        - Global
        local_mixer:
        - - 5
          - 5
        - - 5
          - 5
        - - 5
          - 5
        last_stage: false
        prenorm: true
      Head:
        name: MultiHead
        head_list:
        - CTCHead:
            Neck:
              name: svtr
              dims: 120
              depth: 2
              hidden_dims: 120
              kernel_size: [1, 3]
              use_guide: True
            Head:
              fc_decay: 0.00001
        - NRTRHead:
            nrtr_dim: 384
            max_text_length: *max_text_length
    Student:
      pretrained:
      freeze_params: false
      return_all_feats: true
      model_type: rec
      algorithm: SVTR
      Transform: null
      Backbone:
        name: PPLCNetV3
        scale: 0.95
      Head:
        name: MultiHead
        head_list:
        - CTCHead:
            Neck:
              name: svtr
              dims: 120
              depth: 2
              hidden_dims: 120
              kernel_size: [1, 3]
              use_guide: True
            Head:
              fc_decay: 0.00001
        - NRTRHead:
            nrtr_dim: 384
            max_text_length: *max_text_length
Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDKDLoss:
      weight: 0.1
      model_name_pairs:
      - - Student
        - Teacher
      key: head_out
      multi_head: true
      alpha: 1.0
      beta: 2.0
      dis_head: gtc
      name: dkd
  - DistillationCTCLoss:
      weight: 1.0
      model_name_list:
      - Student
      key: head_out
      multi_head: true
  - DistillationNRTRLoss:
      weight: 1.0
      smoothing: false
      model_name_list:
      - Student
      key: head_out
      multi_head: true
  - DistillCTCLogits:
      weight: 1.0
      reduction: mean
      model_name_pairs:
      - - Student
        - Teacher
      key: head_out
PostProcess:
  name: DistillationCTCLabelDecode
  model_name:
  - Student
  key: head_out
  multi_head: true
Metric:
  name: DistillationMetric
  base_metric_name: RecMetric
  main_indicator: acc
  key: Student
  ignore_space: false
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/lpd_rec
    label_file_list:
    - ./train_data/lpd_rec/train.txt
    ratio_list:
    - 1.0
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecAug:
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 8
    drop_last: true
    num_workers: 2
    use_shared_memory: true
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/lpd_rec
    label_file_list:
    - ./train_data/lpd_rec/test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 8
    num_workers: 2
profiler_options: null

How should I modify it?

@704572066

Many people have run into this problem.

lizhq commented Dec 5, 2023

Yes, this pretrained model corresponds to the config file below:

Global:
  debug: false
  use_gpu: true
  epoch_num: 20
  log_smooth_window: 20
  print_batch_step: 1
  save_model_dir: /home/aistudio/output/Student/pt
  save_epoch_step: 20
  eval_batch_step:
  - 0
  - 2000
  cal_metric_during_train: true
  pretrained_model: https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_train/student.pdparams
  checkpoints: ''
  save_inference_dir: null
  use_visualdl: true
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: /home/aistudio/paddleXdata/dict.txt
  max_text_length: 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt
  eval_batch_epoch: 5
  use_xpu: false
  use_npu: false
  use_mlu: false
  to_static: false
  use_amp: false
  amp_level: 'OFF'
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001
    warmup_epoch: 3
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform: null
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
    - CTCHead:
        Neck:
          name: svtr
          dims: 120
          depth: 2
          hidden_dims: 120
          kernel_size:
          - 1
          - 3
          use_guide: true
        Head:
          fc_decay: 1.0e-05
    - NRTRHead:
        nrtr_dim: 384
        max_text_length: 25
Loss:
  name: MultiLoss
  loss_config_list:
  - CTCLoss: null
  - NRTRLoss: null
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: MSTextRecDataset
    ds_width: false
    data_dir: /home/aistudio/paddleXdata
    ext_op_transform_idx: 1
    label_file_list:
    - /home/aistudio/output/Teacher/teacher_best/merged_label.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape:
        - 48
        - 320
        - 3
        max_text_length: 25
    - RecAug: null
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales:
    - - 320
      - 32
    - - 320
      - 48
    - - 320
      - 64
    first_bs: 64
    fix_bs: false
    divided_factor:
    - 8
    - 16
    is_training: true
  loader:
    shuffle: true
    batch_size_per_card: 64
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: TextRecDataset
    data_dir: /home/aistudio/paddleXdata
    label_file_list:
    - /home/aistudio/paddleXdata/val.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape:
        - 3
        - 48
        - 320
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 64
    num_workers: 4
profiler_options: null

@fxfxfxfxfxfxfxfx

Has this problem been solved?

@leduy-it

@tink2123 can you take a look at this issue?
