
Hello, when running fine-tuning/run_classifier.py for fine-tuning I get run_classifier.py: error: unrecognized arguments: --vocab_path models/encryptd_vocab.txt. I checked run_classifier.py and could not find a definition of the vocab_path argument. How can this be fixed? Thanks. #100

Open
fjlinww opened this issue Dec 22, 2024 · 12 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

fjlinww commented Dec 22, 2024

python3 fine-tuning/run_classifier.py --pretrained_model_path models/pre-trained_model.bin
--vocab_path models/encryptd_vocab.txt
--train_path datasets/fine-tuning_dataset/cstnet-tls1.3/packet/train_dataset.tsv
--dev_path datasets/fine-tuning_dataset/cstnet-tls1.3/packet/valid_dataset.tsv
--test_path datasets/fine-tuning_dataset/cstnet-tls1.3/packet/test_dataset.tsv
--epochs_num 10 --batch_size 32 --embedding word pos seg
--encoder transformer --mask fully_visible
--seq_length 128 --learning_rate 2e-5
usage: run_classifier.py [-h] [--pretrained_model_path PRETRAINED_MODEL_PATH] [--output_model_path OUTPUT_MODEL_PATH] --train_path TRAIN_PATH --dev_path DEV_PATH [--test_path TEST_PATH]
[--config_path CONFIG_PATH] [--embedding {word,pos,seg,sinusoidalpos,dual} [{word,pos,seg,sinusoidalpos,dual} ...]]
[--tgt_embedding {word,pos,seg,sinusoidalpos,dual} [{word,pos,seg,sinusoidalpos,dual} ...]] [--max_seq_length MAX_SEQ_LENGTH] [--relative_position_embedding] [--share_embedding]
[--remove_embedding_layernorm] [--factorized_embedding_parameterization] [--encoder {transformer,rnn,lstm,gru,birnn,bilstm,bigru,gatedcnn,dual}] [--decoder {None,transformer}]
[--mask {fully_visible,causal,causal_with_prefix}] [--layernorm_positioning {pre,post}] [--feed_forward {dense,gated}] [--relative_attention_buckets_num RELATIVE_ATTENTION_BUCKETS_NUM]
[--remove_attention_scale] [--remove_transformer_bias] [--layernorm {normal,t5}] [--bidirectional] [--parameter_sharing] [--has_residual_attention] [--has_lmtarget_bias]
[--target {sp,lm,mlm,bilm,cls} [{sp,lm,mlm,bilm,cls} ...]] [--tie_weights] [--pooling {mean,max,first,last}] [--prefix_lm_loss] [--learning_rate LEARNING_RATE] [--warmup WARMUP]
[--lr_decay LR_DECAY] [--optimizer {adamw,adafactor}] [--scheduler {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup,inverse_sqrt,tri_stage}]
[--batch_size BATCH_SIZE] [--seq_length SEQ_LENGTH] [--dropout DROPOUT] [--epochs_num EPOCHS_NUM] [--report_steps REPORT_STEPS] [--seed SEED] [--log_path LOG_PATH]
[--log_level {ERROR,INFO,DEBUG,NOTSET}] [--log_file_level {ERROR,INFO,DEBUG,NOTSET}] [--pooling-type {mean,max,first,last}] [--tokenizer {bert,char,space}] [--soft_targets]
[--soft_alpha SOFT_ALPHA]
run_classifier.py: error: unrecognized arguments: --vocab_path models/encryptd_vocab.txt

fjlinww changed the title from "run_classifier.py: error: unrecognized arguments: --vocab_path models/encryptd_vocab.txt" to the current title Dec 22, 2024
linwhitehat added the help wanted label Dec 23, 2024
linwhitehat (Owner) commented

Hello, based on the error you reported, this looks like an argument-parsing problem. You could try the following command:

python3 fine-tuning/run_classifier.py --pretrained_model_path models/pre-trained_model.bin \
                                   --vocab_path models/encryptd_vocab.txt \
                                   --train_path datasets/cstnet-tls1.3/packet/train_dataset.tsv \
                                   --dev_path datasets/cstnet-tls1.3/packet/valid_dataset.tsv \
                                   --test_path datasets/cstnet-tls1.3/packet/test_dataset.tsv \
                                   --epochs_num 10 --batch_size 32 --embedding word_pos_seg \
                                   --encoder transformer --mask fully_visible \
                                   --seq_length 128 --learning_rate 2e-5

The relevant command-line arguments are documented in the usage notes: using-et-bert


fjlinww commented Dec 23, 2024

Thanks for the reply! Here is how the error came about:

  1. Running the fine-tuning command from the repository's README first raises
    File "/home/fjlinww/ET-BERT/fine-tuning/run_classifier.py", line 10, in <module>
    from uer.layers import *
    ModuleNotFoundError: No module named 'uer'
    I worked around this with export PYTHONPATH=$PYTHONPATH:/home/fjlinww/ET-BERT/uer

  2. Continuing, it then reports run_classifier.py: error: argument --embedding: invalid choice: 'word_pos_seg' (choose from 'word', 'pos', 'seg', 'sinusoidalpos', 'dual')
    This argument is registered by finetune_opts(parser) in the main function; finetune_opts in uer/opts.py defines
    parser.add_argument("--embedding", choices=["word", "pos", "seg", "sinusoidalpos", "dual"], default="word", nargs='+',
    so I changed the option to --embedding word pos seg

  3. Continuing again, it reports run_classifier.py: error: unrecognized arguments: --vocab_path models/encryptd_vocab.txt
    This argument should likewise be registered in the main function; tokenizer_opts in uer/opts.py defines
    parser.add_argument("--vocab_path", default=None, type=str, help="Path of the vocabulary file.")
    but the current run_classifier.py has no call like tokenizer_opts(parser), so I am not sure whether one needs to be added for the argument to be parsed (see the sketch below).
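A rough sketch of that idea, assuming the current uer/opts.py still exposes tokenizer_opts as quoted above (names and placement are illustrative, not the repository's actual code):

# Hypothetical change in fine-tuning/run_classifier.py's main():
import argparse
from uer.opts import finetune_opts, tokenizer_opts

parser = argparse.ArgumentParser()
finetune_opts(parser)    # registers --pretrained_model_path, --train_path, --embedding, ...
tokenizer_opts(parser)   # would also register --vocab_path and --tokenizer
args = parser.parse_args()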

linwhitehat (Owner) commented

Sorry about this error; we now roughly understand the situation. It is probably related to the recent update of the uer files: some of the argument handling was not kept in sync. If you need a fix urgently, you could try rolling the files under uer back to an earlier version.
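If you do try the rollback, a rough git sketch (the commit hash is a placeholder; pick one from the history of the uer directory):

# List earlier commits that touched uer/ and restore the files from one of them.
git log --oneline -- uer/
git checkout <older-commit-hash> -- uer/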


fjlinww commented Dec 24, 2024

I cloned the latest code, and at the moment I don't see any branch other than main.

linwhitehat (Owner) commented

Since we currently do not have spare resources to validate the uer code, we have already rolled uer back to the older version; you can replace the corresponding files in your copy with the updated repository contents. When we later move to the new version of uer, we will test it and update the remaining related files and code.

linwhitehat added the bug label Dec 24, 2024

fjlinww commented Dec 24, 2024

Thanks! I re-cloned and the previous errors are gone.
In https://github.com/linwhitehat/ET-BERT?tab=readme-ov-file#using-et-bert, --pretrained_model_path models/pre-trained_model.bin should be changed to --pretrained_model_path models/pretrained_model.bin to match the command you provided.

linwhitehat (Owner) commented

OK, it has been corrected.


fjlinww commented Dec 25, 2024

Hello, I ran into a new problem while fine-tuning:
~/ET-BERT$ python3 fine-tuning/run_classifier.py --pretrained_model_path models/pretrained_model.bin
--vocab_path models/encryptd_vocab.txt
--train_path datasets/fine-tuning_dataset/cstnet-tls1.3/packet/train_dataset.tsv
--dev_path datasets/fine-tuning_dataset/cstnet-tls1.3/packet/valid_dataset.tsv
--test_path datasets/fine-tuning_dataset/cstnet-tls1.3/packet/test_dataset.tsv
--epochs_num 10 --batch_size 32 --embedding word_pos_seg
--encoder transformer --mask fully_visible
--seq_length 128 --learning_rate 2e-5
/home/fjlinww/ET-BERT/fine-tuning/run_classifier.py:90: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
model.load_state_dict(torch.load(args.pretrained_model_path, map_location=map_location), strict=False)
Batch size: 32
The number of training instances: 465367
2 GPUs are available. Let's use them.
Start training.
0%| | 0/10 [00:00<?, ?it/s]

The progress stays at 0. My environment is 2x A6000 GPUs, and storage is sufficient.
(screenshot)

Since I only have 2 GPUs, I modified fine-tuning/run_classifier.py accordingly:

(screenshot)

But the progress stays at 0; what might be the cause? The fine-tuning dataset is https://drive.google.com/drive/folders/1KlZatGoNm-4qu04z0LfrTpZr2oDaHfzr

weiyuhao2021 commented

You could try adding
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
in the script, or exporting the variable in the shell.
Sometimes a hang like this is caused by a bug that simply does not get reported when running on multiple GPUs in parallel.
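A minimal placement sketch of that suggestion (the device id "0" is only an example; the variable needs to be set before torch initializes CUDA):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # make only one GPU visible to the script
import torch  # import torch after the variable is set so CUDA sees a single device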


LeiPudd commented Jan 10, 2025

Hello, I cloned the latest code, but I still get the error ModuleNotFoundError: No module named 'uer'. How should I adjust things?

weiyuhao2021 commented

You probably need to add the uer project used by this repository to the Python interpreter's path.
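For reference, an earlier comment in this thread did this with export PYTHONPATH=$PYTHONPATH:/home/fjlinww/ET-BERT/uer (adjust the path to your own clone). A rough Python-side alternative, assuming the uer package sits in the root of the ET-BERT checkout, is to extend sys.path at the top of fine-tuning/run_classifier.py before the uer imports:

import os
import sys
# Add the ET-BERT checkout (the directory containing the 'uer' package) to the import path.
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))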

Chen9crane commented

Hello, may I ask which version you re-cloned? After cloning the current latest version, I still run into uer-related errors when fine-tuning.
