Error when loading the data processor #3

Open
lokking opened this issue Oct 17, 2024 · 1 comment
Comments

lokking commented Oct 17, 2024

from transformers import LayoutXLMTokenizer, LayoutLMv3ImageProcessor, LayoutLMv3Processor

Load the Tokenizer and ImageProcessor

tokenizer = LayoutXLMTokenizer.from_pretrained(model_name_path)
image_processor = LayoutLMv3ImageProcessor.from_pretrained(model_name_path, apply_ocr=False)

Create the Processor

processor = LayoutLMv3Processor(tokenizer=tokenizer, image_processor=image_processor, apply_ocr=False)
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'XLMRobertaTokenizer'.
The class this function is called from is 'LayoutXLMTokenizer'.

ckert-W commented Nov 16, 2024

from transformers import LayoutXLMTokenizer, LayoutLMv3ImageProcessor, LayoutLMv3Processor

Load the Tokenizer and ImageProcessor

tokenizer = LayoutXLMTokenizer.from_pretrained(model_name_path)
image_processor = LayoutLMv3ImageProcessor.from_pretrained(model_name_path, apply_ocr=False)

Create the Processor

processor = LayoutLMv3Processor(tokenizer=tokenizer, image_processor=image_processor, apply_ocr=False)
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'XLMRobertaTokenizer'.
The class this function is called from is 'LayoutXLMTokenizer'.

I changed line 49 of transformers/models/layoutlmv3/processing_layoutlmv3 to:

tokenizer_class = ("LayoutLMv3Tokenizer", "LayoutLMv3TokenizerFast", "XLMRobertaTokenizer", "XLMRobertaTokenizerFast", "LayoutXLMTokenizer")

After that, it ran normally.
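For anyone wondering why extending that tuple helps, here is a rough, self-contained sketch of the kind of type check a processor can run on its tokenizer. It is a toy model, not the transformers source: `ToyProcessor`, `PatchedToyProcessor`, `KNOWN_CLASSES`, `accepts` and the two `...Stub` classes are all hypothetical names made up for illustration. It assumes the processor resolves the names in `tokenizer_class` to classes and rejects any tokenizer that is not an instance of one of them.

```python
# Toy model of a processor's tokenizer type check. The classes below are
# hypothetical stand-ins, not the real transformers classes.


class LayoutLMv3TokenizerStub:
    """Stand-in for the tokenizer type the processor accepts by default."""


class LayoutXLMTokenizerStub:
    """Stand-in for the tokenizer type used in this issue."""


# Hypothetical name-to-class lookup, standing in for the library namespace.
KNOWN_CLASSES = {
    "LayoutLMv3Tokenizer": LayoutLMv3TokenizerStub,
    "LayoutXLMTokenizer": LayoutXLMTokenizerStub,
}


class ToyProcessor:
    # Names of the tokenizer classes this processor accepts.
    tokenizer_class = ("LayoutLMv3Tokenizer",)

    def __init__(self, tokenizer):
        # Resolve the allowed names to classes, then check the instance type.
        allowed = tuple(KNOWN_CLASSES[name] for name in self.tokenizer_class)
        if not isinstance(tokenizer, allowed):
            raise TypeError(
                f"Received a {type(tokenizer).__name__}, "
                f"but one of {self.tokenizer_class} was expected."
            )
        self.tokenizer = tokenizer


class PatchedToyProcessor(ToyProcessor):
    # Same check, with the allowed names extended as in the workaround.
    tokenizer_class = ToyProcessor.tokenizer_class + ("LayoutXLMTokenizer",)


def accepts(processor_cls, tokenizer):
    """Return True if processor_cls can be built around tokenizer."""
    try:
        processor_cls(tokenizer)
    except TypeError:
        return False
    return True


print(accepts(ToyProcessor, LayoutXLMTokenizerStub()))         # False
print(accepts(PatchedToyProcessor, LayoutXLMTokenizerStub()))  # True
```

If the real check works this way in your installed version, overriding `tokenizer_class` on a subclass of `LayoutLMv3Processor` might give the same result without editing the installed package. I have not tested that here, so treat it as an assumption.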
