Bug in NER nested entity recognition #169

Open
tianchiguaixia opened this issue May 29, 2024 · 4 comments

Comments

@tianchiguaixia

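1. When I used the NER scripts under your sequence_labeling directory, I found that all of your datasets are in BIO format. With nested entities, this format drops annotations.
2. The best approach is still to train directly on the format exported by the doccano annotation tool.
3. I hope you can support training on datasets with nested entity annotations.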
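To illustrate why BIO drops nested entities: BIO assigns exactly one tag per character, so when one entity sits inside another, whichever span is written last overwrites the other. The sketch below is hypothetical (not bert4torch code) and assumes character-level spans as `(start, end_exclusive, label)` tuples; the labels `dis` and `pro` are only modeled on CMeEE-style entity types.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label); hypothetical format


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Write spans into one BIO tag per character.

    A character holds a single tag, so overlapping spans overwrite each other.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans (orphan I- tags are ignored)."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        closes = tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if closes:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return spans


# "胃癌手术" (gastric cancer surgery) contains the nested entity "胃癌" (gastric cancer).
gold = [(0, 4, "pro"), (0, 2, "dis")]
tags = spans_to_bio("胃癌手术", gold)
print(tags)                # ['B-dis', 'I-dis', 'I-pro', 'I-pro']
print(bio_to_spans(tags))  # [(0, 2, 'dis')]  the outer 'pro' entity is lost
```

Reversing the span order only changes which entity is lost; in either order, a single BIO sequence cannot hold both.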

[CMeEE-V2_dev.json](https://github.com/Tongjilibo/bert4torch/files/15478305/CMeEE-V2_dev.json)

@Tongjilibo
Owner

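As I understand it, what's missing is data-processing code for nested entities, right? But users could handle that part themselves, couldn't they?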

@tianchiguaixia
Author

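BIO cannot represent nested entity annotations. I suggest making this script generic so that it supports unified training across formats.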

@Tongjilibo
Owner

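> BIO cannot represent nested entity annotations. I suggest making this script generic so that it supports unified training across formats.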

I'll look into adding a demo for this format later. The data-processing step is already open, so anyone can adapt it to their own data.
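A minimal sketch of such an adapter, assuming doccano's sequence-labeling JSONL export. The two record shapes handled here (a `label` list of `[start, end, "TYPE"]` triples, or an `entities` list with `start_offset`/`end_offset`/`label` keys) are assumptions that differ between doccano versions, so check them against your own export. Nested and overlapping spans are kept intact rather than flattened into BIO.

```python
import json
from typing import Dict, Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label); hypothetical format


def parse_span_record(record: Dict) -> Tuple[str, List[Span]]:
    """Extract (text, spans) from one doccano-style record.

    Assumed shapes (verify against your doccano version):
      {"text": ..., "label": [[start, end, "TYPE"], ...]}
      {"text": ..., "entities": [{"start_offset": s, "end_offset": e, "label": "TYPE"}]}
    Offsets are treated as character offsets with an exclusive end.
    """
    text = record["text"]
    spans: List[Span] = []
    if "entities" in record:
        for ent in record["entities"]:
            spans.append((ent["start_offset"], ent["end_offset"], ent["label"]))
    else:
        for start, end, label in record.get("label", []):
            spans.append((start, end, label))
    # Drop out-of-range spans; keep nested/overlapping ones, deduplicated and sorted.
    spans = [(s, e, l) for s, e, l in spans if 0 <= s < e <= len(text)]
    return text, sorted(set(spans))


def load_jsonl(lines: Iterable[str]) -> List[Tuple[str, List[Span]]]:
    """Parse every non-empty JSONL line into (text, spans)."""
    return [parse_span_record(json.loads(line)) for line in lines if line.strip()]


lines = ['{"text": "胃癌手术", "label": [[0, 4, "pro"], [0, 2, "dis"]]}']
print(load_jsonl(lines))  # [('胃癌手术', [(0, 2, 'dis'), (0, 4, 'pro')])]
```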

@tianchiguaixia
Author

Right. The more open format these days is start/end offsets. Supporting just that would cover all kinds of tasks; BIO is too limited.
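One common way to train directly on start/end spans is a per-label start-by-end matrix, the kind of target used by span-based heads such as GlobalPointer. Because every span gets its own cell, nested entities no longer collide. This is a hypothetical sketch, not bert4torch's implementation. It assumes one token per character and no `[CLS]` offset; a real pipeline would first map character offsets to tokenizer positions.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label); hypothetical format


def spans_to_matrix(
    seq_len: int, spans: List[Span], label2id: Dict[str, int]
) -> List[List[List[int]]]:
    """Encode spans as a [num_labels][seq_len][seq_len] 0/1 target.

    Cell [k][i][j] == 1 means tokens i..j (inclusive) form an entity of label k,
    so nested and overlapping spans occupy different cells.
    """
    matrix = [[[0] * seq_len for _ in range(seq_len)] for _ in range(len(label2id))]
    for start, end, label in spans:
        if 0 <= start < end <= seq_len:  # skip spans truncated by max length
            matrix[label2id[label]][start][end - 1] = 1
    return matrix


def matrix_to_spans(matrix: List[List[List[int]]], id2label: Dict[int, str]) -> List[Span]:
    """Decode the upper triangle (start <= end) back into sorted spans."""
    spans: List[Span] = []
    for k, plane in enumerate(matrix):
        for i, row in enumerate(plane):
            for j in range(i, len(row)):
                if row[j]:
                    spans.append((i, j + 1, id2label[k]))
    return sorted(spans)


label2id = {"dis": 0, "pro": 1}
id2label = {v: k for k, v in label2id.items()}
m = spans_to_matrix(4, [(0, 4, "pro"), (0, 2, "dis")], label2id)
print(matrix_to_spans(m, id2label))  # [(0, 2, 'dis'), (0, 4, 'pro')]  both entities survive
```

The cost of this design is memory: the target grows with `num_labels * seq_len**2`, versus `seq_len` for BIO.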
