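1. When I used the NER scripts under your sequence_labeling directory, I found that all of your datasets are in BIO format. When entities are nested, this format causes some of them to go unlabeled.
2. The best approach is still to train directly on the format exported by the doccano annotation tool.
3. I hope training on datasets with nested entity annotations can be supported.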
[CMeEE-V2_dev.json](https://github.com/Tongjilibo/bert4torch/files/15478305/CMeEE-V2_dev.json)
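If I understand correctly, what is missing is the data-processing code for nested entities? That part is something users can handle themselves, though, isn't it?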
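BIO does not support nested entity annotation. I suggest making this script generic, so that a single script works with one unified training setup.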
I'll look into adding a demo for this format later. The data-processing step is open anyway, so everyone can adapt it to their own data.
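Right. The more open format these days is actually start and end offsets. Supporting just that is enough to cover all kinds of tasks. BIO is too limiting.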
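To make the point about start/end offsets concrete, here is a minimal sketch in plain Python. It is not code from bert4torch; the helper names `spans_to_bio` and `spans_to_matrix`, the sample text, and the `(start, end, label)` tuple layout with inclusive end offsets are all assumptions chosen for illustration. It shows that flattening nested spans into BIO tags has to drop one of them, while a span-level target (one cell per start/end pair and label) keeps both.

```python
from typing import List, Tuple

# Hypothetical span layout: (start, end_inclusive, label), character offsets.
Span = Tuple[int, int, str]


def spans_to_bio(text_len: int, spans: List[Span]) -> Tuple[List[str], List[Span]]:
    """Flatten spans into one BIO tag per character.

    Longer spans are placed first; any span overlapping an already tagged
    region cannot be represented and is returned in `dropped`.
    """
    tags = ["O"] * text_len
    dropped: List[Span] = []
    for start, end, label in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if any(tag != "O" for tag in tags[start:end + 1]):
            dropped.append((start, end, label))
            continue
        tags[start] = f"B-{label}"
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"
    return tags, dropped


def spans_to_matrix(text_len: int, spans: List[Span], labels: List[str]) -> List[List[List[int]]]:
    """Build a span-level target: grid[label_id][start][end] = 1 for each entity."""
    label2id = {label: i for i, label in enumerate(labels)}
    grid = [[[0] * text_len for _ in range(text_len)] for _ in labels]
    for start, end, label in spans:
        grid[label2id[label]][start][end] = 1
    return grid


# Made-up example: a location nested inside an organization.
text = "New York University"
spans = [(0, 18, "ORG"), (0, 7, "LOC")]

bio_tags, dropped = spans_to_bio(len(text), spans)
grid = spans_to_matrix(len(text), spans, ["ORG", "LOC"])

print("BIO dropped:", dropped)
print("span cells set:", sum(cell for plane in grid for row in plane for cell in row))
```

In this sketch the BIO conversion keeps only the outer ORG span, while the span matrix marks both entities. A real training script would index by tokens rather than characters and would need to map offsets through the tokenizer.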