
SeqEvalEvaluator doesn't take the empty label "O" into account, thus lowering the weighted F1 score. #18

Open
Thibeb opened this issue Feb 9, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@Thibeb
Contributor

Thibeb commented Feb 9, 2024

Fine-tuning with the medkit library gives worse results than using the Hugging Face library directly.

When fine-tuning a BERT model with the HFEntityMatcherTrainable component from the medkit library, the metrics I got when evaluating it afterward were far worse than those obtained from the same training run using the Hugging Face API alone.

I noticed this through this tutorial, where engineers trained the same model on the same corpus as I did and got much better results.

After reproducing their code on my machine, I got the same results as they did.

They achieved an average F1 score of approximately 0.90, whereas my medkit-trained BERT model achieved an average F1 score of 0.62.

After some investigation, I found that this gap was explained by the fact that, when they evaluated their model with the classification_report function from sklearn, the NO_ENTITY label (usually written "O") was taken into account when computing the average F1 score.

The "O" label being logically everywhere in the samples, the F1 score was thus highly augmented.

When discarding this label from the average F1 calculation, I found that their model performed the same as mine.
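
For illustration, here is a minimal sketch (toy data, not the tutorial's code) of how sklearn's token-level classification_report behaves with and without the "O" label; the `labels` argument restricts the report to the entity tags:

```python
from sklearn.metrics import classification_report

# Toy token-level tags: as in a real NER corpus, most tokens are "O".
y_true = ["O", "O", "O", "O", "O", "O", "B-DRUG", "I-DRUG", "O", "B-DOSE"]
y_pred = ["O", "O", "O", "O", "O", "O", "B-DRUG", "O",      "O", "B-DOSE"]

# Including "O": the majority class dominates the weighted average.
print(classification_report(y_true, y_pred, zero_division=0))

# Excluding "O": only the entity tags contribute, and the weighted F1 drops.
print(classification_report(y_true, y_pred,
                            labels=["B-DRUG", "I-DRUG", "B-DOSE"],
                            zero_division=0))
```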

In the medkit library, the SeqEvalEvaluator discards the F1 score of the "O" label, which lowers the average F1.
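
For comparison, a seqeval sketch on the same kind of toy data: seqeval scores complete entities rather than individual tokens, so "O" is never reported as a class and never enters the averaged F1, which matches the behaviour described above:

```python
from seqeval.metrics import classification_report

# Tags are grouped per sequence; evaluation is at the entity level.
y_true = [["O", "O", "B-DRUG", "I-DRUG", "O", "B-DOSE", "O"]]
y_pred = [["O", "O", "B-DRUG", "O",      "O", "B-DOSE", "O"]]

# Only DRUG and DOSE appear in the report; there is no "O" row.
print(classification_report(y_true, y_pred))
```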

This is not a problem per se, but it would be interesting to have an option in SeqEvalEvaluator to choose whether or not to take the "O" label into account, to give a bit more flexibility when evaluating predictions.
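
To make the idea concrete, here is a small standalone sketch of such an option (not medkit code; the `include_no_entity_label` flag is purely hypothetical):

```python
from sklearn.metrics import f1_score

def weighted_f1(y_true, y_pred, include_no_entity_label=False):
    """Toy illustration of the requested switch, on token-level tags."""
    labels = sorted(set(y_true) | set(y_pred))
    if not include_no_entity_label:
        labels = [lab for lab in labels if lab != "O"]
    return f1_score(y_true, y_pred, labels=labels,
                    average="weighted", zero_division=0)

y_true = ["O", "O", "O", "O", "B-DRUG", "I-DRUG", "O", "B-DOSE"]
y_pred = ["O", "O", "O", "O", "B-DRUG", "O",      "O", "B-DOSE"]

print(weighted_f1(y_true, y_pred, include_no_entity_label=True))   # inflated by "O"
print(weighted_f1(y_true, y_pred, include_no_entity_label=False))  # entity tags only
```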

@ghisvail added the enhancement (New feature or request) label Feb 19, 2024
@ghisvail
Contributor

We were given a very insightful talk by @Rian-T recently. He touched on the topic of evaluation and mentioned this particular point.

From what I remember, he said there is no consensus on whether to include class O in the evaluation. Perhaps the best option would be to let users choose via a parameter.
