add parameter to optionally output raw results in the evaluation #117
There is already a
@de-code I will check. Ideally, the idea is to have the evaluation results together with all the raw data used for calculating them. IMHO it's useful, for example, during n-fold cross-validation, when the data might be partitioned differently at each run because it is shuffled.
I'm trying to find a way to apply the change without altering the current design.
```python
def inverse_transform(self, y):
    """
    send back the original label strings
    """
    # reverse the tag -> index vocabulary, then map each index back to its tag
    indice_tag = {i: t for t, i in self.vocab_tag.items()}
    return [indice_tag[y_] for y_ in y]
```
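For reference, a minimal sketch of how predicted indices could be mapped back to tag strings with a method like the one above; `model`, `preprocessor`, and `x_valid` are hypothetical stand-ins for whatever the evaluation loop already has, not the project's actual API:

```python
import numpy as np

# Hypothetical names: model, preprocessor, x_valid.
scores = model.predict(x_valid)                 # shape: (n_sequences, n_tokens, n_classes)
for sequence_scores in scores:
    indices = np.argmax(sequence_scores, axis=-1)      # class index per token
    tags = preprocessor.inverse_transform(indices)     # back to tag strings
    print(tags)                                        # e.g. ['B-header', 'I-header', 'O', ...]
```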
Apologies, I missed the n-fold use-case. That definitely seems valuable. In any case, the tag command should provide an example of how to get the values out. You may also want to be mindful of memory usage. I had issues when working with "larger" datasets (~2000 documents, still small in the DL world). I think I rewrote some parts to be more generator-based.
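To make the memory concern concrete, a rough sketch of streaming raw records out as they are produced instead of accumulating them in one big list. All names here (`iter_raw_results`, `data_generator`, etc.) are hypothetical, and it assumes the generator yields label indices, not the existing DeLFT API:

```python
import json

def iter_raw_results(model, data_generator, preprocessor):
    """Yield one raw record per sequence instead of accumulating a large list."""
    for x_batch, y_batch in data_generator:
        pred_batch = model.predict_on_batch(x_batch)
        for y_true, y_pred in zip(y_batch, pred_batch):
            yield {
                "expected": preprocessor.inverse_transform(y_true),
                "predicted": preprocessor.inverse_transform(y_pred.argmax(-1)),
            }

# write incrementally, one JSON line per sequence, to keep memory usage flat
with open("raw_results.jsonl", "w") as out:
    for record in iter_raw_results(model, data_generator, preprocessor):
        out.write(json.dumps(record) + "\n")
```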
I am not sure whether you meant that just as an example. But I think to get the x, you will need to collect what you are passing to the model, or the data generator.
In my project I have the added complexity of supporting sliding windows for individual documents (i.e. the same document may be split across batches, potentially at different positions). Without sliding windows it should be fairly simple. The padding would be easy to filter out. You could also use the original document length, as we only pad at the end.
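The padding point could look roughly like the following, assuming the original (unpadded) token sequences are still available so their lengths can be used to cut the tail; `trim_padding` and its arguments are illustrative names only:

```python
def trim_padding(predicted_tags, original_tokens):
    """Drop padded positions: padding is only added at the end, so the
    original document length is enough to cut off the tail."""
    return [tags[:len(tokens)]
            for tags, tokens in zip(predicted_tags, original_tokens)]
```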
I've pushed a first implementation, which returns the raw information for each evaluation.
Example output (evaluation):
Example output (n-fold):
A few notes:
When I use my fulltext data, I believe the raw data would amount to more than a GB. Outputting that to stdout may make it difficult to use. Perhaps it would make sense to output it to separate files?
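For instance, one option would be to write one file per fold rather than printing to stdout; the file layout and names below are just an assumption for illustration:

```python
import json
import os

def write_fold_raw_results(output_dir, fold_index, expected, predicted):
    """Write the raw data of one fold to its own JSON file instead of stdout."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "raw_results_fold_{}.json".format(fold_index))
    with open(path, "w") as out:
        json.dump({"expected": expected, "predicted": predicted}, out)
    return path
```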
Here are some examples of the output. I've updated the initial comment to add all the parts for which these features should be implemented. I would implement it only for grobid, ner and classification.
When we perform n-fold cross-validation or holdout evaluation, we would like to have the possibility to output the raw results (to a separate file), as we do in grobid. In this way we can compare what is expected and what is predicted for each evaluation task.
Components to be updated: