Missing information from the extracted content #566

ajinkya2903 · 2020-04-09T04:39:21Z

Hi, I have retrained GROBID using 50 PDFs. These are technical PDFs, and I want to extract their entire text content. However, GROBID still misses some content from the PDFs even though we retrained it, and I don't understand why. Can you tell me how to solve this?
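One way to pin down exactly which parts go missing is to compare the TEI XML that GROBID returns (for example, from the service's `processFulltextDocument` endpoint) against passages you know are in the PDF. The Python sketch below does this. It is a hypothetical helper, not part of GROBID; `body_paragraphs` and `missing_snippets` are illustrative names. It only looks at `<p>` elements inside the TEI `<body>`. Snippets that come from the header, figure or table captions, or back matter are stored in other TEI elements, so they will be reported as missing even if GROBID extracted them.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace of the TEI elements in GROBID's XML output
TEI = "{http://www.tei-c.org/ns/1.0}"


def body_paragraphs(tei_xml: str) -> List[str]:
    """Return the whitespace-normalised text of every <p> inside the TEI <body>."""
    root = ET.fromstring(tei_xml)
    paragraphs = []
    for body in root.iter(TEI + "body"):
        for p in body.iter(TEI + "p"):
            # itertext() also collects text from nested elements such as <ref>
            text = " ".join("".join(p.itertext()).split())
            if text:
                paragraphs.append(text)
    return paragraphs


def missing_snippets(tei_xml: str, expected: List[str]) -> List[str]:
    """Return the expected snippets that do not appear in any body paragraph."""
    extracted = " ".join(body_paragraphs(tei_xml))
    return [s for s in expected if " ".join(s.split()) not in extracted]
```

Grouping the reported snippets by where they sit in the original PDF can show whether one kind of region is consistently dropped.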

lfoppiano · 2020-04-09T04:59:54Z

Dear @ajinkya2903, you need to give us more information so that we can help you. Which content do you want to extract?

Also, before doing that, please read the documentation.
There is a whole section explaining how to create training data and how to use it:
https://grobid.readthedocs.io/en/latest/training/General-principles/

In addition, I suggest searching the issues previously opened by other people; there are a few on this same subject. I'm pulling out some of them:

ajinkya2903 closed this as completed Aug 13, 2020
