Missing information from the extracted content #566

Closed
ajinkya2903 opened this issue Apr 9, 2020 · 1 comment

Comments

@ajinkya2903

Hi, I have retrained GROBID using 50 PDFs. These are technical PDFs, and I want to extract their full text content. However, GROBID still misses some of the content even after retraining, and I don't understand why. Can you suggest a solution?

@lfoppiano
Collaborator

Dear @ajinkya2903, you need to give us some more information before we can help you. Which content do you want to extract?

Also, before doing that, please read the documentation.
There is a whole section explaining how to create training data and use it:
https://grobid.readthedocs.io/en/latest/training/General-principles/

In addition, I suggest searching the issues previously opened by other people; there are a few on this same subject. I'm pulling out some of them:
