-
Notifications
You must be signed in to change notification settings - Fork 463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixed fulltext block start #714
Conversation
I am currently running into the issue described in #712 (comment) |
I got around the problem by manually creating |
@@ -1,30 +1,42 @@ | |||
package org.grobid.core.engines; | |||
|
|||
import org.apache.commons.lang3.tuple.Pair; | |||
import org.grobid.core.analyzers.GrobidAnalyzer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are quite a few unused imports. Might be good to tidy it up (and remove commented out code, perhaps mark test to be ignored instead).
@@ -43,6 +55,41 @@ public static void tearDown() { | |||
GrobidFactory.reset(); | |||
} | |||
|
|||
public DocumentPiece getWholeDocumentPiece(Document doc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe getWholeDocumentPiece
and getWholeDocumentParts
could be moved to a central place, if it doesn't exist already.
Other parsers are potentially affected as well. There appear to be a lot of duplication and and could probably be refactored. |
Raised suggestion for refactoring the features: #718 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble understanding the whole picture here, so I'm not sure I can really review this part.
There is an example in the issue #712 - happy to add more information? |
I have regenerated the fulltext model feature files with the fix, but actually there is no difference with before with respect to starting block. I also retrained the model (before checking the difference in features), in branch https://github.com/kermitt2/grobid/tree/fix-712 It might appear in some new files like in your example, but the benchmark on PMC set only changes at the second decimal, so it should be minor. So we could simply merge the fix, but no need to update the training files and model for this. |
resolves #712