OCR-D Workflow integration #5697
Just because this PR is the first to integrate OCR workflow files does not mean it is not generic. Again, there is nothing specific to OCR-D here. The contents of the files are arbitrary as far as Kitodo is concerned, and the OCR is still integrated merely as a ScriptTask.
Zeutschel/ABBYY decided they do not want to concern the users with details like selecting OCR workflows. That's understandable for a commercial service which is technically backed by a black box anyway. But they could have offered a workflow selector, too. Any OCR backend could – now.
Okay, I'm out. Maybe I cannot express my thinking, or it is being interpreted the wrong way. Please remove me from the review. I'm out of this pull request.
@markusweigelt Maybe you can explain where the OCR-D specificity comes from? I don't see it either. What would prevent someone from using whatever OCR they use now?
I assumed that the workflows were OCR-D specific, and that's why I had refactored it as part of the adjustments from the review. I think we all can agree on
TL;DR Technically, the result of the reviews is OK. But I wonder whether we need the functionality in this form.
What basically strikes me about this pull request is that a lot of effort is put into it for little additional functionality. Moreover, this can only be used with a special external application. At least if I understood it correctly. For me, it looks like this:
In my opinion, the ocrProfile is technical metadata. The goal is that this metadata can be defined in the production template. I would choose the following solution, without touching the source code at all: The metadata is created with the desired value in the ruleset with the domain="technical". (An additional wrapping ruleset can be created for exactly this setting using includes, if different settings for the same ruleset are to be stored in the system.) The ruleset with the metadata is used in the production template. So, when a process is created, the ocrProfile is taken over into the process as the corresponding metadata. Here it can be used.
I see the directory for OCR profiles, which can be configured here, as lying completely outside of KitodoProduction.
I am thinking of a completely different (better) implementation: being able to edit a page of metadata in the production templates, so that you can start the metadata editor in the production template and set any metadata (allowed in the ruleset) there. This is then passed into all created processes.
That's my opinion on this. I haven't actually tried the pull request. The code looked OK when I read it.
Hi all, I have already tested the OCR-D suite (https://github.com/slub/ocrd_manager, https://github.com/slub/ocrd_controller) in our Kitodo test system, where it works well, and I am therefore trying to deploy it in production. I would appreciate it if we could move forward in enabling a better integration of Kitodo and OCR-D, and I want to make some comments here based on practical experience. I got the OCR-D integration running without using the changes of this pull request, and I want to point out that we also have to think about the intended users of the proposed implementation. Right now I mostly think about two scenarios when doing OCR in the context of our mass digitization
https://github.com/slub/ocrd_manager/blob/main/workflows/ocr-workflow-default.sh
https://ocr-d.de/en/workflows
Apart from that, I could derive most of the information from the metadata we encode as part of the description in Kitodo, mainly the language (English, German, ...) and the script type (Antiqua, Fraktur). I am already injecting those parameters on the fly into my predefined workflows. The question for me then is: what is the additional value of tying a specific OCR workflow ("OCR profile") to a production template? A production template could be associated with a project that holds many different types of material, which require different types of OCR. In general, a production template is determined by different considerations than the selection of a specific OCR. This pull request also makes the "OCR profile" selectable per process. This is more useful, but do the content editors really have the necessary knowledge to make an informed decision here? I do not see myself, or another person with more knowledge, configuring OCR profiles for single processes.
One possible scenario I can think of is that a user decides, on the basis of the material's layout, that this material requires a more sophisticated workflow although it is quite young, and thereby overrides the default simple workflow with a more complex, but also predefined, workflow. This would mean that in the end I have two profiles: "Simple layout" and "Complex layout". And this could probably be recorded, as Matthias suggests, in the technical metadata. With that information stored, it is possible to attach to each of those profiles a specific file with workflow instructions and to specify those files in Kitodo. But it is probably not ultimately necessary, since my script, which is called from Kitodo, already contains a lot of business logic and could get the workflow specification (encoded in a file) from a lot of different places.
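As an illustration of the metadata-driven selection described above, here is a minimal sketch of the kind of business logic an external script might contain. It is not code from this pull request or from ocrd_manager; the function name, the workflow file names and the model identifiers are all hypothetical placeholders.

```python
from typing import Optional

# Hypothetical workflow files for the two profiles mentioned above.
# The file names are placeholders, not files shipped by any project.
WORKFLOW_BY_LAYOUT = {
    "simple": "workflow-simple-layout.sh",
    "complex": "workflow-complex-layout.sh",
}

# Hypothetical model identifiers keyed by the script type taken from
# the descriptive metadata (Antiqua, Fraktur).
MODEL_BY_SCRIPT = {
    "Antiqua": "model-antiqua",
    "Fraktur": "model-fraktur",
}


def select_ocr_settings(language: str, script: str,
                        layout_override: Optional[str] = None) -> dict:
    """Derive OCR settings from process metadata.

    The simple workflow is the default; a user-recorded override
    (e.g. from technical metadata) can switch to the complex one.
    """
    layout = layout_override or "simple"
    if layout not in WORKFLOW_BY_LAYOUT:
        raise ValueError(f"unknown layout profile: {layout}")
    if script not in MODEL_BY_SCRIPT:
        raise ValueError(f"unknown script type: {script}")
    return {
        "workflow": WORKFLOW_BY_LAYOUT[layout],
        "model": MODEL_BY_SCRIPT[script],
        "language": language,
    }
```

Under these assumptions, the only per-process decision left to a content editor is whether to record a "complex" override; language and script type come from the existing metadata.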
I would probably agree that the approach here is generic enough, since I could also use the information in the "OCR profile" file to send specific instructions to any OCR server or process. I am not sure if it is really necessary, but nothing prevents the encoding of OCR-server-specific information (e.g. for an ABBYY server) in the file. Or one can choose not to select a profile at all because it is not necessary in the institution's OCR setup. Kitodo stays agnostic in that regard. Although I do not see that many use cases for the given implementation, I just see it as support for an additional scenario of interacting with an OCR system (by passing it workflows which are stored in Kitodo), which still allows for other, different ways of using an external OCR service. PS: This is probably a limited view based on my practical experience. I am sure there are a lot of scenarios where having a lot of profiles to select from is useful. But I am wondering if a sophisticated workflow specification will ever happen inside of Kitodo and not in another (e.g. OCR-D-based) piece of software.
@BartChris Let me preface this by saying that everything depends on how you use Kitodo.Production and how (digitization) workflows are organized in each individual institution.
If you know what kind of material you are digitizing, then you could make an informed decision about which specific OCR workflow you want to use as early as creating the project. So you can run every process through the tailored OCR workflow and it will work out just fine. Outliers can be dealt with individually. If you use whatever default workflow is available from the start, you then have to individually re-OCR each process with a different OCR workflow once you figure out later on that there's a better workflow available. This is completely independent of OCR-D, because you could use different engines or OCR models or whatever, even with any kind of script task or processing you choose. But how do you make an informed decision about which OCR workflow to choose in the first place?!
This view is too limited. You can have complex layouts, simple layouts, workflows for specific material from specific time frames and even handwriting. The main idea would be to share OCR profiles and give better default workflows for a wide variety of use cases. The choice we have (OCR-D specific) is very limited and will be expanded. Additionally, the idea would be to share workflows and models to gather experience in the community. Not every complex layout is the same, and it can be useful to have a more specifically trained workflow at hand. Complex layouts in what kind of complexity? A mix of text and images? Very narrow tables? Overlapping multi-columns? These are not the same. If someone has found a useful workflow, it could be shared publicly with examples etc. Not every institution has to experiment to find the perfect workflow for common material types or eras themselves. This can be a shared effort. So one could end up with a wide variety of possible workflows, parameters or models. That's what this OCR profile selector is for. The choice to put this in the ruleset therefore does not seem viable, because that is too heavy on manual labor in maintaining these choices and keeping the actual workflows and options in sync. Just drop a new workflow profile in the directory and it is there to be used. In the context of OCR-D, the main vision is to get rid of workflow choice completely. The project aims for auto-correcting and auto-optimizing workflows so that your only choice is to use it or not. But as we're not there yet, we need to have the option to choose an OCR profile that can be used to select a corresponding OCR workflow. And again, this could be OCR-D or any other script call that can be tuned by specifying an OCR profile that can be read and used accordingly. Hope this makes sense!
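To make the "just drop a file in the directory" idea concrete, here is a minimal sketch of how a selector could discover the available profiles. It is an illustration under assumptions, not the implementation in this pull request: the function name is hypothetical, and it simply treats every regular, non-hidden file in the configured directory as one selectable profile.

```python
from pathlib import Path


def list_ocr_profiles(profile_dir: str) -> list:
    """Return the names of all profile files in profile_dir, sorted.

    Assumption: each regular, non-hidden file is one selectable OCR
    profile; its content is opaque to the caller.
    """
    directory = Path(profile_dir)
    if not directory.is_dir():
        # A missing directory simply means there is nothing to select.
        return []
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and not entry.name.startswith(".")
    )
```

With this approach no ruleset has to be edited when a profile is added or removed; the list shown to the user is whatever the directory contains at that moment.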
@Erikmitk I completely agree with what you are saying. I also think that the exchange of standards and best practices on how to approach different types of material would be one of the most important outcomes of the whole Kitodo-OCR-D project. I would say that the most important thing in general is moving forward in integrating OCR-D, or any other OCR technology, with Kitodo so that we have a basis for knowledge sharing. With regard to this PR, I would like to stress that, as long as that knowledge sharing happens, I do not really care whether those profiles are stored in Kitodo or somewhere on the server and used by some external script.
@BartChris thanks for your feedback! I still see one point which @Erikmitk has not addressed:
Note: we are currently preparing to change our default OCR workflow(s) to include that metadata information (e.g. a Tesseract model switch block). You can pass that info to the Manager from the Kitodo script via parameters:
See the database entries for the default Kitodo workflow in the demo. In a custom/external Kitodo instance, you need to configure your Kitodo workflows to include an OCR-D step. That wiki link also points to documentation on all the available placeholders.
Closed because of a new implementation, without the use of an explicit entity, in PR #5809.
directory.ocr.profiles
{ocrprofilefile} placeholder
{projectdmsexportpath} placeholder