Add configuration for all_text indexing #228
Merged
This commit adds a configuration for the all_text_* fields. Essi does not use all_text_tsimv or all_text_timv, so full-text catalog search would not work without it. By default, IIIF Print captures full text by reading the txt file generated by the TextExtractionDerivativeService. Essi does not use that service; it has its own implementation that generates ALTO XML. We leverage that implementation here and use a lambda to extract the full text from the ALTO XML.
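To illustrate the idea of pulling full text out of ALTO XML with a lambda, here is a minimal sketch. It is not the code from this PR and does not show IIIF Print's actual configuration API: the constant name ALTO_TEXT_EXTRACTOR and the sample document are hypothetical, and it assumes the ALTO file stores each word in the CONTENT attribute of a String element nested inside a TextLine element, as the ALTO schema describes.

```ruby
require 'rexml/document'

# Hypothetical lambda (name is an assumption, not from the PR):
# takes an ALTO XML string and returns the plain text, one line per TextLine.
ALTO_TEXT_EXTRACTOR = lambda do |alto_xml|
  doc = REXML::Document.new(alto_xml)
  lines = []

  # Walk every element so the lookup works regardless of XML namespaces.
  doc.each_recursive do |element|
    next unless element.name == 'TextLine'

    # Words live in <String CONTENT="..."/>; <SP/> spacing elements are skipped.
    words = element.elements.to_a
                   .select { |child| child.name == 'String' }
                   .map { |child| child.attributes['CONTENT'] }
    lines << words.compact.join(' ')
  end

  lines.join("\n")
end

# Illustrative sample document, not taken from Essi.
sample_alto = <<~XML
  <alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
    <Layout><Page><PrintSpace><TextBlock>
      <TextLine><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>
      <TextLine><String CONTENT="Second"/><SP/><String CONTENT="line"/></TextLine>
    </TextBlock></PrintSpace></Page></Layout>
  </alto>
XML

puts ALTO_TEXT_EXTRACTOR.call(sample_alto)
```

A lambda is a convenient shape here because it can be stored in a configuration setting and called at index time with whatever ALTO content the host application provides.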
Story
Essi no longer uses TextExtractionDerivativeService, but it still needs the all_text_t* fields indexed to perform full-text catalog searches.
Refs notch8/essi#8
Expected Behavior Before Changes
The way all_text_* fields get their full text set was not configurable.
Expected Behavior After Changes
The way all_text_* fields get their full text set is now configurable.