Add configuration for all_text indexing #228

Merged 2 commits on Apr 27, 2023

kirkkwang
Contributor

This commit adds a configuration for the all_text_* fields. Essi does not use all_text_tsimv or all_text_timv, so full-text catalog search would not work there without it. By default, IIIF Print captures the full text by reading the txt file generated by the TextExtractionDerivativeService. Essi does not use that service; it has its own implementation for generating the ALTO XML. We leverage that implementation here and use a lambda to extract the full text from the ALTO XML.
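As a rough illustration of the idea described above, here is a minimal, self-contained sketch of a configurable lambda that pulls full text out of ALTO XML. Everything named here is an assumption for illustration only: `SketchConfig`, the `all_text_generator_function` setting name, the `plain_text` and `alto_xml` accessors, and `FileSetStub` are hypothetical stand-ins and are not taken from the IIIF Print or Essi code, and the diff for this PR is not shown here.

```ruby
require 'rexml/document'

# Hypothetical stand-in for a gem configuration object (not the real IIIF Print class).
class SketchConfig
  attr_writer :all_text_generator_function

  # Assumed default: read text that has already been extracted for the object.
  def all_text_generator_function
    @all_text_generator_function ||= ->(object:) { object.plain_text.to_s }
  end
end

# Collect the CONTENT attribute of every ALTO <String>, one output line per <TextLine>.
# local-name() is used so the lookup works whether or not the ALTO namespace is declared.
ALTO_TO_TEXT = lambda do |alto_xml|
  doc = REXML::Document.new(alto_xml)
  lines = REXML::XPath.match(doc, "//*[local-name()='TextLine']").map do |line|
    words = line.elements.select { |el| el.name == 'String' }
    words.map { |el| el.attributes['CONTENT'] }.compact.join(' ')
  end
  lines.reject(&:empty?).join("\n")
end

SAMPLE_ALTO = <<~XML
  <alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
    <Layout><Page><PrintSpace><TextBlock>
      <TextLine><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>
      <TextLine><String CONTENT="Second"/><SP/><String CONTENT="line"/></TextLine>
    </TextBlock></PrintSpace></Page></Layout>
  </alto>
XML

# Hypothetical object that exposes its ALTO XML, standing in for an Essi file set.
FileSetStub = Struct.new(:alto_xml)

config = SketchConfig.new
# An application overrides the default with its own extraction strategy.
config.all_text_generator_function = ->(object:) { ALTO_TO_TEXT.call(object.alto_xml) }

all_text = config.all_text_generator_function.call(object: FileSetStub.new(SAMPLE_ALTO))
puts all_text
```

Keeping the extraction behind a callable means the indexer only has to call one function and store the result in the all_text_* fields, while each application decides where the text comes from.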

Story

Essi no longer uses TextExtractionDerivativeService, but it still needs the all_text_t* fields indexed to perform full-text catalog searches.

Refs notch8/essi#8

Expected Behavior Before Changes

How the all_text_* fields obtain their full text was not configurable.

Expected Behavior After Changes

How the all_text_* fields obtain their full text is now configurable.

Instead of chain-assigning in one line, this commit assigns `all_text` in two lines.

Co-authored-by: Alisha Evans <[email protected]>
@kirkkwang requested a review from alishaevn April 27, 2023 16:11
@kirkkwang merged commit 516d2b7 into main Apr 27, 2023
@kirkkwang deleted the all-text-configuration branch April 27, 2023 16:23