Add configuration for all_text indexing #228

kirkkwang · 2023-04-27T03:19:43Z

This commit adds a configuration option for how the all_text_* fields are populated. In Essi, all_text_tsimv and all_text_timv are not populated, so full-text catalog search would not work without this option. By default, IIIF Print captures the full text by reading the .txt file generated by the TextExtractionDerivativeService. Essi does not use that service; it has its own implementation for generating ALTO XML. This change leverages that implementation and uses a lambda to extract the full text from the ALTO XML.
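Essi's actual lambda is not part of this PR. As a rough illustration only, here is what a lambda that turns ALTO XML into plain text could look like. It assumes the common ALTO layout, where each word is a `<String CONTENT="...">` element inside a `<TextLine>`. `ALTO_TO_TEXT` is a hypothetical name, and REXML (Ruby's bundled XML library) stands in for whatever parser Essi really uses.

```ruby
require 'rexml/document'

# Hypothetical lambda (not Essi's code): joins the CONTENT attribute of every
# <String> in an ALTO document, one output line per <TextLine>.
# Elements are matched by local name, so the ALTO namespace/prefix is ignored.
ALTO_TO_TEXT = lambda do |alto_xml|
  doc = REXML::Document.new(alto_xml)
  return '' if doc.root.nil?

  lines = []
  doc.root.each_recursive do |element|
    next unless element.name == 'TextLine'

    words = []
    element.each_recursive do |child|
      next unless child.name == 'String' && child.attributes['CONTENT']

      words << child.attributes['CONTENT']
    end
    lines << words.join(' ') unless words.empty?
  end
  lines.join("\n")
end

# Usage with a minimal, made-up ALTO fragment:
alto = <<~XML
  <alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
    <Layout><Page><PrintSpace><TextBlock>
      <TextLine><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>
      <TextLine><String CONTENT="Second"/><SP/><String CONTENT="line"/></TextLine>
    </TextBlock></PrintSpace></Page></Layout>
  </alto>
XML
puts ALTO_TO_TEXT.call(alto)
```

Keeping the conversion in a lambda means the indexer only needs "something callable that returns text". The ALTO-specific parsing stays in the application that produces ALTO.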

Story

Essi no longer uses TextExtractionDerivativeService, but it still needs the all_text_t* fields indexed to support full-text catalog searches.

Refs notch8/essi#8

Expected Behavior Before Changes

The way the all_text_* fields are populated with full text was not configurable.

Expected Behavior After Changes

The way the all_text_* fields are populated with full text is now configurable.
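A minimal sketch of what "configurable" means here: the indexer asks a configuration object for a callable, and the default callable reads a plain-text derivative. The class names, the `all_text_generator` setting, and the `:text_path`/`:ocr_text` keys are all hypothetical. They are not IIIF Print's actual API, and a plain Hash stands in for a FileSet.

```ruby
# Hypothetical configuration object (illustrative names only).
class AllTextConfig
  # Default behaviour: read a .txt derivative from disk, standing in for the
  # file that TextExtractionDerivativeService would generate.
  DEFAULT_GENERATOR = lambda do |file_set|
    path = file_set[:text_path]
    path && File.exist?(path) ? File.read(path) : nil
  end

  attr_writer :all_text_generator

  # Falls back to the default when no custom callable has been configured.
  def all_text_generator
    @all_text_generator || DEFAULT_GENERATOR
  end
end

# Hypothetical indexer that delegates text extraction to the configured callable.
class SketchFileSetIndexer
  def initialize(config)
    @config = config
  end

  # Builds a Solr document hash and fills the all_text_* fields only when the
  # generator returns a non-blank String.
  def to_solr(file_set)
    solr_doc = { 'id' => file_set[:id] }
    all_text = @config.all_text_generator.call(file_set)
    return solr_doc if all_text.nil? || all_text.strip.empty?

    solr_doc['all_text_tsimv'] = all_text
    solr_doc['all_text_timv'] = all_text
    solr_doc
  end
end

# Usage: an application swaps in its own extractor (e.g. one reading ALTO XML).
config = AllTextConfig.new
config.all_text_generator = ->(file_set) { file_set[:ocr_text] }
doc = SketchFileSetIndexer.new(config).to_solr({ id: 'fs1', ocr_text: 'Hello world' })
```

The indexer never needs to know where the text comes from. An application that does not produce .txt derivatives only has to override one setting.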


app/indexers/concerns/iiif_print/file_set_indexer.rb

Instead of chain-assigning `all_text` in one line, this commit assigns it in two lines.

Co-authored-by: Alisha Evans
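For readers unfamiliar with the term, the snippet below contrasts chain assignment with the two-line form on a plain Hash. It uses the two field names mentioned in the PR description. The actual lines in `file_set_indexer.rb` are not reproduced here, so treat this as illustrative only.

```ruby
all_text = 'Hello world'

# Chain assignment: `a[k1] = b[k2] = v` evaluates right to left, and each
# `[]=` expression returns the assigned value.
chained = {}
chained['all_text_timv'] = chained['all_text_tsimv'] = all_text

# Two-line form: one assignment per line.
two_line = {}
two_line['all_text_timv'] = all_text
two_line['all_text_tsimv'] = all_text
```

In this sketch both forms leave the hash in the same state. The two-line version simply states each assignment explicitly.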

kirkkwang added the minor-ver label Apr 27, 2023

alishaevn suggested changes Apr 27, 2023


app/indexers/concerns/iiif_print/file_set_indexer.rb (outdated, resolved)

Remove chain assignment

527b869


kirkkwang requested a review from alishaevn April 27, 2023 16:11

alishaevn approved these changes Apr 27, 2023


kirkkwang merged commit 516d2b7 into main Apr 27, 2023

kirkkwang deleted the all-text-configuration branch April 27, 2023 16:23

