From 3db4b304e985d7c4aab12aedbfe0e8739878dd29 Mon Sep 17 00:00:00 2001
From: Kevin Day
Date: Wed, 11 Jan 2023 10:04:41 -0600
Subject: [PATCH 1/2] Issue 481: Use case insensitive filter and add case insensitive string type.

A filter cannot be added to a `solr.StrField`.
According to the Solr documentation, a filter can only be added to a tokenized field type, and `solr.StrField` does not allow tokenization.
This uses `solr.TextField` instead.

Several fields need case insensitive searches.
Two new types that use the `KeywordTokenizer` are added, called `string_ci` and `strings_ci`.
The `KeywordTokenizer` is essentially a pass-through tokenizer.
It treats the whole string as a single token, which is effectively the same as not having a tokenizer.
The documentation even references the `KeywordTokenizer` as the method of disabling tokenization.

Fields that should be case insensitive are moved from `string` to `string_ci` and `strings` to `strings_ci` respectively.

There are potential performance concerns with using `solr.TextField` rather than `solr.StrField` due to the loss of the docValues optimization feature.
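
For illustration only, a case insensitive whole-string type along the lines described above might look roughly like the following. This is a sketch, not the content of this patch: the use of `solr.LowerCaseFilterFactory` as the case insensitive filter and every attribute value shown are assumptions.

```xml
<!-- Hypothetical sketch: attribute values are assumptions, not copied from this patch. -->
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- Emits the entire input as a single token, so nothing is split. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Assumed filter: lowercases that single token at both index and query time. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumed multi-valued counterpart using the same analysis chain. -->
<fieldType name="strings_ci" class="solr.TextField" sortMissingLast="true" multiValued="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because a single `<analyzer>` with no `type` attribute applies to both indexing and querying, a value indexed as `Foo Bar` and a query for `foo bar` would both reduce to the same single token.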
see: https://solr.apache.org/guide/7_7/field-types-included-with-solr.html#field-types-included-with-solr
see: https://solr.apache.org/guide/7_7/field-type-definitions-and-properties.html#field-type-definitions-and-properties
see: https://solr.apache.org/guide/7_7/field-properties-by-use-case.html#field-properties-by-use-case
see: https://solr.apache.org/guide/7_7/tokenizers.html#keyword-tokenizer
see: https://solr.apache.org/guide/7_7/docvalues.html
---
 solr/config/managed-schema | 51 ++++++++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/solr/config/managed-schema b/solr/config/managed-schema
index 0d723813..22776bdb 100644
--- a/solr/config/managed-schema
+++ b/solr/config/managed-schema
@@ -123,14 +123,14 @@
- + - - - - + + + +
@@ -142,13 +142,13 @@
- - + + - - - + + +
@@ -158,13 +158,13 @@
- + - + - - + +
@@ -180,11 +180,11 @@
- - - + + + - +
@@ -248,6 +248,7 @@
+
@@ -255,6 +256,20 @@
+ + + + + + + + + + + + + +

From 753b83a3880f538eb0b6eb32f91f1b9a5c937ff7 Mon Sep 17 00:00:00 2001
From: Kevin Day
Date: Wed, 11 Jan 2023 14:14:14 -0600
Subject: [PATCH 2/2] Issue 481: Re-use whole_strings, rename string_ci to whole_string, fix date_created.

The `strings_ci` type is close enough to `whole_strings`, so just use `whole_strings`.
There is no `whole_string` type, so rename `string_ci` to `whole_string`.

To better prevent future problems, document these custom field types.

The `date_created` field is not multi-valued, so use `whole_string`.
---
 solr/config/managed-schema | 64 ++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 33 deletions(-)

diff --git a/solr/config/managed-schema b/solr/config/managed-schema
index 22776bdb..96cc6d02 100644
--- a/solr/config/managed-schema
+++ b/solr/config/managed-schema
@@ -123,14 +123,14 @@
- + - - - - + + + +
@@ -142,13 +142,13 @@
- - + + - - - + + +
@@ -158,13 +158,13 @@
- + - + - - + +
@@ -180,11 +180,11 @@
- - - + + + - +
@@ -244,7 +244,19 @@
- + + + + + + +
@@ -256,20 +268,6 @@
- - - - - - - - - - - - - -