-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unclear error message when using custom "synonym" with Analyze API #23943
Comments
We discussed this as part of FixItFriday, we agreed that the error message is odd. Also, we were wondering why an index is required and whether we can remove that requirement. I could reproduce against 5.3.0. |
Removing the index requirement would be a big win! |
I wonder if we should just special case it here too.. it's really a cyclic depdendency and I think this could fix is without too much trouble. I agree it's a bit of a hack: diff --git a/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java b/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java
index d7e299b1cf..37a4bf1788 100644
--- a/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java
+++ b/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java
@@ -50,6 +50,8 @@ import org.elasticsearch.index.analysis.CharFilterFactory;
import org.elasticsearch.index.analysis.CustomAnalyzer;
import org.elasticsearch.index.analysis.IndexAnalyzers;
import org.elasticsearch.index.analysis.NamedAnalyzer;
+import org.elasticsearch.index.analysis.SynonymGraphTokenFilterFactory;
+import org.elasticsearch.index.analysis.SynonymTokenFilterFactory;
import org.elasticsearch.index.analysis.TokenFilterFactory;
import org.elasticsearch.index.analysis.TokenizerFactory;
import org.elasticsearch.index.mapper.AllFieldMapper;
@@ -505,7 +507,7 @@ public class TransportAnalyzeAction extends TransportSingleShardAction<AnalyzeRe
return charFilterFactories;
}
- private static TokenFilterFactory[] getTokenFilterFactories(AnalyzeRequest request, IndexSettings indexSettings, AnalysisRegistry analysisRegistry,
+ private static TokenFilterFactory[] getTokenFilterFactories(AnalyzeRequest request, IndexSettings indexSettings, final AnalysisRegistry analysisRegistry,
Environment environment, TokenFilterFactory[] tokenFilterFactories) throws IOException {
if (request.tokenFilters() != null && request.tokenFilters().size() > 0) {
tokenFilterFactories = new TokenFilterFactory[request.tokenFilters().size()];
@@ -521,7 +523,16 @@ public class TransportAnalyzeAction extends TransportSingleShardAction<AnalyzeRe
AnalysisModule.AnalysisProvider<TokenFilterFactory> tokenFilterFactoryFactory =
analysisRegistry.getTokenFilterProvider(filterTypeName);
if (tokenFilterFactoryFactory == null) {
- throw new IllegalArgumentException("failed to find global token filter under [" + filterTypeName + "]");
+ switch(filterTypeName) {
+ case "synonym":
+ tokenFilterFactoryFactory = (is, env, name, s) -> new SynonymTokenFilterFactory(is, env, analysisRegistry, name, s);
+ break;
+ case "synonym_graph":
+ tokenFilterFactoryFactory = (is, env, name, s) -> new SynonymGraphTokenFilterFactory(is, env, analysisRegistry, name, s);
+ break;
+ default:
+ throw new IllegalArgumentException("failed to find global token filter under [" + filterTypeName + "]");
+ }
}
// Need to set anonymous "name" of tokenfilter
tokenFilterFactories[i] = tokenFilterFactoryFactory.get(getNaIndexSettings(settings), environment, "_anonymous_tokenfilter_[" + i + "]", settings);
diff --git a/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java b/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java
index 61950942e6..ba1df40a7e 100644
--- a/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java
+++ b/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java
@@ -128,6 +128,8 @@ import org.elasticsearch.index.analysis.StemmerTokenFilterFactory;
import org.elasticsearch.index.analysis.StopAnalyzerProvider;
import org.elasticsearch.index.analysis.StopTokenFilterFactory;
import org.elasticsearch.index.analysis.SwedishAnalyzerProvider;
+import org.elasticsearch.index.analysis.SynonymGraphTokenFilterFactory;
+import org.elasticsearch.index.analysis.SynonymTokenFilterFactory;
import org.elasticsearch.index.analysis.ThaiAnalyzerProvider;
import org.elasticsearch.index.analysis.ThaiTokenizerFactory;
import org.elasticsearch.index.analysis.TokenFilterFactory;
diff --git a/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java b/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java
index bcd7bba8d3..3423eb5329 100644
--- a/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java
+++ b/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java
@@ -35,6 +35,8 @@ import org.elasticsearch.test.ESTestCase;
import org.elasticsearch.test.IndexSettingsModule;
import java.io.IOException;
+import java.util.Arrays;
+import java.util.HashMap;
import java.util.List;
import static java.util.Collections.emptyList;
@@ -108,6 +110,26 @@ public class TransportAnalyzeActionTests extends ESTestCase {
assertEquals("ck", tokens.get(3).getTerm());
assertEquals("brown", tokens.get(4).getTerm());
assertEquals("fox", tokens.get(5).getTerm());
+
+
+ request = new AnalyzeRequest();
+ request.analyzer(null);
+ request.tokenizer("standard");
+ request.addCharFilter("html_strip");
+ HashMap<String, Object> filter = new HashMap<>();
+ filter.put("type", "synonym");
+ filter.put("synonyms", Arrays.asList("test,testing"));
+ request.addTokenFilter(filter);
+ request.text("<p>this is a test</p>");
+ analyze = TransportAnalyzeAction.analyze(request, AllFieldMapper.NAME, null, randomBoolean() ? indexAnalyzers : null,
+ registry, environment);
+ tokens = analyze.getTokens();
+ assertEquals(5, tokens.size());
+ assertEquals("this", tokens.get(0).getTerm());
+ assertEquals("is", tokens.get(1).getTerm());
+ assertEquals("a", tokens.get(2).getTerm());
+ assertEquals("test", tokens.get(3).getTerm());
+ assertEquals("testing", tokens.get(4).getTerm());
}
public void testFillsAttributes() throws IOException {
@@ -190,6 +212,7 @@ public class TransportAnalyzeActionTests extends ESTestCase {
assertEquals("hay", tokens.get(1).getTerm());
}
+
public void testGetIndexAnalyserWithoutIndexAnalyzers() throws IOException {
IllegalArgumentException e = expectThrows(IllegalArgumentException.class,
() -> TransportAnalyzeAction.analyze( |
+1 synonym-filter should work with the _analyze endpoint |
@javanna Any news on this? |
cc @elastic/es-search-aggs |
This ought to work properly in elasticsearch 7.0, after #34034 |
Confirmed that #34034 fixes this, so I'm closing. |
According to
synonym
filter comment:Because of this limitation, if you try to use it like the following (which is a valid syntax since the Analyze API docs mentions it but with
stop
filter instead):You will get the following error:
failed to find global token filter under [synonym]
This is misleading since it is the same error that you get if you use an unknown token filter name and, for newbies, specially the ones that just learned about the Analyze API, it can be confusing since
synonym
should be a valid token filter name and the docs does not mentions this particular limitation ofsynonym
token filter.Moreover, IMO
synonym
should be supported by/_analyze
API anyway, if possible. Since it only works in the context of an index, the need to create a new test index every time you change the synonym configuration can be a time consuming task compared to simply using the/_analyze
API.The text was updated successfully, but these errors were encountered: