Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid hitting max clause limit with the default value of default_field #102378

Closed
felixbarny opened this issue Nov 20, 2023 · 3 comments
Closed
Labels
:Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@felixbarny
Copy link
Member

felixbarny commented Nov 20, 2023

When not specifying a value for index.query.default_field, Elasticsearch uses *

public static final Setting<List<String>> DEFAULT_FIELD_SETTING = Setting.stringListSetting(
"index.query.default_field",
Collections.singletonList("*"),

which makes it resolve all searchable fields that have values in the index. When it resolves too many fields, an error is thrown to guard against excessive memory usage.

static void checkForTooManyFields(int numberOfFields, @Nullable String inputPattern) {
int limit = IndexSearcher.getMaxClauseCount();
if (numberOfFields > limit) {
StringBuilder errorMsg = new StringBuilder("field expansion ");
if (inputPattern != null) {
errorMsg.append("for [" + inputPattern + "] ");
}
errorMsg.append("matches too many fields, limit: " + limit + ", got: " + numberOfFields);
throw new IllegalArgumentException(errorMsg.toString());
}
}

While the limits for the max number of boolean clauses is relatively generous (at least 1024, the exact value depends on the allocated CPU and memory count), there's still a risk of hitting this limit when increasingindex.mapping.total_fields.limit from the default value of 1000.

long maxClauseCount = (heapInMb * 64 / threadPoolSize);

At the moment, Fleet sets the total_fields.limit to 10k by default: https://github.com/elastic/kibana/blob/6775419c0bd6606fa1bbe9ea7cf512dcd6e82093/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/install.ts#L374

The way integrations guard against hitting max clause limit is that they auto-generate the default_field based on the fields that the package declares: https://github.com/elastic/kibana/blob/6775419c0bd6606fa1bbe9ea7cf512dcd6e82093/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/default_settings.ts#L82. As discussed in #99872, we want to remove this and instead rely on the default default_field in order to be able to use the ecs@mappings component template rather than the fields that the package defines. This will also make it possible for users to search by fields that are dynamically mapped today (such as labels).

Once #96235 is merged, we may want to lower the defaults in Fleet. However, we may not be able to do this for all integrations and users can always manually increase the field limit.

All of this is to say that I think we should guard against running into the max clause limit so that using the default value for default_field never runs the risk of causing a search to fail.

That seems to be relatively straightforward by limiting the number of fields we resolve here:

Map<String, Float> fieldMap = resolveMappingField(

For the all field (*)

boolean allField = Regex.isMatchAllPattern(fieldEntry.getKey());

We can retain the logic of failing when a more specific set of fields is selected (such as labels.*). What's most important is that we don't fail in the default case.

cc @ruflin @javanna

@felixbarny felixbarny added the :Search/Search Search-related issues that do not fall into other categories label Nov 20, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Nov 20, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@felixbarny
Copy link
Member Author

To facilitate the discussion, I've created a POC for this: #102385.

It's totally fine by me to discard that POC if we feel like this isn't going in the right direction.

@felixbarny
Copy link
Member Author

felixbarny commented Feb 7, 2024

As discussed in #102385, we'll take the approach leveraging the new ignore_dynamic_beyond_limit index setting and lowering the default number of fields in integrations so that we don't risk to hit the max clause limit. Therefore, no ES changes are required. I'll close this as not planned accordingly. We can re-open this issue if we see issues with that approach.

@felixbarny felixbarny closed this as not planned Won't fix, can't repro, duplicate, stale Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team
Projects
None yet
2 participants