bad indexing performance in elasticsearch 8.2.0 #87036

Closed
AlexanderOtt85 opened this issue May 23, 2022 · 3 comments · Fixed by #87123
Labels
>bug, :Search Foundations/Mapping, Team:Search Foundations

Comments

@AlexanderOtt85

Elasticsearch Version

8.2.0

Installed Plugins

No response

Java Version

bundled

OS Version

docker-image = docker.elastic.co/elasticsearch/elasticsearch:8.2.0

Problem Description

We are trying to migrate from Elasticsearch 6.7.1 to Elasticsearch 8.2.0.

With the same number of documents per bulk request (100 docs per request), we always get a socket timeout in Elasticsearch 8.2.0. Everything works fine in Elasticsearch 6.7.1.

I noticed that indexing the same document in Elasticsearch 8.2.0 takes 44152ms. In Elasticsearch 6.7.1, indexing takes 50ms.

Steps to Reproduce

elasticsearch_8.2.0.txt
elasticsearch_6.7.1.txt

Logs (if relevant)

No response

AlexanderOtt85 added the >bug and needs:triage labels May 23, 2022
@ywelsch
Contributor

ywelsch commented May 23, 2022

I've run the steps and was able to reproduce the issue. The document takes multiple seconds to index on 8.x, while it takes a fraction of a second on 6.x. This looks like a regression, although hot_threads has not been very helpful in pinpointing it.

I've minimized the example that shows the issue: only the "du": { "bc": [ ... ] } part of the document seems to be relevant, and adding more items to that array appears to blow up the indexing time. I suspect it has something to do with these weird copy_to clauses.

Can you provide more context on why there is so much cross-copying?
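
For readers without the attachments, the kind of mapping under discussion can be sketched roughly as follows. This is a hypothetical illustration, not the reporter's actual mapping: only the `du` and `bc` names come from the thread, while the leaf fields, the `all_text` targets, and the `count_copy_to` helper are assumptions made for the example. The mapping has not been validated against a running cluster.

```python
# Hypothetical mapping: "du.bc" is a nested field with include_in_parent,
# and each leaf field copies into two catch-all search fields.
# Only the names "du" and "bc" appear in the thread; the rest is assumed.
mapping = {
    "properties": {
        "all_text": {"type": "text"},
        "du": {
            "properties": {
                "all_text": {"type": "text"},
                "bc": {
                    "type": "nested",
                    "include_in_parent": True,
                    "properties": {
                        "name": {"type": "keyword", "copy_to": ["all_text", "du.all_text"]},
                        "code": {"type": "keyword", "copy_to": ["all_text", "du.all_text"]},
                    },
                },
            }
        },
    }
}


def count_copy_to(node):
    """Count copy_to targets declared anywhere under a mapping node."""
    total = 0
    for spec in node.get("properties", {}).values():
        targets = spec.get("copy_to", [])
        if isinstance(targets, str):  # copy_to may be a single field name
            targets = [targets]
        total += len(targets) + count_copy_to(spec)
    return total


print(count_copy_to(mapping))
```

Every item added to the `bc` array triggers each of these copy_to targets again, which is why the array length matters in the minimized example.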

ywelsch added the :Search Foundations/Mapping label May 23, 2022
elasticmachine added the Team:Search label May 23, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

ywelsch removed the needs:triage label May 23, 2022
@AlexanderOtt85
Author

Regarding the heavy cross-copying: we have a lot of different search fields on different layers of the search tree in our application, so we cross-copy extensively. Up to and including Elasticsearch 6.7.1, this always worked without any problems.

What may also be interesting in this context is the index size for the index request which I provided previously:

  • In Elasticsearch 6.7.1 the size is about 137.7kb
  • In Elasticsearch 8.2.0 the size is about 18.5mb
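
To put the reported figures side by side, the ratios can be worked out as below. The snippet assumes that 1 mb equals 1024 kb; the numbers themselves are the ones quoted in this thread.

```python
# Figures quoted in the thread.
size_6x_kb = 137.7   # index size on 6.7.1
size_8x_mb = 18.5    # index size on 8.2.0
time_6x_ms = 50      # indexing time on 6.7.1
time_8x_ms = 44152   # indexing time on 8.2.0

# Assumption: 1 mb = 1024 kb.
size_ratio = size_8x_mb * 1024 / size_6x_kb
time_ratio = time_8x_ms / time_6x_ms

print(f"index size grew about {size_ratio:.0f}x")
print(f"indexing time grew about {time_ratio:.0f}x")
```

Both the size and the time grow by more than two orders of magnitude, which would be consistent with fields being written many times over rather than with a constant per-document overhead.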

romseygeek added a commit that referenced this issue Jun 8, 2022

We changed how copy_to is implemented in #79922, which moved
the handling of dots in field names into a specialised parser. Unfortunately,
while doing this we added a bug whereby every time a copy_to directive
is processed for a nested field, the nested field's include_in_parent logic
would be run, meaning that the parent would end up with multiple copies
of the nested child's fields.

This commit fixes this by only running include_in_parent when the parser
is not in a copy_to context. It also fixes another bug that meant the parent
document would contain multiple copies of the ID field.

Fixes #87036
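
The behaviour described in the commit message can be illustrated with a toy model. This is only a sketch under simplifying assumptions and is not the Elasticsearch parser: the function `index_nested`, its parameters, and the field names are all hypothetical. It models a nested child whose fields are copied into the parent by `include_in_parent`, and compares running that step on every copy_to directive (the bug) with running it once outside the copy_to context (the fix).

```python
def index_nested(children, copy_to_per_child, fixed):
    """Return the parent's field list after indexing nested children.

    Toy model only; none of these names come from Elasticsearch.
    """
    parent_fields = []
    for source in children:
        child_fields = list(source.items())

        def include_in_parent():
            # Copy everything the child currently holds into the parent.
            parent_fields.extend(child_fields)

        for i in range(copy_to_per_child):
            # Processing one copy_to directive adds a copied field to the child.
            child_fields.append((f"copy_target_{i}", "copied"))
            if not fixed:
                # Modelled bug: include_in_parent also runs in the copy_to context.
                include_in_parent()

        # Intended behaviour: run once, outside any copy_to context.
        include_in_parent()
    return parent_fields


children = [{"name": f"item{n}", "code": n} for n in range(3)]
buggy_fields = index_nested(children, copy_to_per_child=4, fixed=False)
fixed_fields = index_nested(children, copy_to_per_child=4, fixed=True)
print(len(buggy_fields), len(fixed_fields))
```

In this toy setup the buggy path leaves the parent with several times as many fields as the fixed path, and the gap widens as more copy_to directives or more array items are added.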
romseygeek added a commit to romseygeek/elasticsearch that referenced this issue Jun 8, 2022
elasticsearchmachine pushed a commit that referenced this issue Jun 8, 2022
javanna added the Team:Search Foundations label and removed the Team:Search label Jul 16, 2024