Cluster Stats API Slows down Considerably for Larger Clusters #79563

original-brownbear · 2021-10-20T10:52:09Z

The cluster stats endpoint becomes very slow on large clusters (the number of indices is what matters here). In benchmarks and in real-world issues (see the issue linked below), the coordinating node work alone can take on the order of tens of seconds.

As a cluster grows, the node-level actions, whose cost scales with the number of shards per node, become slower (translog stats etc. are costly to compute for a large number of shards). The coordinating node work, which among other things involves decompressing and deserializing all mappings, also slows down considerably, as could be seen in e.g. #62753.

The coordinating node work can probably be sped up massively by exploiting mapping duplication. The data node slowness is less of a concern, I'd say, since it can be addressed by scaling out to more nodes, but there might be possible speed-ups there as well.
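
To illustrate the mapping-duplication idea, here is a minimal sketch, not the actual Elasticsearch code. It assumes many indices share byte-identical mapping sources, so the expensive parse runs once per distinct source and the result is weighted by how many indices use it. The class name `MappingDedupSketch`, the method `fieldTypeCounts`, and the toy `field:type` format handled by `parseToy` are all hypothetical stand-ins for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from the Elasticsearch code base.
final class MappingDedupSketch {

    /**
     * Counts how many indices use each field type, calling the (expensive)
     * parser once per distinct mapping source instead of once per index.
     */
    static Map<String, Integer> fieldTypeCounts(List<String> mappingSources,
                                                Function<String, Map<String, String>> parser) {
        // Step 1: count how many indices share each identical mapping source.
        Map<String, Integer> occurrences = new HashMap<>();
        for (String source : mappingSources) {
            occurrences.merge(source, 1, Integer::sum);
        }
        // Step 2: parse each distinct source once and weight it by its occurrence count.
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> entry : occurrences.entrySet()) {
            Map<String, String> fields = parser.apply(entry.getKey());
            for (String type : fields.values()) {
                counts.merge(type, entry.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    /** Toy parser for "field:type,field:type" strings; a stand-in for real mapping parsing. */
    static Map<String, String> parseToy(String source) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : source.split(",")) {
            String[] kv = pair.split(":");
            fields.put(kv[0], kv[1]);
        }
        return fields;
    }
}
```

With this shape, the parsing cost scales with the number of distinct mappings rather than the number of indices, which matters when many indices are created from the same template.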

elasticmachine · 2021-10-20T10:52:12Z

Pinging @elastic/es-distributed (Team:Distributed)

elasticmachine · 2021-10-20T10:52:12Z

Pinging @elastic/es-data-management (Team:Data Management)

Some trivial fixes to the mapping stats performance: No need to parse the map out of the mapping source twice (given that parsing the map is often most of the runtime of this method this gives a significant speedup). Also, no need to look up from the map in a hot loop, just using the entry-set is a lot faster (especially considering we're working with a treemap here). relates elastic#79563

Some trivial fixes to the mapping stats performance: No need to parse the map out of the mapping source twice (given that parsing the map is often most of the runtime of this method this gives a significant speedup). Also, no need to look up from the map in a hot loop, just using the entry-set is a lot faster (especially considering we're working with a linked hash map here). relates #79563
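
The entry-set point in the commit message above can be sketched as follows. This is only an illustration of the general Java idiom under assumed names (`EntrySetIterationSketch`, `countByLookup`, `countByEntrySet` are hypothetical), not the actual `MappingStats` change: iterating `keySet()` and calling `get()` performs one extra lookup per key, while iterating `entrySet()` hands back key and value together.

```java
import java.util.Map;

// Hypothetical sketch of the idiom; not taken from the Elasticsearch code base.
final class EntrySetIterationSketch {

    /** Slower pattern: every iteration pays for an additional map lookup. */
    static int countByLookup(Map<String, Object> properties) {
        int nested = 0;
        for (String name : properties.keySet()) {
            Object definition = properties.get(name); // extra lookup inside the hot loop
            if (definition instanceof Map) {
                nested++;
            }
        }
        return nested;
    }

    /** Faster pattern: the entry already carries the value, so no lookup is needed. */
    static int countByEntrySet(Map<String, Object> properties) {
        int nested = 0;
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            if (entry.getValue() instanceof Map) {
                nested++;
            }
        }
        return nested;
    }
}
```

Both methods return the same count; only the per-iteration cost differs. The extra lookup is a hash computation for hash-based maps and an O(log n) tree descent for a `TreeMap`.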


DaveCTurner · 2022-08-12T08:50:20Z

I think #82830 fixes the coordinating node work here by exploiting mapping deduplication so I'm removing the distrib team label.

original-brownbear added the >bug, :Data Management/Stats, and :Distributed Coordination/Cluster Coordination labels Oct 20, 2021

elasticmachine added the Team:Data Management and Team:Distributed (Obsolete) labels Oct 20, 2021

This was referenced Oct 20, 2021

Speed up MappingStats #79576

Merged

Fix Large Shard Count Scalability Issues #77466

Open

original-brownbear mentioned this issue Oct 21, 2021

Speed up MappingStats (#79576) #79612

Merged
