Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolate tree internals into their own package #9662

Merged
merged 7 commits into from
Oct 21, 2024
Merged

Conversation

chencs
Copy link
Contributor

@chencs chencs commented Oct 17, 2024

What this PR does

This PR moves the MultiQueuingAlgorithmTreeQueue and implementations of QueuingAlgorithm into their own tree package. It also splits tenantQuerierAssignments into two types:

  1. TenantQuerierQueuingAlgorithm, which contains a map of queriers assigned to tenant IDs, and implements QueuingAlgorithm.
  2. tenantQuerierAssignments, which holds tenant information required to handle shuffle-sharding, and computes querier assignment across tenants. It holds a reference to TenantQuerierQueuingAlgorithm, updating the tenant-querier map when the assignments need recomputing.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@francoposa
Copy link
Member

general draft comments:
General Structure
I am fine with splitting tenant-querier assignments. If we want to do it, this approach makes sense.
But I do not really think splitting TQA is necessary. In terms of splitting tree queue out and treating it like its own package as if it's a 3rd-party library that we have imported, it's acceptable to me that the tree queue package would only have the QueueingAlgorithm interface (and maybe a simple generic implementation like the round robin) inside the package, then the interface can be implemented by types outside the package.
I just don't love update-by-shared-pointer approaches where both TQA updates and dequeues can mutate the same tenant queuing algorithm reference, this is a code smell to be avoided if at all possible in my opinion.

I know that without splitting it, TQA has a lot going on because shuffle-sharding and querier-shutting-down type stuff has its own complexity to worry about, but I do think it still is one unified concept: here's the tenant-querier shard status, and you can use it to implement QueueingAlgorithm too. We have plenty of types in Mimir that necessarily implement multiple interfaces based on the same data/state - like BlockStoreQueryable implementing the necessary Prometheus interface then doing everything else it has to do that's specific to store-gateway querying.

More Minor Feedback

  • I think we could rename TQA to be "Tenant-Querier Sharding/Shard/Shards" - I named that class originally and I think I went too generic. "Shard" is a lot more direct way for an engineer to quickly grok what it's doing without peeking at internals. Sharding still is generic enough term that we're not going to suddenly need to change it if the the behavior evolves.
  • Nit/personal preference: If we do keep the split between TQA-shards and TQA-the-queueing-algorithm, I would want to use getters and setters to update the queueing algorithm state rather than just accessing public map members directly. Even though it's the same level of "protection", leaving those public fields hanging out there seems like half-done separation of concerns. Using setter methods also allows us to put a docstring about the intended uses, etc.

@chencs
Copy link
Contributor Author

chencs commented Oct 18, 2024

But I do not really think splitting TQA is necessary. In terms of splitting tree queue out and treating it like its own package as if it's a 3rd-party library that we have imported, it's acceptable to me that the tree queue package would only have the QueueingAlgorithm interface (and maybe a simple generic implementation like the round robin) inside the package, then the interface can be implemented by types outside the package.

Conversely, a big part of the reason I decided to split TQA and move part of it into the tree package was because of the use of tenantNodes. It felt like a weird smell to me to leave pointers to tree-specific objects outside of the tree package. Maybe I'm overindexing on that smell, though.

@chencs chencs changed the title DRAFT: Isolate tree internals into their own package. Isolate tree internals into their own package Oct 18, 2024
@chencs chencs marked this pull request as ready for review October 18, 2024 22:56
@chencs chencs requested a review from a team as a code owner October 18, 2024 22:56
Copy link
Member

@francoposa francoposa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
minor naming I would lean towards just shard or shards over sharding.
Drop some length, keeping the meaning and not using the -ing seems a lot more more common in the codebase

@chencs chencs merged commit 90558ab into main Oct 21, 2024
29 checks passed
@chencs chencs deleted the casie/tree-into-own-pkg branch October 21, 2024 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants