Move metadata operations for compactions from coordinator to compactors #4978

Open
keith-turner opened this issue Oct 12, 2024 · 3 comments

keith-turner commented Oct 12, 2024

Is your feature request related to a problem? Please describe.

Currently in the Accumulo 4.0-SNAPSHOT branch, a compactor process goes through the following steps to execute a compaction.

  1. Request a compaction job from the compaction coordinator, with the coordinator doing the following
    1. Get a job from an in memory priority queue
    2. Generate the next filename and create the directory if needed
    3. Use conditional mutations to reserve the files for compaction in the metadata table
    4. Return the job to the compactor
  2. Run the compaction
  3. Report compaction completion to the coordinator which does the following
    1. Create a fate operation to commit the compaction (because it's a multistep process)
    2. Another thread in the manager will eventually run this fate operation

With this design, many compactors talk to a single manager, and that single manager then talks to many tservers to execute the reservations and commits.

Describe the solution you'd like

The reservation and compaction commit could move from the coordinator to the compactor process. This would result in many compactors talking to many tablet servers with the compactor doing the following.

  1. Request a compaction job from the compaction coordinator, with the coordinator doing the following
    1. Get a job from an in memory priority queue
    2. Generate the next filename and create the directory if needed (these operations work on in-memory data in the manager and would be best done there)
    3. Return the job, output file name, and current steady time to the compactor
  2. Use conditional mutations to reserve the files for compaction in the metadata table
  3. Run the compaction
  4. Create and execute the fate operation to commit the compaction in the compactor process. This is possible because of the changes in #4524 ("Fate reservations moved out of memory"). For this to work well, the fate table would need to be presplit so that multiple tablet servers can host it.
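
The proposed division of work can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not Accumulo code: `MetadataTablet`, `Coordinator`, and `run_compactor` are hypothetical names, the conditional mutation is modeled as a plain check-then-update on a Python object, `time.monotonic()` stands in for the manager's steady time, and the fate commit is collapsed into a single conditional step.

```python
import heapq
import itertools
import time


class MetadataTablet:
    """Toy stand-in for the metadata entries of one tablet (hypothetical)."""

    def __init__(self, files):
        self.files = set(files)  # files currently referenced by the tablet
        self.reserved = {}       # file -> id of the compaction that reserved it

    def conditional_reserve(self, compaction_id, files):
        # Condition: every file still exists and none is already reserved.
        if not all(f in self.files and f not in self.reserved for f in files):
            return False
        for f in files:
            self.reserved[f] = compaction_id
        return True

    def conditional_commit(self, compaction_id, files, output_file):
        # Condition: this compaction still holds the reservation on every file.
        if not all(self.reserved.get(f) == compaction_id for f in files):
            return False
        for f in files:
            self.files.discard(f)
            del self.reserved[f]
        self.files.add(output_file)
        return True


class Coordinator:
    """Keeps only the in-memory work: the priority queue and file naming."""

    def __init__(self):
        self._queue = []
        self._seq = itertools.count()        # tie-breaker for equal priorities
        self._file_ids = itertools.count()   # source of the next filename

    def add_job(self, priority, tablet, files):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._queue, (-priority, next(self._seq), tablet, tuple(files)))

    def next_job(self):
        if not self._queue:
            return None
        _, _, tablet, files = heapq.heappop(self._queue)
        output_file = f"C{next(self._file_ids):07d}.rf"
        return tablet, files, output_file, time.monotonic()


def run_compactor(compactor_id, coordinator, metadata):
    """Steps 1 to 4 of the proposed flow, run inside the compactor."""
    job = coordinator.next_job()                                      # step 1
    if job is None:
        return None
    tablet, files, output_file, _steady_time = job
    compaction_id = f"{compactor_id}:{output_file}"
    if not metadata[tablet].conditional_reserve(compaction_id, files):  # step 2
        return None
    # Step 3: the actual compaction would run here (elided in this sketch).
    ok = metadata[tablet].conditional_commit(compaction_id, files, output_file)  # step 4
    return output_file if ok else None
```

In this model the coordinator never touches `MetadataTablet`; a compactor whose reservation condition fails simply gives up on the job, which is one possible policy and not something the proposal specifies.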

These changes would work well with #4664. The amount of work the manager does on behalf of a compactor would be greatly reduced, making the async handling much simpler. The async operations would usually work only with in-memory data in the manager, making them really quick. So async plus in-memory processing should theoretically scale to a large number of concurrent connections from compactors. Also, if the compactor immediately runs the fate operation to commit a compaction, it could reduce the overall latency of a small compaction.

One risk with this change is that if a large number of compactors compact tablets that use the same metadata tablet, it could create a lot of load on that metadata tablet. In some ways the manager currently limits the number of concurrent compaction commit and reservation operations (it may limit them more than desired). This should not be a problem for the fate table because it's keyed on a UUID, so accesses from many compactors would spread evenly across its tablets.
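
The claim about UUID keys spreading evenly can be checked with a small simulation. This is a sketch under assumptions: the split points (evenly spaced on the first two hex digits) and the helper names `split_points`, `tablet_index`, and `spread` are hypothetical, and random version 4 UUIDs are used as stand-ins for fate ids. The generator is seeded only so the run is reproducible.

```python
import bisect
import random
import uuid
from collections import Counter


def split_points(num_tablets):
    """Hypothetical presplit: evenly spaced points on the first two hex digits."""
    return [format(i * 256 // num_tablets, "02x") for i in range(1, num_tablets)]


def tablet_index(row, splits):
    # A tablet holds rows up to and including its end row, so the owning
    # tablet is the first one whose split point is >= the row.
    return bisect.bisect_left(splits, row)


def spread(num_ops, num_tablets, seed=42):
    """Count how many randomly keyed operations land on each tablet."""
    rng = random.Random(seed)
    splits = split_points(num_tablets)
    counts = Counter()
    for _ in range(num_ops):
        fate_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        counts[tablet_index(fate_id, splits)] += 1
    return counts


if __name__ == "__main__":
    for tablet, count in sorted(spread(10_000, 4).items()):
        print(f"tablet {tablet}: {count} fate operations")
```

Each of the four tablets should receive roughly a quarter of the operations. Metadata rows for tablets of the same table sort next to each other, which is why the same argument does not apply to the metadata table.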

Describe alternatives you've considered

There was discussion of running multiple managers, each executing fate operations; #4524 was done as a prerequisite for this. The use cases for this were compaction commits and system-initiated split operations. If this change were made, it would remove one of those use cases, leaving system splits as something that may still need that functionality. However, a single manager process may be able to work through all of the split operations that realistically happen.

If these changes were implemented and a lot of compactors executing fate compaction commits were to die, then the manager would eventually pick those fate operations up and execute them. In that case latency would be similar to, or higher than, what the current approach gives.
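
The fallback for dead compactors could look roughly like the following. This is a hypothetical sketch, not how Accumulo's fate implementation works: `FateOp`, `recover_orphaned`, and the idea of passing in a set of live process ids are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FateOp:
    fate_id: str
    reserved_by: Optional[str] = None  # id of the process executing the operation
    done: bool = False


def recover_orphaned(ops, live_processes, manager_id):
    """Let the manager take over unfinished operations whose owner is gone."""
    recovered = []
    for op in ops:
        if op.done:
            continue
        if op.reserved_by is None or op.reserved_by not in live_processes:
            op.reserved_by = manager_id
            recovered.append(op.fate_id)
    return recovered


ops = [
    FateOp("op-1", reserved_by="compactor-1"),
    FateOp("op-2", reserved_by="compactor-2"),  # compactor-2 has died
    FateOp("op-3", reserved_by="compactor-2", done=True),
]
print(recover_orphaned(ops, live_processes={"compactor-1"}, manager_id="manager"))
```

Only the unfinished operation owned by the dead compactor is taken over. How quickly the manager notices the dead owner is what determines the extra latency mentioned above.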

@keith-turner keith-turner added the enhancement This issue describes a new feature, improvement, or optimization. label Oct 12, 2024
@keith-turner keith-turner added this to the 4.0.0 milestone Oct 12, 2024

keith-turner commented Oct 15, 2024

The goal of this proposed change is to reduce the overall latency of compactions. It would be good to have metrics, like #4980, that help in understanding this latency in a running system. It would be nice to be able to compute a ratio like (compaction_run_time)/(compaction_admin_overhead+compaction_run_time). The compaction_admin_overhead is the time spent reserving and committing compactions plus any other time not related to running the actual compaction. The closer this ratio is to 1, the more efficient the system is; the further it is from 1, the less efficient, which could indicate a bug or transient runtime issue that needs investigation. I am going to look into adding metrics to compute this ratio or see if existing metrics could be used. The metrics work should probably precede this work and be used to inform any potential work on this issue.
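
As a worked example of the ratio, here is a minimal sketch. The function name `compaction_efficiency` and the sample timings are made up for illustration; only the formula comes from the comment above.

```python
def compaction_efficiency(run_time, admin_overhead):
    """Return run_time / (admin_overhead + run_time) for one compaction.

    Both arguments are durations in the same unit (for example seconds).
    """
    if run_time < 0 or admin_overhead < 0:
        raise ValueError("durations must be non-negative")
    total = run_time + admin_overhead
    if total == 0:
        raise ValueError("total time must be positive")
    return run_time / total


# A long compaction where overhead is small relative to the work done.
print(compaction_efficiency(run_time=9.0, admin_overhead=1.0))

# A small compaction dominated by reservation and commit time.
print(compaction_efficiency(run_time=0.5, admin_overhead=1.5))
```

The first case gives 0.9 and the second 0.25, which shows why small compactions are the ones most sensitive to reservation and commit latency.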

@keith-turner

I suspect this approach would be more workable if tablet servers used async processing. Then they could gracefully handle a large number of compactor processes submitting conditional mutations for compaction reservation and compaction commit. But that may not be needed, given that the current thrift thread pool just grows as needed, which kind of works but is not ideal.


keith-turner commented Dec 14, 2024

The changes in #5184 make this change safer to make. Async server-side RPC processing plus #5184 would be even better, though, for lots of compactor processes interacting directly with tservers.
