-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tablet server fault between file rename and GC candidate addition waste disk space #3811
Comments
In the elasticity branch it may be possible to eventually address this problem by moving compaction commit into FATE. See #3559. If its a fate operation, then do not need to worry about failure between rename and addition of the candidate because it will rerun in that case. |
keith-turner
added a commit
to keith-turner/accumulo
that referenced
this issue
Dec 21, 2023
The custom refresh tracking code was removed and compaction commit was moved to being a FATE operation with the following 4 steps. 1. Rename file done in RenameCompactionFile class 2. Update the metadata table via a conditional mutation done in CommitCompaction class 3. Write the gc candidates done in PutGcCandidates class 4. Optionally send a RPC refresh request if the tablet was hosted done in RefreshTablet class There is some follow on work that still needs to be done to improve how this change works with detecting dead compactions. After that is done these changes should address problems outlined apache#3811 and apache#3802 that were related to process death before adding GC candidates. Now that GC candidates are written in FATE, if it dies it will run again later. This is currently storing the compaction commit FATE operations in zookeeper. This would not be suitable for a cluster because per tablet information should never be stored in zookeeper. However its fine as a temporary situation in the elasticity branch until FATE storage is availabe in an accumulo table, see apache#4049 and apache#3559 WIP WIP WIP
keith-turner
added a commit
that referenced
this issue
Jan 13, 2024
The custom refresh tracking code was removed and compaction commit was moved to being a FATE operation with the following 4 steps. 1. Rename file done in RenameCompactionFile class 2. Update the metadata table via a conditional mutation done in CommitCompaction class 3. Write the gc candidates done in PutGcCandidates class 4. Optionally send a RPC refresh request if the tablet was hosted done in RefreshTablet class There is some follow on work that still needs to be done to improve how this change works with detecting dead compactions. After that is done these changes should address problems outlined #3811 and #3802 that were related to process death before adding GC candidates. Now that GC candidates are written in FATE, if it dies it will run again later. This is currently storing the compaction commit FATE operations in zookeeper. This would not be suitable for a cluster because per tablet information should never be stored in zookeeper. However its fine as a temporary situation in the elasticity branch until FATE storage is availabe in an accumulo table, see #4049 and #3559
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug
If a tablet server process dies or throws an exception between the following lines
then a non temp file with no GC candidate entry exists and it will not be cleaned up by the GC process.
Expected behavior
unreferenced files are always cleaned up.
The text was updated successfully, but these errors were encountered: