-
Notifications
You must be signed in to change notification settings - Fork 7.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix forgotten parts in cleanup thread #37014
Conversation
} | ||
|
||
|
||
void MergeTreeData::clearPartsFromFilesystemImpl(const DataPartsVector & parts_to_remove, bool throw_on_error, NameSet * part_names_successed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more elegant to pass std::vector<char> was_part_removed
or something ...
You may avoid adding mutex (but it is also ok).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't want to do it at this moment.
finally_remove_parts = parts_to_delete_only_from_filesystem; | ||
} | ||
|
||
removePartsFinally(finally_remove_parts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, the whole code of clearOldPartsAndRemoveFromZK
looks over-complicated and not very solid to me.
Generally, after grabOldParts
we can get a set of parts with Deleting
which we must delete or roll back to Outdated
. But it is not clear if what we do in between is exception safe. Like, here we assume that rollbackDeletingParts
and removePartsFinally
don't throw (which is optimistic cause e.g. the last one writes to part log).
Maybe we can always keep a parts
vector (as a result of grabOldParts
), and also has a vector with statuses (Deleting, DeletedFromZk, DeletedFromFS, Outdated, something else) so that in the final try/catch we can (more or less) reliably roll back statuses?
BTW, grabOldParts
is complicated as well, and it is not (very) clear if we can get exception inside and get half-state where some parts were if fact marked as deleting. However, at least a loop with modifyPartState
is the last one:
ClickHouse/src/Storages/MergeTree/MergeTreeData.cpp
Lines 1647 to 1651 in 27fd255
for (const auto & it_to_delete : parts_to_delete) | |
{ | |
res.emplace_back(*it_to_delete); | |
modifyPartState(it_to_delete, DataPartState::Deleting); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in TG and decide to current version. After that we can apply refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is better then it was. Also it fixes an issue, so I think we can merge it as is, and probably improve later.
@tavplubix could you please double-check?
…ClickHouse into better_delete_outdated_parts
No more packets available: #37068 |
Keeper failed to start -- will investigate separately. |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix bug which can lead to forgotten outdated parts in MergeTree table engines family in case of filesystem failures during parts removal. Before fix they will be removed only after first server restart.