-
-
Notifications
You must be signed in to change notification settings - Fork 21.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent tilemap editor hang and crash by mutex guarding tile_map_layer dirty.cell_list. #87458
Conversation
TilesEditorUtils calls tile_map->set_pattern from a thread (to refresh tile pattern preview images). set_pattern sets each of the cells into the temporary map, marking them dirty via set_cell. This can race against the tree invoking the TileMapLayer's internal_update, which proceses dirty.cell_list and eventually clears it. If a cell's shared dirty entry ends up losing the race, clear can get stuck in an infinite loop where remove refuses to change it because the entry doesn't think it's a member of the list, but the list head still points to it. This spams ERROR: Condition "p_elem->_root != this" is true. at: remove (./core/templates/self_list.h:80) And eventually leads to a crash when the error message buffer fills. Guarding the list mutations avoids the race that sets up for this failure, fixing https://www.github.com/godotengine/godot/issues/86226
@@ -2111,6 +2111,8 @@ void TileMapLayer::notify_tile_map_change(DirtyFlags p_what) { | |||
} | |||
|
|||
void TileMapLayer::internal_update() { | |||
MutexLock lock(dirty.mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lock might be too strict and cause performance problems, would need to be evaluated, it locks quite a large block of code, might be better to lock more specifically to the critical sections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It basically just prevents set_cell from running during internal_update, which seemed pretty reasonable. ~all of the _ helper functions here can touch the dirty cell list, so it seemed pretty fairly scoped. I'm open to other ideas though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of the stuff in those methods do things unrelated to it though, so unless those are also critical theras potential stuff there
Also unsure if this is the only affected list
MutexLock lock(dirty.mutex); | ||
if (!E->value.dirty_list_element.in_list()) { | ||
dirty.cell_list.add(&(E->value.dirty_list_element)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MutexLock lock(dirty.mutex); | |
if (!E->value.dirty_list_element.in_list()) { | |
dirty.cell_list.add(&(E->value.dirty_list_element)); | |
} | |
{ | |
MutexLock lock(dirty.mutex); | |
if (!E->value.dirty_list_element.in_list()) { | |
dirty.cell_list.add(&(E->value.dirty_list_element)); | |
} | |
} |
I'd at least reduce this one, the rest should be looked into
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though there is a possible race condition between this whole method and the update one, as it looks up an iterator in tile_map
above which is touched by the update, I think this whole class needs some thread safety evaluation
The class itself isn't thread safe as such necessarily, as the scene tree isn't, so the consideration is really with the editor, it might rather be an issue to solve on the editor side as this does seem to be an editor issue, unsure if it pops up elsewhere (but then it's not really a big issue as this isn't guaranteed to be thread safe)
So this might rather be something to solve on the editor interface side, and that there's some threading going on in the tilemap editor that breaks here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is threading for the TileMap
editor and specifically the pattern generation, I suspect this is what's causing this and might be the area to look into, rather than adding it to the TileMap
itself, unsure how to handle that part specifically though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this code is the problem:
godot/editor/plugins/tiles/tiles_editor_plugin.cpp
Lines 70 to 144 in 0bcc0e9
void TilesEditorUtils::_thread() { | |
pattern_thread_exited.clear(); | |
while (!pattern_thread_exit.is_set()) { | |
pattern_preview_sem.wait(); | |
pattern_preview_mutex.lock(); | |
if (pattern_preview_queue.size() == 0) { | |
pattern_preview_mutex.unlock(); | |
} else { | |
QueueItem item = pattern_preview_queue.front()->get(); | |
pattern_preview_queue.pop_front(); | |
pattern_preview_mutex.unlock(); | |
int thumbnail_size = EDITOR_GET("filesystem/file_dialog/thumbnail_size"); | |
thumbnail_size *= EDSCALE; | |
Vector2 thumbnail_size2 = Vector2(thumbnail_size, thumbnail_size); | |
if (item.pattern.is_valid() && !item.pattern->is_empty()) { | |
// Generate the pattern preview | |
SubViewport *viewport = memnew(SubViewport); | |
viewport->set_size(thumbnail_size2); | |
viewport->set_disable_input(true); | |
viewport->set_transparent_background(true); | |
viewport->set_update_mode(SubViewport::UPDATE_ONCE); | |
TileMap *tile_map = memnew(TileMap); | |
tile_map->set_tileset(item.tile_set); | |
tile_map->set_pattern(0, Vector2(), item.pattern); | |
viewport->add_child(tile_map); | |
TypedArray<Vector2i> used_cells = tile_map->get_used_cells(0); | |
Rect2 encompassing_rect; | |
encompassing_rect.set_position(tile_map->map_to_local(used_cells[0])); | |
for (int i = 0; i < used_cells.size(); i++) { | |
Vector2i cell = used_cells[i]; | |
Vector2 world_pos = tile_map->map_to_local(cell); | |
encompassing_rect.expand_to(world_pos); | |
// Texture. | |
Ref<TileSetAtlasSource> atlas_source = item.tile_set->get_source(tile_map->get_cell_source_id(0, cell)); | |
if (atlas_source.is_valid()) { | |
Vector2i coords = tile_map->get_cell_atlas_coords(0, cell); | |
int alternative = tile_map->get_cell_alternative_tile(0, cell); | |
if (atlas_source->has_tile(coords) && atlas_source->has_alternative_tile(coords, alternative)) { | |
Vector2 center = world_pos - atlas_source->get_tile_data(coords, alternative)->get_texture_origin(); | |
encompassing_rect.expand_to(center - atlas_source->get_tile_texture_region(coords).size / 2); | |
encompassing_rect.expand_to(center + atlas_source->get_tile_texture_region(coords).size / 2); | |
} | |
} | |
} | |
Vector2 scale = thumbnail_size2 / MAX(encompassing_rect.size.x, encompassing_rect.size.y); | |
tile_map->set_scale(scale); | |
tile_map->set_position(-(scale * encompassing_rect.get_center()) + thumbnail_size2 / 2); | |
// Add the viewport at the last moment to avoid rendering too early. | |
callable_mp((Node *)EditorNode::get_singleton(), &Node::add_child).call_deferred(viewport, false, Node::INTERNAL_MODE_DISABLED); | |
RS::get_singleton()->connect(SNAME("frame_pre_draw"), callable_mp(const_cast<TilesEditorUtils *>(this), &TilesEditorUtils::_preview_frame_started), Object::CONNECT_ONE_SHOT); | |
pattern_preview_done.wait(); | |
Ref<Image> image = viewport->get_texture()->get_image(); | |
// Find the index for the given pattern. TODO: optimize. | |
item.callback.call(item.pattern, ImageTexture::create_from_image(image)); | |
viewport->queue_free(); | |
} | |
} | |
} | |
pattern_thread_exited.set(); | |
} |
But unsure how to approach it exactly, might work by adding a update_internals()
after setting the pattern, or doing some other deferred operation here, will check if I can replicate the crash later today and see about this as a solution
Alternatively override a local call queue, will investigate
Opened an alternative PR that handles this by making the editor preview thread safe instead, this solves the crash and generates the previews properly We might want to look into general thread safety of this class as well, but I think this issue should be solved by making the editor management of this thread safe instead |
Superseded by #87470. Thanks for your contribution nevertheless! |
TilesEditorUtils calls tile_map->set_pattern from a thread (to refresh tile pattern preview images). set_pattern sets each of the cells into the temporary map, marking them dirty via set_cell.
This can race against the tree invoking the TileMapLayer's internal_update, which proceses dirty.cell_list and eventually clears it. If a cell's shared dirty entry ends up losing the race, clear can get stuck in an infinite loop where remove refuses to change it because the entry doesn't think it's a member of the list, but the list head still points to it. This spams
And eventually leads to a crash when the error message buffer fills.
Guarding the list mutations avoids the race that sets up for this failure, fixes #86226.
TileMap
preview #87470