-
-
Notifications
You must be signed in to change notification settings - Fork 21.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a Linux ThreadSanitizer job to CI #73777
Conversation
@myaaaaaaaaa since it sounds like you've been playing with TSAN, have you been able to run it on an actual project? If I just run
|
My motivation at the moment is just to improve CI's ability to test for threading bugs, to keep things at least somewhat manageable. Trying to get something like ThreadSanitizer working immediately on a codebase as large as Godot's is probably a hopeless endeavor.
This may be caused by the fact that ThreadSanitizer does not work with other sanitizers (ASan, UBSan, etc) and has to be enabled by itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great now with the suppressions.
Just added a question about the need for debug_symbols
, if it's confirmed they're needed, this can be merged.
Thanks! |
@myaaaaaaaaa We are starting to get some weird issues with this added workflow. See this and this runs, which fail on the unit tests step without actually failing any unit tests or reporting any errors from the sanitizer (there are warnings though). This seems like it's a false positive failure, and it's inconsistent (between these two there are several other runs which didn't fail). |
ThreadSanitizer is complaining about races in NavigationServer. A quick fix (depending on how tolerable the spurious CI errors are) would be: diff --git a/modules/navigation/nav_map.h b/modules/navigation/nav_map.h
index 0a82f595c7..2188df8981 100644
--- a/modules/navigation/nav_map.h
+++ b/modules/navigation/nav_map.h
@@ -98,15 +98,15 @@ class NavMap : public NavRid {
/// Physics delta time
real_t deltatime = 0.0;
/// Change the id each time the map is updated.
uint32_t map_update_id = 0;
- bool use_threads = true;
+ bool use_threads = false;
bool avoidance_use_multiple_threads = true;
bool avoidance_use_high_priority_threads = true;
// Performance Monitor
int pm_region_count = 0;
int pm_agent_count = 0;
int pm_link_count = 0; A more robust fix would be to merge the diff --git a/modules/navigation/nav_map.cpp b/modules/navigation/nav_map.cpp
index 1e55bd2314..592c4da1a9 100644
--- a/modules/navigation/nav_map.cpp
+++ b/modules/navigation/nav_map.cpp
@@ -1181,21 +1181,28 @@ void NavMap::step(real_t p_deltatime) {
bool use_parallel_for = use_threads && avoidance_use_multiple_threads;
for_range(0, active_2d_avoidance_agents.size(), use_parallel_for, SNAME("RVOAvoidanceAgents2D"), [&](const int i) {
NavAgent *agent = active_2d_avoidance_agents[i];
agent->get_rvo_agent_2d()->computeNeighbors(&rvo_simulation_2d);
agent->get_rvo_agent_2d()->computeNewVelocity(&rvo_simulation_2d);
+ });
+ for_range(0, active_2d_avoidance_agents.size(), use_parallel_for, SNAME("RVOAvoidanceAgents2D"), [&](const int i) {
+ NavAgent *agent = active_2d_avoidance_agents[i];
agent->get_rvo_agent_2d()->update(&rvo_simulation_2d);
agent->update();
});
+
for_range(0, active_3d_avoidance_agents.size(), use_parallel_for, SNAME("RVOAvoidanceAgents3D"), [&](const int i) {
NavAgent *agent = active_3d_avoidance_agents[i];
agent->get_rvo_agent_3d()->computeNeighbors(&rvo_simulation_3d);
agent->get_rvo_agent_3d()->computeNewVelocity(&rvo_simulation_3d);
+ });
+ for_range(0, active_3d_avoidance_agents.size(), use_parallel_for, SNAME("RVOAvoidanceAgents3D"), [&](const int i) {
+ NavAgent *agent = active_3d_avoidance_agents[i];
agent->get_rvo_agent_3d()->update(&rvo_simulation_3d);
agent->update();
});
}
void NavMap::dispatch_callbacks() {
for (NavAgent *agent : active_2d_avoidance_agents) { Although it is concerning that the failures are random. @Scony Do you think it's possible to write NavigationServer tests that would find this issue more consistently? |
I'm more concerned that these warnings fail the CI. If it's not an error, why does it do that? |
From the point of view of Godot CI, they are errors. My understanding is that sanitizers typically refer to their output as warnings, and reserve the word "error" for very critical errors. |
Seems like UB sanitizer reports everything as |
When threads are involved that's going to happen. I don't think we can do much from tests perspective. I'm also not sure if we should - test cases assume simple execution and they test particular things, threading is out of scope in that sense. |
Having ThreadSanitizer enabled in CI allows it to automatically catch many types of race conditions (specifically, data races, where a variable is accessed by two threads at the same time). Since Godot is moving to a fully threaded model, this is basically a requirement for stable behavior.
Some errors have been suppressed for now. See the diff for details.