Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] Fix failing ros2param tests #632

Closed
wants to merge 3 commits into from
Closed

Conversation

hidmic
Copy link
Contributor

@hidmic hidmic commented Apr 23, 2021

CI up to ros2param and ros2cli:

  • Linux Build Status
  • Linux repeated Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@hidmic
Copy link
Contributor Author

hidmic commented Apr 23, 2021

Well, it's finally passing. I have yet to explain why this FIN_WAIT_2 state timeout made all the difference though. FYI @ivanpauno.

@hidmic
Copy link
Contributor Author

hidmic commented Apr 23, 2021

Alright, Ivan's still able to reproduce the failure. It appears it just made it less likely, so this won't do either.

@clalancette @cottsay what shall we do for Galactic? We have:

  • the original approach with SO_REUSEADDR, which caused flakes in ros2cli; or
  • Ensure only one daemon can run at a time #622 approach with SO_LINGER, which causes flakes in ros2param; or
  • an hypothetical third approach using file locks, which we haven't tried yet.

Do we keep trying? Do we stick to a known issue and defer resolution?

@cottsay
Copy link
Member

cottsay commented Apr 23, 2021

So the behavior that ros2cli will exhibit in Galactic is a flaky test, correct? How likely is it that an end-user would experience the bug "in the wild"?

@hidmic
Copy link
Contributor Author

hidmic commented Apr 23, 2021

So the behavior that ros2cli will exhibit in Galactic is a flaky test, correct? How likely is it that an end-user would experience the bug "in the wild"?

That depends on the implementation. With SO_REUSEADDR, you'd need two ros2 daemon start racing with each other. With SO_LINGER, I suspect you need to run into a simultaneous connection close situation (though that's mostly a guess, we don't really know why ros2param flakes happen).

@clalancette
Copy link
Contributor

Do we keep trying? Do we stick to a known issue and defer resolution?

I would say that we should still try to debug this for Galactic, but as a relatively low priority. So far it seems unlikely that a user will hit this in real-life, so I think it is worthwhile to debug the problem and figure out what is really going on before we make further changes.

@hidmic
Copy link
Contributor Author

hidmic commented Jul 12, 2021

Closing in favor of #652.

@hidmic hidmic closed this Jul 12, 2021
@hidmic hidmic deleted the hidmic/ros2param-fix-poc branch July 12, 2021 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants