ros2doctor test_api.py hanging during nightly linux repeated jobs. #688

Closed
nuclearsandwich opened this issue Jan 31, 2022 · 12 comments

nuclearsandwich (Member) commented Jan 31, 2022:

Bug report

Required Info:

The following jobs hung indefinitely (upwards of 24 hours in some cases) with the last output being ros2doctor's test_api.py tests.

Starting >>> ros2doctor
07:28:05 ============================= test session starts ==============================
07:28:05 platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
07:28:05 cachedir: /home/jenkins-agent/workspace/nightly_linux_repeated/ws/build/ros2doctor/.pytest_cache
07:28:05 rootdir: /home/jenkins-agent/workspace/nightly_linux_repeated/ws/src/ros2/ros2cli, configfile: pytest.ini
07:28:05 plugins: launch-testing-ros-0.17.0, launch-testing-0.21.0, ament-xmllint-0.11.4, ament-copyright-0.11.4, ament-pep257-0.11.4, ament-flake8-0.11.4, ament-lint-0.11.4, mock-3.7.0, rerunfailures-10.2, cov-3.0.0, repeat-0.9.1, colcon-core-0.7.1
07:28:06 collecting ... 
07:28:06 collected 15 items / 12 deselected / 3 selected                                
07:28:06 
07:28:08 test/test_api.py . 
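
For anyone trying to reproduce this outside the nightly jobs, here is a minimal sketch (the package selection and the 15-minute ceiling are assumptions, not taken from the job configuration) that reruns the ros2doctor tests under a hard timeout so a hang aborts the run instead of blocking an agent for hours:

```python
# Sketch: rerun the ros2doctor tests under a hard timeout so a hang aborts
# the run instead of blocking a CI agent indefinitely. Assumes a built and
# sourced ROS 2 workspace with `colcon` on the PATH; the timeout value is
# arbitrary.
import subprocess
import sys

CMD = [
    "colcon", "test",
    "--packages-select", "ros2doctor",
    "--event-handlers", "console_direct+",
]

try:
    result = subprocess.run(CMD, timeout=15 * 60)
    print(f"colcon test finished with return code {result.returncode}")
except subprocess.TimeoutExpired:
    print("ros2doctor tests exceeded the timeout -- likely the reported hang",
          file=sys.stderr)
    sys.exit(1)
```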
Blast545 (Contributor) commented Feb 1, 2022:

It happened today as well: https://ci.ros2.org/job/nightly_linux-rhel_repeated/1024/

clalancette (Contributor) commented:

A wild guess is that the change from CycloneDDS -> Fast-DDS as the default RMW is causing this. If it is reproducible, I'd suggest running a CI build against a branch where ros2/rmw#315 is reverted to see if it helps.
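
For reference, one way to probe that hypothesis without touching ros2/rmw#315 is to force each RMW implementation explicitly via the standard RMW_IMPLEMENTATION environment variable and compare runs. A sketch, assuming a sourced workspace; the package choice and timeout are illustrative only:

```python
# Sketch: run the same test package once per RMW implementation and compare.
# RMW_IMPLEMENTATION is the standard ROS 2 middleware override; the package
# selection and timeout below are assumptions for illustration.
import os
import subprocess

for rmw in ("rmw_fastrtps_cpp", "rmw_cyclonedds_cpp"):
    env = dict(os.environ, RMW_IMPLEMENTATION=rmw)
    print(f"--- ros2doctor tests with {rmw} ---")
    try:
        result = subprocess.run(
            ["colcon", "test", "--packages-select", "ros2doctor"],
            env=env,
            timeout=15 * 60,
        )
        print(f"{rmw}: return code {result.returncode}")
    except subprocess.TimeoutExpired:
        print(f"{rmw}: timed out -- possible hang")
```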

Blast545 (Contributor) commented Feb 1, 2022:

But that PR is from a few days ago, so I don't think it's related, although I don't see any differences between the repos used in the first failing build and the previous one.

I'll run a check to see whether I can reproduce it using only ros2doctor on RHEL, which seems to be the currently reliable case.
Build Status
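
Locally, a similar check can be approximated by hammering just the API tests with the pytest-repeat plugin that already shows up in the job's plugin list (repeat-0.9.1). A sketch; the test path and repeat count are assumptions about the local checkout:

```python
# Sketch: repeat the ros2doctor API tests many times to try to surface the
# hang locally. Uses pytest-repeat's --count option (the plugin appears as
# repeat-0.9.1 in the job log above); path and count are assumptions.
import subprocess

subprocess.run(
    [
        "python3", "-m", "pytest",
        "src/ros2/ros2cli/ros2doctor/test/test_api.py",
        "--count", "50",
        "-x",   # stop at the first failure
        "-v",
    ],
    check=False,
)
```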

Blast545 (Contributor) commented Feb 1, 2022:

Current hypothesis is that ros2action actually fails first, leaves the system in an "unrecoverable state", and then ros2doctor hangs. Running CI again, this time with ros2action as well.

Build Status

Adding launch_testing_ros: Build Status
(No new clues from these green results.)
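
If the "unrecoverable state" idea is pursued further, one cheap probe (a sketch, not something the nightly jobs currently do) is to check and reset the ROS 2 CLI daemon between test packages, since a wedged daemon is one plausible piece of shared state between ros2action and ros2doctor:

```python
# Sketch: reset the ROS 2 CLI daemon between test packages as a probe for
# the shared-state hypothesis above. `ros2 daemon status/stop/start` are
# standard ros2cli verbs; the package list and timeouts are assumptions.
import subprocess

def reset_ros2_daemon() -> None:
    for verb in ("status", "stop", "start"):
        subprocess.run(["ros2", "daemon", verb], timeout=30)

for package in ("ros2action", "ros2doctor"):
    subprocess.run(["colcon", "test", "--packages-select", package],
                   timeout=30 * 60)
    reset_ros2_daemon()
```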

Blast545 (Contributor) commented Feb 2, 2022:

Another new instance I had to kill: https://ci.ros2.org/job/nightly_linux-aarch64_repeated/1858/

xander-m2k commented Feb 3, 2022:

This happened to me today as well while trying to build ROS Foxy binaries for Debian Buster; it hung for over an hour on test_cli.py of ros2topic.
I just had to wait a little longer, since I'm dealing with a somewhat slow build server.

Blast545 (Contributor) commented:

nuclearsandwich (Member, Author) commented:

Blast545 (Contributor) commented:

Another instance I have just aborted: https://ci.ros2.org/job/nightly_linux-aarch64_repeated/1875/

Blast545 (Contributor) commented:

Blast545 (Contributor) commented Mar 4, 2022:

I thought about closing this one, and then I saw this in ci_windows: https://ci.ros2.org/job/ci_windows/16645/
I'm not sure it's exactly the same error, since it fails earlier, but it's probably related.

EDIT: I don't think it's directly related, as the attached job is testing a custom CI branch using Foxy.

clalancette (Contributor) commented:

I don't think we've seen this one in a long time, so I'm going to go ahead and close this out.
