No visualization on Ubuntu 20 #1

Closed
setchin opened this issue Aug 21, 2022 · 16 comments

Comments

@setchin

setchin commented Aug 21, 2022

Hello! Thanks a lot for your great work.
I tried installing this project on Ubuntu 20.04 with ROS Noetic and had no problems building the code. However, when I played the rosbag, the map being built was not visualized properly (although the robot's movement could be seen). I also tried an old computer with Ubuntu 18 and ROS Melodic, and it worked fine there. I saw that you mention the tests passed on Ubuntu 16 and 18, so I am wondering why it doesn't work on Ubuntu 20?

@JINXER000
Owner

Sorry for the late reply. I haven't tried Ubuntu 20 yet. We will try soon and see whether a similar problem occurs.
Btw, is any error reported? Did you try 'rostopic echo /glb_edt_map' to see whether there is data?

@setchin
Author

setchin commented Sep 28, 2022

It turns out the problem is with the visualization, not the code itself. In ROS Noetic, rviz does not seem to parse the frame id "/map" correctly; simply changing it to "map" solves the problem.
Do something like this:
change _occ_pnt_cld->header.frame_id = "/map";
to _occ_pnt_cld->header.frame_id = "map";
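
If several publishers set their frame id this way, the same change can be applied in one pass. The following is a minimal sketch, assuming the assignments look like the line above (frame_id = "/..."); strip_frame_slash is a hypothetical helper name, and src/volumetric_mapper.cpp is the file named elsewhere in this thread. The likely reason the change is needed is that rviz on Noetic resolves frames through tf2, which does not accept frame names with a leading slash.

```shell
# Hypothetical helper: remove the leading slash from every string literal
# assigned to a frame_id in the given C++ source file.
# A backup copy is kept next to the file with a .bak suffix.
strip_frame_slash() {
    sed -i.bak 's|\(frame_id *= *"\)/|\1|g' "$1"
}

# Usage (run from the package root, then rebuild with catkin_make):
#   strip_frame_slash src/volumetric_mapper.cpp
#   grep -n 'frame_id' src/volumetric_mapper.cpp   # review the result
```

Review the grep output before rebuilding, since the pattern touches every frame_id literal in the file, not only "/map".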

@JINXER000
Owner

Thanks a lot for your detailed explanation! I will add it to Readme in the next update.

@Hao-Starrr

I encountered the same problem, also on Ubuntu 20.04 with Noetic.
As @setchin mentioned, I changed _occ_pnt_cld->header.frame_id = "/map"; in the src/volumetric_mapper.cpp file. (I also changed similar lines, for example _edt_pnt_cld->header.frame_id.) Then I ran catkin_make. The problem still seems to exist.
Then I tried the UGV-corridor and UAV-3DLiDAR datasets. The first has no visualization. The second shows only the lidar wave but no mapping.
When the bag is playing, I tried rostopic echo /glb_edt_map but got

WARNING: no messages received and simulated time is active.
Is /clock being published?

So I am wondering whether I changed the right file.

@setchin
Author

setchin commented Oct 1, 2022

I suppose I did exactly what you did: I changed every "/map" to "map" and made no other changes. I only tested uav-depth and uav-2dlidar, and both work fine.

@setchin
Author

setchin commented Oct 1, 2022

Did you add the --clock parameter when you played the rosbag?
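
The "no messages received and simulated time is active" warning appears when the use_sim_time parameter is true but nothing arrives on /clock, or when the node that should publish the topic is not running. A rough sketch of how these could be checked while the bag is playing, assuming the standard rosparam, rostopic and rosnode tools are on the PATH; check_sim_clock is a hypothetical name and the 5-second timeout is arbitrary:

```shell
# Hypothetical helper: run a few read-only checks while the bag is playing.
check_sim_clock() {
    if ! command -v rostopic >/dev/null 2>&1; then
        echo "ROS command-line tools not found; source your ROS setup file first" >&2
        return 1
    fi
    # Is simulated time enabled? ("true" means nodes wait for /clock)
    rosparam get /use_sim_time
    # Does at least one /clock message arrive within 5 seconds?
    timeout 5 rostopic echo -n 1 /clock || echo "no /clock message received" >&2
    # Which nodes are alive, and does the map topic have a publisher?
    rosnode list
    rostopic info /glb_edt_map
}
```

If /clock is arriving but /glb_edt_map has no publisher, the mapping node itself has probably exited, and its log is the next place to look.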

@Hao-Starrr

Yes, I ran exactly rosbag play ugv-cut-filter.bag --clock __name:=profile_bag to play the bag.

Also, I have some new findings. When I launch the nodes by running roslaunch GIE ugv_dataset.launch, I always receive an error like this:

the provided PTX was compiled with an unsupported toolchain

That is caused by a mismatch between the GPU driver and CUDA toolkit versions, so I updated the driver. I did everything again, including compiling cuTT, and this error disappeared. But there is still no visualization.

I noticed there is still another message when I launch the nodes, which may be causing the problem.

Local Map initialized
*** stack smashing detected ***: terminated
[GIE_mapping-2] process has died [pid 5723, exit code -6, cmd /home/haostarrr/GIE_ws/devel/lib/GIE/GIE_node __name:=GIE_mapping __log:=/home/haostarrr/.ros/log/29ea4ed6-41b4-11ed-9213-c3ff5c59be87/GIE_mapping-2.log].
log file: /home/haostarrr/.ros/log/29ea4ed6-41b4-11ed-9213-c3ff5c59be87/GIE_mapping-2*.log

I am not sure how to take a deeper look at this error. Do you have any suggestions?

@Hao-Starrr

Also, I tried uav-depth and uav-2dlidar. Same problem.
Only axes are moving on the screen, with no mapping.
If --clock is removed from the command, even the axes disappear.

@setchin
Author

setchin commented Oct 2, 2022

In this case, I believe you are having a problem with the code, not the visualization. Very likely it is a GPU problem. You may try removing CUDA, cleaning up the CUDA-related environment settings, reinstalling a stable version (11.3 in my case), and compiling again. If you are using an RTX 30XX, always use CUDA >= 11.

@JINXER000
Owner

Hi Hao, it seems to be a problem with the software. May I ask if this error (stack smashed) occurs before or after you play the rosbag?

@JINXER000 JINXER000 reopened this Oct 3, 2022
@Hao-Starrr

Before playing the bag, after launching the nodes.

@JINXER000
Owner

JINXER000 commented Oct 5, 2022

I guess it is because of the development environment... Can you locate which statement causes the trouble?

@Hao-Starrr

Well... Do you have any idea on how to locate that?

I am also trying to set the CUDA environment again, but it may take some time.

@Hao-Starrr

Problem solved yesterday. Here is a record of the process and what happened. It may be helpful to anyone who suffers from the same problem. For detailed instructions on each step, refer to blogs found via Google.

First, when Ubuntu 20.04 is installed, a GPU driver is already installed on the computer, but not the newest version. So it seems we only need to install the CUDA toolkit. In practice this usually causes errors.

There are two ways to install it. One is installing the CUDA toolkit from the Ubuntu repository by simply running the command sudo apt install nvidia-cuda-toolkit. But that does not install the newest version: as of October 2022, it only installs version 10.1. The other way is to visit the NVIDIA website and download a .run file from https://developer.nvidia.com/cuda-11-7-0-download-archive. It provides a much newer version (11.7), so the second way is recommended.

Two different things may cause the final errors, so do neither of them.

1 Installing from both the Ubuntu repository and the official website. Please use only the second one. Remove nvidia-cuda-toolkit if it has been installed from the command line.

2 Installing the driver and toolkit separately. This will probably make the driver and toolkit incompatible. Please install them together by simply following the instructions of the .run file downloaded from the website. If a driver has been installed before (as on most Ubuntu 20.04 systems), there will be an error, so the existing driver needs to be removed first. Unfortunately, something may still be using the GPU, for example the open-source Nouveau driver, which makes removing the current driver fail. Terminate or disable it first.

The reason the driver and toolkit should be installed together is that their versions must be compatible.

The driver must be new enough for the CUDA toolkit version.

The CUDA toolkit must be new enough to support the compute capability of the actual GPU.

For example, driver version 515.43 supports toolkit version 11.7. With toolkit version 10.1 it is also OK. A driver at version 435.43 is too old for toolkit 11.7.
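
The driver requirement can be checked mechanically. A minimal sketch, assuming versions are plain dotted numbers; version_ge is a hypothetical helper, and the minimum of 515.43.04 for toolkit 11.7 is an assumption taken from the working setup reported in this thread, so confirm it against NVIDIA's CUDA release notes:

```shell
# Hypothetical helper: succeeds when dotted version $1 is >= dotted version $2.
# Fields are compared numerically from left to right; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Usage (single-GPU machine):
#   driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
#   version_ge "$driver" 515.43.04 && echo "driver is new enough for toolkit 11.7"
```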

Note that nvidia-smi gives the current driver version and the highest CUDA version that the driver supports (a very misleading message!), not the installed toolkit version. nvcc -V gives the version of the currently installed toolkit.
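
To make the difference concrete, the two numbers can be pulled out of the command output and shown side by side. A small sketch, assuming the usual output formats (a "release X.Y," field in nvcc -V and a "CUDA Version: X.Y" field in the nvidia-smi header); the function names are hypothetical:

```shell
# Installed toolkit version, read from `nvcc -V` output on stdin.
toolkit_version() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Highest CUDA version the driver supports, read from `nvidia-smi` output on stdin.
driver_max_cuda() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p'
}

# Usage:
#   echo "toolkit installed:     $(nvcc -V | toolkit_version)"
#   echo "driver supports up to: $(nvidia-smi | driver_max_cuda)"
```

The two values may legitimately differ; what matters is that the installed toolkit version is not higher than the one the driver supports.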

Currently, my setup is driver 515.43.04, toolkit 11.7, GPU 1660Ti, on Ubuntu 20.04 with ROS Noetic. It works.

@Hao-Starrr

Reporting the solved bug again:

When I ran GIE several days later, the visualization disappeared again.

This time there was an error message.

CUDA error at: /home/haostarrr/GIE_ws/src/GIE-mapping/include/map_structure/local_batch.h : 64
no CUDA-capable device is detected cudaMalloc(&_ray_count,_map_volume*sizeof(int))
[GIE_mapping-2] process has died [pid 8702, exit code 1, cmd /home/haostarrr/GIE_ws/devel/lib/GIE/GIE_node __name:=GIE_mapping __log:=/home/haostarrr/.ros/log/f2703b9c-5413-11ed-aa7e-73707b48acb5/GIE_mapping-2.log].
log file: /home/haostarrr/.ros/log/f2703b9c-5413-11ed-aa7e-73707b48acb5/GIE_mapping-2*.log
[ WARN] [1666668508.639461318, 1629189857.970285238]: Detected jump back in time of 257.106s. Clearing TF buffer.

So I ran nvidia-smi to check the driver.

Luckily I found:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. 
Make sure that the latest NVIDIA driver is installed and running.

Checking CUDA with nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

So the problem is with the driver.

Check the driver version:

ls /usr/src | grep nvidia

I got

nvidia-515.43.04

Then I ran

sudo apt-get install dkms
sudo dkms install -m nvidia -v 515.43.04

I ran nvidia-smi to check the driver again:

Tue Oct 25 11:49:44 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   55C    P0    30W /  N/A |      0MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Great!!

The problem was probably caused by a Linux kernel update: after the kernel is updated, the driver's kernel module is no longer built for the new kernel.

Rebuilding it with dkms, as I did, fixes it.
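
A quick way to see whether this has happened again after a kernel update is to compare the running kernel with the kernels the module was built for. A sketch, assuming dkms status prints one line per build such as "nvidia, 515.43.04, <kernel release>, x86_64: installed" (newer dkms versions print "nvidia/515.43.04, ..."); nvidia_built_for is a hypothetical helper:

```shell
# Succeeds if the `dkms status` text on stdin lists an installed nvidia
# module for the kernel release given as $1.
nvidia_built_for() {
    grep '^nvidia' | grep -F -- ", $1," | grep -q 'installed'
}

# Usage:
#   if dkms status | nvidia_built_for "$(uname -r)"; then
#       echo "nvidia module is built for the running kernel"
#   else
#       echo "rebuild needed, e.g. sudo dkms install -m nvidia -v 515.43.04"
#   fi
```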

@JINXER000
Owner

The issue is solved. Thank you.
