Fails when running RGB based PPO baseline #519

rodcar · 2024-08-22T07:28:33Z

Hello, I get `RuntimeError: values expected sparse tensor layout but got Strided` when running the RGB-based PPO baseline (ppo_rgb.py) on Colab.

It fails with version 3.0.0b8 but works with the previous release, 3.0.0b7.

2024-08-22 07:04:19.211807: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-08-22 07:04:19.229796: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-22 07:04:19.251816: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-22 07:04:19.258633: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-22 07:04:19.274371: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-22 07:04:20.431472: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/tyro/_fields.py:330: UserWarning: The field wandb_entity is annotated with type <class 'str'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/tyro/_fields.py:330: UserWarning: The field checkpoint is annotated with type <class 'str'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
Downloading PhysX GPU library to /root/.sapien/physx/105.1-physx-5.3.1.patch0 from Github. This can take several minutes. If it fails to download, please manually download fhttps://github.com/sapien-sim/physx-precompiled/releases/download/105.1-physx-5.3.1.patch0/linux-so.zip and unzip at /root/.sapien/physx/105.1-physx-5.3.1.patch0.
Download complete.
Saving eval videos to runs/rgb-pushcube/videos
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.max_episode_steps to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.max_episode_steps` for environment variables or `env.get_wrapper_attr('max_episode_steps')` that will search the reminding wrappers.
  logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.single_observation_space to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.single_observation_space` for environment variables or `env.get_wrapper_attr('single_observation_space')` that will search the reminding wrappers.
  logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.single_action_space to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.single_action_space` for environment variables or `env.get_wrapper_attr('single_action_space')` that will search the reminding wrappers.
  logger.warn(
Running training
####
args.num_iterations=97 args.num_envs=1024 args.num_eval_envs=8
args.minibatch_size=1600 args.batch_size=51200 args.update_epochs=8
####
Epoch: 1, global_step=0
Evaluating
Traceback (most recent call last):
  File "/content/ppo_rgb.py", line 387, in <module>
    eval_obs, eval_rew, eval_terminations, eval_truncations, eval_infos = eval_envs.step(agent.get_action(eval_obs, deterministic=True))
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/vector/wrappers/gymnasium.py", line 89, in step
    obs, rew, terminations, truncations, infos = self._env.step(actions)
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/utils/wrappers/record.py", line 428, in step
    self.render_images.append(self.capture_image())
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/utils/wrappers/record.py", line 321, in capture_image
    img = self.env.render()
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 471, in render
    return self.env.render()
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 471, in render
    return self.env.render()
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/wrappers/order_enforcing.py", line 70, in render
    return self.env.render(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1224, in render
    return self.render_all()
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1196, in render_all
    for img in image.values():
RuntimeError: values expected sparse tensor layout but got Strided
Exception ignored in: <function VectorEnv.__del__ at 0x796977a5c0d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/vector/vector_env.py", line 330, in __del__
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/vector/wrappers/gymnasium.py", line 120, in close
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/utils/wrappers/record.py", line 777, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  [Previous line repeated 1 more time]
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1043, in close
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1040, in _clear
TypeError: 'NoneType' object is not callable
Exception ignored in: <function VectorEnv.__del__ at 0x796977a5c0d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/vector/vector_env.py", line 330, in __del__
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/vector/wrappers/gymnasium.py", line 120, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1043, in close
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1040, in _clear
TypeError: 'NoneType' object is not callable
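The last frame of the main traceback is `for img in image.values():` inside `render_all`. PyTorch raises this RuntimeError when `Tensor.values()`, an accessor meant for sparse tensors, is called on an ordinary dense (strided) tensor. So the traceback suggests that `image` was a tensor rather than a dict in 3.0.0b8. The sketch below only reproduces that PyTorch behavior and shows one defensive pattern; `iter_camera_images` is a hypothetical helper, not ManiSkill's API, and not necessarily what #521 changed.

```python
import torch


def iter_camera_images(image):
    """Hypothetical helper: return a list of images whether `image` is a dict or a single tensor."""
    if isinstance(image, dict):
        # dict of {camera_name: tensor}: iterate over the per-camera images
        return list(image.values())
    # a single tensor: wrap it instead of calling Tensor.values()
    return [image]


dense = torch.zeros(2, 3)  # default strided (dense) layout
try:
    dense.values()  # Tensor.values() is only valid for sparse tensors
except RuntimeError as err:
    print(f"RuntimeError: {err}")

cams = {"cam0": torch.zeros(4, 4, 3), "cam1": torch.zeros(4, 4, 3)}
print(len(iter_camera_images(cams)))  # 2
print(len(iter_camera_images(torch.zeros(4, 4, 3))))  # 1
```

The `isinstance` check keeps dict-valued renders working unchanged while treating a bare tensor as a single image.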

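The tyro UserWarnings near the top of the log are unrelated to the crash. They appear because fields such as `wandb_entity` and `checkpoint` are annotated as `str` but default to `None`. A sketch of an annotation that matches the default, assuming the script's arguments are a dataclass parsed with tyro (only the two field names come from the log; the class name `Args` is an assumption):

```python
from dataclasses import dataclass
from typing import Optional, get_type_hints


@dataclass
class Args:
    # Optional[str] agrees with the None default, which is what tyro's warning points at
    wandb_entity: Optional[str] = None
    checkpoint: Optional[str] = None


hints = get_type_hints(Args)
print(hints["wandb_entity"], Args())
```

Parsing would then be `args = tyro.cli(Args)` as before, without the type/default mismatch warning.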
Code:

```
# Vulkan ICD and EGL vendor configs so the NVIDIA GPU can be used for rendering on Colab
!mkdir -p /usr/share/vulkan/icd.d
!wget -q https://raw.githubusercontent.com/rodcar/qmul_ai_dissertation/main/config/nvidia_icd.json
!wget -q https://raw.githubusercontent.com/rodcar/qmul_ai_dissertation/main/config/10_nvidia.json
!mv nvidia_icd.json /usr/share/vulkan/icd.d
!mv 10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
!apt-get install -y --no-install-recommends libvulkan-dev

# it fails with this specific version
!pip install mani-skill==3.0.0b8

# but it works with the previous one
#!pip install mani-skill==3.0.0b7

!pip install --upgrade tyro

!wget https://raw.githubusercontent.com/haosulab/ManiSkill/main/examples/baselines/ppo/ppo_rgb.py -O ppo_rgb.py

# parameters
env_id = "PushCube-v1"
num_envs = 1024
update_epochs = 8
num_minibatches = 32
total_timesteps = 5_000_000
eval_freq = 8
num_steps = 20
seed = 2024
model_name = "ppo_rgb"
exp_name = "rgb-pushcube"

!python {model_name}.py --seed={seed} --env_id={env_id} --exp-name={exp_name} --num_envs={num_envs} --update_epochs={update_epochs} --num_minibatches={num_minibatches} --total_timesteps={total_timesteps} --eval_freq={eval_freq} --no_partial_reset --reconfiguration_freq=1 --reward_scale=1
```
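The sizes printed in the log (`args.batch_size=51200`, `args.minibatch_size=1600`, `args.num_iterations=97`) can be checked against the parameters above. A sketch assuming CleanRL-style formulas (batch = envs * steps per rollout, minibatch = batch // num_minibatches, iterations = total_timesteps // batch): 51200 / 1024 = 50 steps per rollout, so the `num_steps = 20` variable appears to go unused, because the command line never passes it and the script presumably falls back to its own default.

```python
def ppo_sizes(num_envs, num_steps, num_minibatches, total_timesteps):
    """Derived PPO sizes, assuming CleanRL-style formulas."""
    batch_size = num_envs * num_steps  # transitions collected per iteration
    minibatch_size = batch_size // num_minibatches
    num_iterations = total_timesteps // batch_size
    return batch_size, minibatch_size, num_iterations


# 50 steps per rollout is inferred from the log: 51200 / 1024
print(ppo_sizes(1024, 50, 32, 5_000_000))  # (51200, 1600, 97)
# what the notebook's num_steps = 20 would give if it were actually passed
print(ppo_sizes(1024, 20, 32, 5_000_000))  # (20480, 640, 244)
```

If 20 steps per rollout was intended, it would need to be added to the command, presumably as `--num_steps={num_steps}` alongside the other flags.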


StoneT2000 · 2024-08-22T09:41:15Z

Found the bug; I'll push a fix.

StoneT2000 · 2024-08-22T09:49:37Z

Try installing the new ManiSkill version, v3.0.0b9; it should work now.
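To make a notebook fail fast if the broken build is installed again, a minimal check, assuming the distribution name `mani-skill` used in the pip commands above; `check_maniskill`, `is_affected` and `AFFECTED` are hypothetical names:

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported as broken in this issue; 3.0.0b9 was reported to fix it.
AFFECTED = {"3.0.0b8"}


def is_affected(installed: str) -> bool:
    """True if the installed version string is one reported as broken."""
    return installed in AFFECTED


def check_maniskill(dist: str = "mani-skill"):
    """Return the installed version, None if not installed, or raise for an affected build."""
    try:
        installed = version(dist)
    except PackageNotFoundError:
        return None
    if is_affected(installed):
        raise RuntimeError(f"{dist}=={installed} hits the render_all bug; try mani-skill==3.0.0b9")
    return installed
```

Calling `check_maniskill()` right after the pip cell surfaces the problem before any environments are created.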

rodcar · 2024-08-22T10:03:01Z

It's working now, thank you!

StoneT2000 mentioned this issue in [BugFix] Fix render all mode #521 (merged), Aug 22, 2024

StoneT2000 closed this as completed in #521 Aug 22, 2024

StoneT2000 reopened this Aug 22, 2024

rodcar closed this as completed Aug 22, 2024
