Fails when running RGB based PPO baseline #519

Closed
rodcar opened this issue Aug 22, 2024 · 3 comments · Fixed by #521

rodcar commented Aug 22, 2024

Hello, I got RuntimeError: values expected sparse tensor layout but got Strided when running the RGB-based PPO baseline on Colab.

It fails with version 3.0.0b8 but works with the previous version, 3.0.0b7.

2024-08-22 07:04:19.211807: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-08-22 07:04:19.229796: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-22 07:04:19.251816: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-22 07:04:19.258633: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-22 07:04:19.274371: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-22 07:04:20.431472: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/tyro/_fields.py:330: UserWarning: The field wandb_entity is annotated with type <class 'str'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/tyro/_fields.py:330: UserWarning: The field checkpoint is annotated with type <class 'str'>, but the default value None has type <class 'NoneType'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(
Downloading PhysX GPU library to /root/.sapien/physx/105.1-physx-5.3.1.patch0 from Github. This can take several minutes. If it fails to download, please manually download fhttps://github.com/sapien-sim/physx-precompiled/releases/download/105.1-physx-5.3.1.patch0/linux-so.zip and unzip at /root/.sapien/physx/105.1-physx-5.3.1.patch0.
Download complete.
Saving eval videos to runs/rgb-pushcube/videos
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.max_episode_steps to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.max_episode_steps` for environment variables or `env.get_wrapper_attr('max_episode_steps')` that will search the reminding wrappers.
  logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.single_observation_space to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.single_observation_space` for environment variables or `env.get_wrapper_attr('single_observation_space')` that will search the reminding wrappers.
  logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.single_action_space to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.single_action_space` for environment variables or `env.get_wrapper_attr('single_action_space')` that will search the reminding wrappers.
  logger.warn(
Running training
####
args.num_iterations=97 args.num_envs=1024 args.num_eval_envs=8
args.minibatch_size=1600 args.batch_size=51200 args.update_epochs=8
####
Epoch: 1, global_step=0
Evaluating
Traceback (most recent call last):
  File "/content/ppo_rgb.py", line 387, in <module>
    eval_obs, eval_rew, eval_terminations, eval_truncations, eval_infos = eval_envs.step(agent.get_action(eval_obs, deterministic=True))
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/vector/wrappers/gymnasium.py", line 89, in step
    obs, rew, terminations, truncations, infos = self._env.step(actions)
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/utils/wrappers/record.py", line 428, in step
    self.render_images.append(self.capture_image())
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/utils/wrappers/record.py", line 321, in capture_image
    img = self.env.render()
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 471, in render
    return self.env.render()
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 471, in render
    return self.env.render()
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/wrappers/order_enforcing.py", line 70, in render
    return self.env.render(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1224, in render
    return self.render_all()
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1196, in render_all
    for img in image.values():
RuntimeError: values expected sparse tensor layout but got Strided
Exception ignored in: <function VectorEnv.__del__ at 0x796977a5c0d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/vector/vector_env.py", line 330, in __del__
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/vector/wrappers/gymnasium.py", line 120, in close
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/utils/wrappers/record.py", line 777, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  [Previous line repeated 1 more time]
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1043, in close
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1040, in _clear
TypeError: 'NoneType' object is not callable
Exception ignored in: <function VectorEnv.__del__ at 0x796977a5c0d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/vector/vector_env.py", line 330, in __del__
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/vector/wrappers/gymnasium.py", line 120, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 475, in close
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1043, in close
  File "/usr/local/lib/python3.10/dist-packages/mani_skill/envs/sapien_env.py", line 1040, in _clear
TypeError: 'NoneType' object is not callable
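
For context: in PyTorch, Tensor.values() is only supported on sparse tensors, and calling it on an ordinary dense tensor (layout torch.strided) raises exactly this RuntimeError. Because the failing frame is `for img in image.values():` in `render_all`, it looks like `image` was a dense tensor at a point where a dict of per-camera images was expected. This is an inference from the traceback, not a description of the actual fix in #521. The hypothetical helper below (`iter_images` is not a ManiSkill function) sketches the defensive pattern of only calling `.values()` on mappings:

```python
from collections.abc import Mapping
from typing import Any, Iterator


def iter_images(image: Any) -> Iterator[Any]:
    """Yield each image whether `image` is a mapping of images or a single image.

    Hypothetical helper for illustration only; not part of ManiSkill.
    """
    if isinstance(image, Mapping):
        # dict-like, e.g. {"camera_name": image_tensor, ...}
        yield from image.values()
    else:
        # a single image/tensor: never call .values() on it, because on a
        # dense torch tensor that raises "values expected sparse tensor layout"
        yield image


print(list(iter_images({"cam_a": 1, "cam_b": 2})))  # [1, 2]
print(list(iter_images("single_image")))  # ['single_image']
```
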

Code:

!mkdir -p /usr/share/vulkan/icd.d
!wget -q https://raw.githubusercontent.com/rodcar/qmul_ai_dissertation/main/config/nvidia_icd.json
!wget -q https://raw.githubusercontent.com/rodcar/qmul_ai_dissertation/main/config/10_nvidia.json
!mv nvidia_icd.json /usr/share/vulkan/icd.d
!mv 10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
!apt-get install -y --no-install-recommends libvulkan-dev

# it fails with this specific version
!pip install mani-skill==3.0.0b8

# but it works with the previous one
#!pip install mani-skill==3.0.0b7

!pip install --upgrade tyro
!wget https://raw.githubusercontent.com/haosulab/ManiSkill/main/examples/baselines/ppo/ppo_rgb.py -O ppo_rgb.py
# parameters
env_id = "PushCube-v1"
num_envs = 1024
update_epochs = 8
num_minibatches = 32
total_timesteps = 5_000_000
eval_freq = 8
num_steps = 20  # note: not passed to the command below, so the script's default is used
seed = 2024
model_name = "ppo_rgb"
exp_name = "rgb-pushcube"
!python {model_name}.py --seed={seed} --env_id={env_id} --exp-name={exp_name} --num_envs={num_envs} --update_epochs={update_epochs} --num_minibatches={num_minibatches} --total_timesteps={total_timesteps} --eval_freq={eval_freq} --no_partial_reset --reconfiguration_freq=1 --reward_scale=1
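
The sizes printed in the log (args.batch_size=51200, args.minibatch_size=1600, args.num_iterations=97) can be reproduced by hand. The sketch below assumes the usual CleanRL-style relations (batch_size = num_envs * num_steps, minibatch_size = batch_size // num_minibatches, num_iterations = total_timesteps // batch_size); these formulas are an assumption rather than code copied from ppo_rgb.py, though they match the logged values. One consequence: a batch of 51200 with 1024 envs implies 50 steps per environment per rollout, so the `num_steps = 20` variable defined above did not take effect, since no num_steps value is passed on the command line.

```python
def derived_sizes(num_envs, num_steps, num_minibatches, total_timesteps):
    """Assumed PPO bookkeeping (CleanRL-style), not taken from ppo_rgb.py."""
    batch_size = num_envs * num_steps               # transitions per rollout
    minibatch_size = batch_size // num_minibatches  # transitions per gradient step
    num_iterations = total_timesteps // batch_size  # rollouts in the whole run
    return batch_size, minibatch_size, num_iterations


# The log reports batch_size=51200 with num_envs=1024.
steps_per_env = 51200 // 1024
print(steps_per_env)  # 50
print(derived_sizes(1024, steps_per_env, 32, 5_000_000))  # (51200, 1600, 97)

# What the notebook's num_steps = 20 would have produced had it been passed:
print(derived_sizes(1024, 20, 32, 5_000_000))  # (20480, 640, 244)
```
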

StoneT2000 (Member) commented:

Found the bug; I'll push a fix.

StoneT2000 reopened this Aug 22, 2024

StoneT2000 (Member) commented:

Try installing the new ManiSkill version, v3.0.0b9; it should work now.
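
To confirm which version is actually installed after upgrading, a small check like the one below can be run in a cell. It is a sketch under stated assumptions: `FIRST_FIXED`, `version_key`, `has_render_fix` and `check_installed` are hypothetical names; the parser only understands X.Y.Z, X.Y.ZaN, X.Y.ZbN and X.Y.ZrcN strings (packaging.version is the more complete choice when available); and the distribution name mani-skill is taken from the pip install lines above.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# From this thread: 3.0.0b8 fails, 3.0.0b9 is reported to contain the fix.
FIRST_FIXED = "3.0.0b9"

# Simplified parser (assumption): full PEP 440 (post/dev/local) is not handled.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")
_PHASE_RANK = {"a": 0, "b": 1, "rc": 2, None: 3}  # a final release sorts last


def version_key(v: str) -> tuple:
    """Turn '3.0.0b9' into a tuple that sorts in release order."""
    m = _VERSION_RE.match(v)
    if m is None:
        raise ValueError(f"unsupported version string: {v!r}")
    major, minor, patch, phase, num = m.groups()
    return (int(major), int(minor), int(patch), _PHASE_RANK[phase], int(num or 0))


def has_render_fix(installed: str) -> bool:
    """True if `installed` is at or after the first version reported as fixed."""
    return version_key(installed) >= version_key(FIRST_FIXED)


def check_installed() -> str:
    """Report the installed mani-skill version and whether it predates the fix."""
    try:
        installed = version("mani-skill")
    except PackageNotFoundError:
        return "mani-skill is not installed"
    try:
        ok = has_render_fix(installed)
    except ValueError:
        return f"mani-skill {installed}: could not parse version"
    return f"mani-skill {installed}: " + ("ok" if ok else f"upgrade to >= {FIRST_FIXED}")


print(check_installed())
```

Comparing parsed tuples rather than raw strings matters here: as plain strings, "3.0.0b10" would sort before "3.0.0b9".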

rodcar commented Aug 22, 2024

It's working now, thank you!

rodcar closed this as completed Aug 22, 2024