[Feature] New shader config system and refactors #499

Merged (22 commits) on Aug 19, 2024
8 changes: 4 additions & 4 deletions docs/source/contributing/tasks.md
@@ -68,7 +68,7 @@ class PushCube(BaseEnv):
@property
def _default_sim_config(self):
return SimConfig(
gpu_memory_cfg=GPUMemoryConfig(
gpu_memory_config=GPUMemoryConfig(
found_lost_pairs_capacity=2**25, max_rigid_patch_count=2**18
)
)
@@ -83,15 +83,15 @@ class RotateSingleObjectInHand(BaseEnv):
@property
def _default_sim_config(self):
return SimConfig(
gpu_memory_cfg=GPUMemoryConfig(
gpu_memory_config=GPUMemoryConfig(
max_rigid_contact_count=self.num_envs * max(1024, self.num_envs) * 8,
max_rigid_patch_count=self.num_envs * max(1024, self.num_envs) * 2,
found_lost_pairs_capacity=2**26,
)
)
```

For GPU simulation tuning, there are generally two considerations, memory and speed. It is recommended to set `gpu_memory_cfg` in such a way so that no errors are outputted when simulating as many as `4096` parallel environments with state observations on a single GPU.
For GPU simulation tuning, there are generally two considerations: memory and speed. It is recommended to set `gpu_memory_config` such that no errors are output when simulating as many as `4096` parallel environments with state observations on a single GPU.

A simple way to test is to run the GPU sim benchmarking script on your already registered environment and check if any errors are reported
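
If you prefer not to use the benchmarking script, a minimal sketch of an equivalent check is below; `MyTask-v1` is a placeholder for your registered environment id, and a CUDA GPU is assumed (GPU simulation is required for `num_envs > 1`):

```python
# Sketch: stress-test GPU memory settings by simulating 4096 state-based environments.
# "MyTask-v1" is a placeholder; substitute your registered environment id.
import gymnasium as gym

import mani_skill.envs  # registers ManiSkill environments

env = gym.make("MyTask-v1", num_envs=4096, obs_mode="state")
env.reset(seed=0)
for _ in range(100):
    # PhysX prints capacity errors here if the GPU memory config values are too small
    env.step(env.action_space.sample())
env.close()
```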

@@ -126,5 +126,5 @@ Examples of task cards are found throughout the [task documentation](../tasks/in
When contributing the task, make sure you do the following:

- The task code itself should have a reasonable unique name and be placed in `mani_skill/envs/tasks`.
- Added a demo video of the task being solved successfully (for each variation if there are several) to `figures/environment_demos`. The video should have ray-tracing on so it looks nicer! This can be done by replaying a trajectory with `shader_dir="rt"` passed into `gym.make` when making the environment.
- Added a demo video of the task being solved successfully (for each variation if there are several) to `figures/environment_demos`. The video should have ray-tracing on so it looks nicer! This can be done by replaying a trajectory with `human_render_camera_configs=dict(shader_pack="rt")` passed into `gym.make` when making the environment.
- Added a task card to `docs/source/tasks/index.md`.
2 changes: 1 addition & 1 deletion docs/source/user_guide/concepts/gpu_simulation.md
@@ -6,7 +6,7 @@ ManiSkill leverages [PhysX](https://github.com/NVIDIA-Omniverse/PhysX) to perfor

With GPU parallelization, the concept is that one can simulate a task thousands of times at once per GPU. In ManiSkill/SAPIEN this is realized by effectively putting all actors and articulations <span style="color:#F1A430">**into the same physx scene**</span> and giving each task its own small workspace in the physx scene known as a <span style="color:#0086E7">**sub-scene**</span>.

The idea of sub-scenes is that reading data of e.g. actor poses is automatically pre-processed to be relative to the center of the sub-scene and not the physx scene. The diagram below shows how 64 sub-scenes may be organized. Note that each sub-scene's distance to each other is defined by the simulation configuration `sim_cfg.spacing` value which can be set when building your own task.
The idea of sub-scenes is that data read for e.g. actor poses is automatically pre-processed to be relative to the center of the sub-scene rather than the physx scene. The diagram below shows how 64 sub-scenes may be organized. Note that the distance between sub-scenes is defined by the simulation configuration `sim_config.spacing` value, which can be set when building your own task.

:::{figure} images/physx_scene_subscene_relationship.png
:::
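
As a sketch of how the spacing would be set in a custom task (the `SimConfig` import path below is an assumption that may differ across versions, and the task class is purely illustrative):

```python
# Sketch: widen sub-scene spacing for a task whose objects may travel far from the sub-scene origin.
from mani_skill.envs.sapien_env import BaseEnv
from mani_skill.utils.structs.types import SimConfig  # import path assumed; verify for your version


class MyLargeWorkspaceTask(BaseEnv):  # hypothetical task for illustration
    @property
    def _default_sim_config(self):
        # 20 m between sub-scene origins so neighboring parallel environments cannot interact
        return SimConfig(spacing=20)
```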
17 changes: 9 additions & 8 deletions docs/source/user_guide/concepts/observation.md
@@ -1,22 +1,20 @@
# Observation

<!-- See our [colab tutorial](https://colab.research.google.com/github/haosulab/ManiSkill/blob/main/examples/tutorials/customize_environments.ipynb#scrollTo=NaSQ7CD2sswC) for how to customize cameras. -->

## Observation mode

**The observation mode defines the observation space.**
All ManiSkill tasks take the observation mode (`obs_mode`) as one of the input arguments of `__init__`.
In general, the observation is organized as a dictionary (with an observation space of `gym.spaces.Dict`).

There are two raw observations modes: `state_dict` (privileged states) and `sensor_data` (raw sensor data like visual data without postprocessing). `state` is a flat version of `state_dict`. `rgbd` and `pointcloud` apply post-processing on `sensor_data` to give convenient representations of visual data.
There are two raw observation modes: `state_dict` (privileged states) and `sensor_data` (raw sensor data like visual data without postprocessing). `state` is a flat version of `state_dict`. `rgb+depth`, `rgb+depth+segmentation` (or any combination of `rgb`, `depth`, `segmentation`), and `pointcloud` apply post-processing on `sensor_data` to give convenient representations of visual data.

The details here show the unbatched shapes. In general there is always a batch dimension unless you are using CPU simulation. Moreover, we annotate the dtypes of some values, where some have both a torch and numpy dtype depending on whether you are using GPU or CPU simulation respectively.
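
For reference, a minimal sketch of how an observation mode is selected at creation time (`PickCube-v1` is used purely as an example task):

```python
# Sketch: the obs_mode argument picks one of the observation modes described below.
import gymnasium as gym

import mani_skill.envs  # registers ManiSkill environments

env = gym.make("PickCube-v1", obs_mode="rgb+depth+segmentation")
obs, _ = env.reset(seed=0)
# obs["sensor_data"]["<camera_uid>"] should now contain "rgb", "depth" and "segmentation" images
```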

### state_dict

The observation is a dictionary of states. It usually contains privileged information such as object poses. It is not supported for soft-body tasks.

- `agent`: robot proprioception
- `agent`: robot proprioception (return value of a task's `_get_obs_agent` function)
- `qpos`: [nq], current joint positions. *nq* is the degree of freedom.
- `qvel`: [nq], current joint velocities
<!-- - `base_pose`: [7], robot position (xyz) and quaternion (wxyz) in the world frame -->
@@ -29,7 +27,7 @@ It is a flat version of *state_dict*. The observation space is `gym.spaces.Box`.

### sensor_data

In addition to `agent` and `extra`, `sensor_data` and `sensor_param` are introduced.
In addition to `agent` and `extra`, `sensor_data` and `sensor_param` are introduced. At the moment there are only Camera type sensors. Cameras are special in that they can be run with different choices of shaders. The default shader is called `minimal`, which is the fastest and most memory-efficient option. The shader chosen determines what data is stored in this observation mode. We describe the raw data format for the `minimal` shader here. Detailed information on how sensors/cameras can be customized can be found in the [sensors](../tutorials/sensors/index.md) section.

- `sensor_data`: data captured by sensors configured in the environment
- `{sensor_uid}`:
@@ -46,7 +44,7 @@ In addition to `agent` and `extra`, `sensor_data` and `sensor_param` are introdu
- `extrinsic_cv`: [4, 4], camera extrinsic (OpenCV convention)
- `intrinsic_cv`: [3, 3], camera intrinsic (OpenCV convention)

### rgbd
### rgb+depth+segmentation

This observation mode has the same data format as the [sensor_data mode](#sensor_data), but all sensor data from cameras are replaced with the following structure

@@ -58,9 +56,10 @@ This observation mode has the same data format as the [sensor_data mode](#sensor
- `depth`: [H, W, 1], `torch.int16, np.uint16`. The unit is millimeters. 0 stands for an invalid pixel (beyond the camera far).
- `segmentation`: [H, W, 1], `torch.int16, np.uint16`. See the [Segmentation data section](#segmentation-data) for more details.

Otherwise keep the same data without any additional processing as in the sensor_data mode
Note that this data is not scaled/normalized to [0, 1] or [-1, 1] in order to conserve memory, so if you plan to train on RGB, depth, and/or segmentation data, be sure to scale it before training on it.


Note that this data is not scaled/normalized to [0, 1] or [-1, 1] in order to conserve memory, so if you consider to train on RGBD data be sure to scale your data before training on it.
ManiSkill by default flexibly supports different combinations of RGB, depth, and segmentation data, namely `rgb`, `depth`, `segmentation`, `rgb+depth`, `rgb+depth+segmentation`, `rgb+segmentation`, and `depth+segmentation` (`rgbd` is shorthand for `rgb+depth`). Any image modality that is not chosen is not included in the observation, which conserves some memory and GPU bandwidth.
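
A minimal sketch of the kind of scaling referred to above (the exact target ranges depend on your model; this only assumes the raw dtypes documented here):

```python
# Sketch: scale raw image observations before training on them.
import torch


def preprocess(rgb: torch.Tensor, depth: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    rgb = rgb.float() / 255.0       # uint8 [0, 255] -> float [0, 1]
    depth = depth.float() / 1000.0  # uint16 millimeters -> float meters
    return rgb, depth
```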

When visualized, the RGB and depth data can look like the following:
```{image} images/replica_cad_rgbd.png
@@ -69,6 +68,8 @@ alt: RGBD from two cameras of Fetch robot inside the ReplicaCAD dataset scene
---
```



### pointcloud
This observation mode has the same data format as the [sensor_data mode](#sensor_data), but all sensor data from cameras are removed and a new key called `pointcloud` is added.

8 changes: 4 additions & 4 deletions docs/source/user_guide/getting_started/quickstart.md
@@ -86,7 +86,7 @@ For the full documentation of options you can provide for gym.make see the [docs

## GPU Parallelized/Vectorized Tasks

ManiSkill is powered by SAPIEN which supports GPU parallelized physics simulation and GPU parallelized rendering. This enables achieving 200,000+ state-based simulation FPS and 10,000+ FPS with rendering on a single 4090 GPU on a e.g. manipulation tasks. The FPS can be higher or lower depending on what is simulated. For full benchmarking results see [this page](../additional_resources/performance_benchmarking)
ManiSkill is powered by SAPIEN, which supports GPU parallelized physics simulation and GPU parallelized rendering. This enables achieving 200,000+ state-based simulation FPS and 30,000+ FPS with rendering on a single 4090 GPU on e.g. manipulation tasks. The FPS can be higher or lower depending on what is simulated. For full benchmarking results see [this page](../additional_resources/performance_benchmarking).

In order to run massively parallelized tasks on a GPU, it is as simple as adding the `num_envs` argument to `gym.make` as so
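
The code block itself is collapsed in this diff; a hedged sketch of the pattern it describes (a CUDA GPU is required for `num_envs > 1`):

```python
# Sketch: request GPU-parallelized simulation by passing num_envs to gym.make.
import gymnasium as gym

import mani_skill.envs  # registers ManiSkill environments

env = gym.make("PickCube-v1", num_envs=1024)
obs, _ = env.reset(seed=0)  # every value in obs now has a leading batch dimension of 1024
```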

@@ -137,7 +137,7 @@ which will look something like this

### Parallel Rendering in one Scene

We further support via recording or GUI to view all parallel environments at once, and you can also turn on ray-tracing for more photo-realism. Note that this feature is not useful for any practical purposes (for e.g. machine learning) apart from generating cool demonstration videos and so it is not well optimized.
We further support viewing all parallel environments at once via recording or the GUI, and you can also turn on ray-tracing for more photo-realism. Note that this feature is not useful for any practical purposes (e.g. machine learning) apart from generating cool demonstration videos.

Turning the parallel GUI render on simply requires adding the argument `parallel_in_single_scene` to `gym.make` as so

@@ -151,7 +151,7 @@ env = gym.make(
control_mode="pd_joint_delta_pos",
num_envs=16,
parallel_in_single_scene=True,
shader_dir="rt-fast" # optionally set this argument for more photo-realistic rendering
viewer_camera_configs=dict(shader_pack="rt-fast"),
)
```

@@ -170,7 +170,7 @@ We currently do not properly support exposing multiple visible CUDA devices to a

Each ManiSkill task supports different **observation modes** and **control modes**, which determine its **observation space** and **action space**. They can be specified by `gym.make(env_id, obs_mode=..., control_mode=...)`.

The common observation modes are `state`, `rgbd`, `pointcloud`. We also support `state_dict` (states organized as a hierarchical dictionary) and `sensor_data` (raw visual observations without postprocessing). Please refer to [Observation](../concepts/observation.md) for more details.
The common observation modes are `state`, `rgbd`, `pointcloud`. We also support `state_dict` (states organized as a hierarchical dictionary) and `sensor_data` (raw visual observations without postprocessing). Please refer to [Observation](../concepts/observation.md) for more details. Furthermore, visual data generated by the simulator can be modified in many ways via shaders. Please refer to [the sensors/cameras tutorial](../tutorials/sensors/index.md) for more details.
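
A concrete sketch of that call (the specific modes are illustrative; check each task's supported observation and control modes):

```python
# Sketch: choose observation and control modes at creation time.
import gymnasium as gym

import mani_skill.envs  # registers ManiSkill environments

env = gym.make("PickCube-v1", obs_mode="rgbd", control_mode="pd_joint_delta_pos")
```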

We support a wide range of controllers. Different controllers can have different effects on your algorithms. Thus, it is recommended to understand the action space you are going to use. Please refer to [Controllers](../concepts/controllers.md) for more details.

1 change: 1 addition & 0 deletions docs/source/user_guide/index.md
@@ -43,6 +43,7 @@ datasets/index
data_collection/index
reinforcement_learning/index
learning_from_demos/index
wrappers/index
```

```{toctree}
16 changes: 16 additions & 0 deletions docs/source/user_guide/reinforcement_learning/setup.md
@@ -1,5 +1,10 @@
# Setup

This page documents key things to know when setting up ManiSkill environments for reinforcement learning, including:

- How to convert ManiSkill environments to gymnasium API compatible environments, both [single](#gym-environment-api) and [vectorized](#gym-vectorized-environment-api) APIs.
- [Useful Wrappers](#useful-wrappers)

ManiSkill environments are created by gymnasium's `make` function. The result is by default a "batched" environment where every input and output is batched. Note that this is not the standard gymnasium API. If you want the standard gymnasium environment / vectorized environment API, see the next sections.
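
A minimal sketch of this default batched behavior (assuming a state observation mode and a CUDA GPU, which is required for `num_envs > 1`):

```python
# Sketch: even a "single" ManiSkill environment object returns batched torch tensors by default.
import gymnasium as gym

import mani_skill.envs  # registers ManiSkill environments

env = gym.make("PickCube-v1", num_envs=4, obs_mode="state")
obs, _ = env.reset(seed=0)
print(obs.shape)  # (4, state_dim): note the leading batch dimension
```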

@@ -56,3 +61,14 @@ You may also notice that there are two additional options when creating a vector

Note that for efficiency, everything returned by the environment will be a batched torch tensor on the GPU and not a batched numpy array on the CPU. This is the only difference you may need to account for between ManiSkill vectorized environments and gymnasium vectorized environments.
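
If parts of your tooling expect numpy, a hedged sketch of the explicit conversion (this copies data from GPU to CPU, so avoid it in hot loops):

```python
# Sketch: outputs are torch tensors on the GPU; convert only where numpy is truly required.
import gymnasium as gym

import mani_skill.envs  # registers ManiSkill environments

env = gym.make("PickCube-v1", num_envs=8, obs_mode="state")
env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
reward_np = reward.cpu().numpy()  # explicit GPU -> CPU copy
```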

## Useful Wrappers

RL practitioners often use wrappers to modify and augment environments. These are documented in the [wrappers](../wrappers/index.md) section. Some commonly used ones include:
- [RecordEpisode](../wrappers/record.md) for recording videos/trajectories of rollouts.
- [FlattenRGBDObservations](../wrappers/flatten.md#flatten-rgbd-observations) for flattening the `obs_mode="rgbd"` or `obs_mode="rgb+depth"` observations into a simple dictionary with just a combined `rgbd` tensor and `state` tensor.
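
A hedged sketch of wrapping an environment with `RecordEpisode` (the import path and arguments below are assumptions; check the wrappers documentation for your version):

```python
# Sketch: record rollout videos with the RecordEpisode wrapper.
import gymnasium as gym

import mani_skill.envs  # registers ManiSkill environments
from mani_skill.utils.wrappers import RecordEpisode  # import path assumed

env = gym.make("PickCube-v1", obs_mode="state", render_mode="rgb_array")
env = RecordEpisode(env, output_dir="videos", save_video=True)  # argument names assumed
env.reset(seed=0)
for _ in range(50):
    env.step(env.action_space.sample())
env.close()  # flushes the recorded video to output_dir
```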

## Common Mistakes / Gotchas

In older environments/benchmarks, people have often used `env.render(mode="rgb_array")` or `env.render()` to get image inputs for RL agents. This is not correct in ManiSkill: image observations are returned directly by `env.reset()` and `env.step()`, and `env.render` is only for visualization/video recording.

For robotics tasks, observations are often composed of state information (like robot joint angles) and image observations (like camera images). All tasks in ManiSkill will specifically remove certain privileged state information, such as ground-truth object poses, from the observations when the `obs_mode` is not `state` or `state_dict`. Moreover, the image observations returned by `env.reset()` and `env.step()` are usually from cameras that are positioned in specific locations to provide a good view of the task and make it solvable.
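
A short sketch contrasting the two (the sensor uid is looked up dynamically since camera names depend on the task):

```python
# Sketch: image observations come from reset()/step(); render() is only for visualization/recording.
import gymnasium as gym

import mani_skill.envs  # registers ManiSkill environments

env = gym.make("PickCube-v1", obs_mode="rgb", render_mode="rgb_array")
obs, _ = env.reset(seed=0)
camera_uid = next(iter(obs["sensor_data"]))              # e.g. the task's base camera
rgb_for_policy = obs["sensor_data"][camera_uid]["rgb"]   # feed this (scaled) to your agent
frame_for_video = env.render()                           # for videos/visualization only
```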
12 changes: 6 additions & 6 deletions docs/source/user_guide/tutorials/custom_tasks/advanced.md
@@ -168,7 +168,7 @@ In the drop down below is a copy of all the configurations possible
:::{dropdown} All sim configs
:icon: code

```
```python
@dataclass
class GPUMemoryConfig:
"""A gpu memory configuration dataclass that neatly holds all parameters that configure physx GPU memory for simulation"""
@@ -232,16 +232,16 @@ class DefaultMaterialsConfig:

@dataclass
class SimConfig:
spacing: int = 5
spacing: float = 5
"""Controls the spacing between parallel environments when simulating on GPU in meters. Increase this value
if you expect objects in one parallel environment to impact objects within this spacing distance"""
sim_freq: int = 100
"""simulation frequency (Hz)"""
control_freq: int = 20
"""control frequency (Hz). Every control step (e.g. env.step) contains sim_freq / control_freq physx simulation steps"""
gpu_memory_cfg: GPUMemoryConfig = field(default_factory=GPUMemoryConfig)
scene_cfg: SceneConfig = field(default_factory=SceneConfig)
default_materials_cfg: DefaultMaterialsConfig = field(
gpu_memory_config: GPUMemoryConfig = field(default_factory=GPUMemoryConfig)
scene_config: SceneConfig = field(default_factory=SceneConfig)
default_materials_config: DefaultMaterialsConfig = field(
default_factory=DefaultMaterialsConfig
)

Expand All @@ -259,7 +259,7 @@ class MyCustomTask(BaseEnv):
@property
def _default_sim_config(self):
return SimConfig(
gpu_memory_cfg=GPUMemoryConfig(
gpu_memory_config=GPUMemoryConfig(
max_rigid_contact_count=self.num_envs * max(1024, self.num_envs) * 8,
max_rigid_patch_count=self.num_envs * max(1024, self.num_envs) * 2,
found_lost_pairs_capacity=2**26,
1 change: 1 addition & 0 deletions docs/source/user_guide/tutorials/index.md
@@ -10,6 +10,7 @@ For those looking for a quickstart/tutorial on Google Colab, checkout the [quick

custom_tasks/index
custom_robots
sensors/index
custom_reusable_scenes
domain_randomization
```
37 changes: 37 additions & 0 deletions docs/source/user_guide/tutorials/sensors/index.md
@@ -0,0 +1,37 @@
# Sensors / Cameras

This page documents in depth how to use and customize sensors and cameras in ManiSkill, both at runtime and in task/environment definitions. In ManiSkill, sensors are "devices" that can capture some modality of data. At the moment the Camera is the only sensor type.

## Cameras

Cameras in ManiSkill can capture many different modalities of data. By default ManiSkill limits these to just `rgb`, `depth`, `position` (which is used to derive depth), and `segmentation`. Internally ManiSkill uses [SAPIEN](https://sapien.ucsd.edu/), which has a highly optimized rendering system that leverages shaders to render different modalities of data.

Each shader has a preset configuration that generates textures containing data in an image format, often one that is somewhat difficult to use due to heavy optimization. ManiSkill uses a shader configuration system in Python that parses these different shaders into more user-friendly formats (namely the well-known `rgb`, `depth`, `position`, and `segmentation` type data). This shader config system resides in this file on [Github](https://github.com/haosulab/ManiSkill/blob/main/mani_skill/render/shaders.py) and defines a few friendly defaults for minimal/fast rendering and ray-tracing.


Every ManiSkill environment has 3 categories of cameras (although some categories can be empty): sensors, which provide observations for agents/policies; human_render_cameras, used for (high-quality) video capture for humans; and a single viewer camera, which is used by the GUI application to render the environment.


At runtime, when creating environments with `gym.make`, you can pass overrides to any of these cameras as shown below. The example changes the human render cameras to use the ray-tracing shader for photorealistic rendering, modifies the sensor cameras to have width 320 and height 240, and changes the viewer camera to have a different field of view.

```python
gym.make("PickCube-v1",
sensor_configs=dict(width=320, height=240),
human_render_camera_configs=dict(shader_pack="rt"),
viewer_camera_configs=dict(fov=1),
)
```

These overrides will affect every camera in the environment in that group. So `sensor_configs=dict(width=320, height=240)` will change the width and height of every sensor camera in the environment, but will not affect the human render cameras or the viewer camera.

To override specific cameras, you can do so by camera name. For example, if you want to override the sensor camera named `camera_0` to have a different width and height, you can do so as follows:

```python
gym.make("PickCube-v1",
sensor_configs=dict(camera_0=dict(width=320, height=240)),
)
```

Now all other sensor cameras will have the default width and height, and `camera_0` will have the specified width and height.

These per-camera customizations can be useful for those looking to tailor how they render or generate policy observations to suit their needs.
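
As a quick sanity check, here is a sketch using the group-level override so it does not depend on knowing a task's exact camera names:

```python
# Sketch: confirm a camera resolution override by inspecting the returned observation shapes.
import gymnasium as gym

import mani_skill.envs  # registers ManiSkill environments

env = gym.make("PickCube-v1", obs_mode="rgb", sensor_configs=dict(width=320, height=240))
obs, _ = env.reset(seed=0)
print({uid: data["rgb"].shape for uid, data in obs["sensor_data"].items()})
# each rgb image should end in (240, 320, 3); a leading batch dimension appears with GPU simulation
```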