
A Reinforcement Learning Friendly Simulator for Mobile Robot


XinJingHao/Sparrow-V1


Sparrow-V1.1: A Reinforcement Learning Friendly Simulator for Mobile Robot


What's New in V1.1:

Sparrow-V1.1 is a new-generation mobile robot simulator in the Sparrow family that prioritizes simulation speed and a lightweight footprint. The comparison between Sparrow-V0 and Sparrow-V1.1 is shown below.

In Sparrow-V0, vectorization relies on gym.vector, which leads to two unavoidable limitations: 1) different CPU cores separately submit Sparrow-V0's core calculation to the GPU, which wastes the GPU's parallel computing capability and occupies a large amount of GPU memory; 2) gym.vector only supports data (e.g. the state, action, reward, terminated, and truncated signals) in numpy format, which forces an unfavorable data conversion ( Sparrow.Variables(gpu) → Gym.Variables(cpu) → DRL.Variables(gpu) ) that dramatically slows down training.

To tackle these two issues, Sparrow-V1.1 concatenates the variables from all worlds (vectorized environments) and feeds them to the GPU together, unleashing the GPU's parallel computing power, and it bypasses gym.vector, which eliminates the data conversion. Additionally, since the release of PyTorch 2.0, the core calculation of Sparrow, the LiDAR scan, is compiled with torch.compile, which brings roughly a 2.X-fold speedup. The simulation speed comparison is given below.

A more detailed comparison w.r.t. simulation speed and hardware occupation is given below.

Features

  • Vectorizable (super fast data collection)
  • Domain Randomization (control interval, maximum velocity, inertia, friction, and the magnitude of state noise can be randomized during training)
  • Lightweight (consumes only 140~300 MB of GPU memory, even with vectorized environments)
  • Standard Gym API with PyTorch data flow
  • Runs on both GPU and CPU (fast & compatible)
  • Conversion-free data flow (the states generated by Sparrow are torch.tensor objects. If you build your DRL model with PyTorch, you can run both the model and Sparrow on the GPU, so no CPU-to-GPU data transfer is needed.)
  • Easy to use (20 KB of pure Python files. Just import it; no installation needed)
  • Supports Ubuntu, Windows, and macOS
  • Accepts images as maps (draw your own environments easily and rapidly)
  • Detailed comments in the source code

Installation

The dependencies for Sparrow-V1.1 are:

torch >= 2.0.1
pygame >= 2.4.0
numpy >= 1.24.3

You can install torch by following the guidance on its official website. We strongly suggest installing the CUDA 11.7 (or higher) version, although the CPU version and lower CUDA versions are also supported.

Then you can install pygame, numpy via:

pip3 install pygame==2.4.0 numpy==1.24.3

Additionally, we recommend python>=3.10.0, although other versions might also work.

Quick Start

After installation, you can play with Sparrow-V1.1 using your keyboard (UP/DOWN/LEFT/RIGHT) to check whether it has been installed successfully:

python play_with_keyboard.py

Train a DDQN model with Sparrow

Sparrow is a mobile robot simulator designed mainly for Deep Reinforcement Learning. In this section, we provide a simple Python script that shows how to train a DDQN model with Sparrow-V1.1. Other clean and robust PyTorch implementations of popular DRL algorithms can be found here.

Start training:

To train a DDQN model with Sparrow-V1.1, you can run:

python train_DDQN_vector.py

By default, the above script runs on your GPU (CPU is also supported, but running Sparrow-V1.1 on the GPU can be remarkably faster). The script trains on the maps in ~/SparrowV1/train_maps with 16 vectorized environments.
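For readers new to DDQN, the key difference from DQN is how the bootstrap target is built: the online network selects the next action and the target network evaluates it. The sketch below is a framework-free illustration using plain Python lists instead of tensors; `ddqn_target` and `gamma` are hypothetical names chosen here, not code from train_DDQN_vector.py, and `dw` plays the role of the terminated flag stored in the replay buffer.

```python
from typing import Sequence


def ddqn_target(reward: float, dw: bool,
                q_online_next: Sequence[float],
                q_target_next: Sequence[float],
                gamma: float = 0.99) -> float:
    """Double DQN target for one transition (hypothetical helper).

    The online network *selects* the greedy next action and the target
    network *evaluates* it; a terminated transition (dw=True) does not
    bootstrap.
    """
    # Action selection with the online network's Q-values at s'
    best_a = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Action evaluation with the target network's Q-values at s'
    bootstrap = q_target_next[best_a]
    return reward + gamma * (1.0 - float(dw)) * bootstrap


# The online net prefers action 2, which the target net values at 5.0
y = ddqn_target(0.0, False, q_online_next=[1.0, 2.0, 3.0],
                q_target_next=[9.0, 8.0, 5.0], gamma=0.9)
print(y)  # 4.5
```

Plain DQN would instead bootstrap from max(q_target_next) = 9.0 in this example; decoupling selection from evaluation is what reduces DQN's overestimation bias.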

Because Sparrow-V1.1 only provides a sparse reward, Prioritized Experience Replay (PER) is a promising way to achieve better results. We also provide a single-file Python script to train Sparrow-V1.1 with PER&DDQN:

python train_PER_DDQN_vector.py
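As a rough illustration of what PER changes, the standard proportional variant samples transition i with probability P(i) = p_i^α / Σ_k p_k^α and corrects the resulting bias with importance-sampling weights w_i = (N·P(i))^(-β). The stdlib-only sketch below implements that rule; it is not the code in train_PER_DDQN_vector.py, and the helper names and the α/β defaults are assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple


def per_probabilities(priorities: Sequence[float], alpha: float = 0.6) -> List[float]:
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def per_sample(priorities: Sequence[float], batch_size: int,
               alpha: float = 0.6, beta: float = 0.4,
               rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
    """Draw indices by priority and return importance-sampling (IS) weights.

    w_i = (N * P(i)) ** (-beta), divided by the largest possible weight so
    that every weight lies in (0, 1].
    """
    if rng is None:
        rng = random.Random()
    probs = per_probabilities(priorities, alpha)
    n = len(probs)
    idx = rng.choices(range(n), weights=probs, k=batch_size)
    max_w = (n * min(probs)) ** (-beta)   # weight of the least likely transition
    weights = [(n * probs[i]) ** (-beta) / max_w for i in idx]
    return idx, weights


# Transition 2 has 4x the priority (e.g. |TD error|) of the other two
probs = per_probabilities([1.0, 1.0, 4.0], alpha=1.0)   # about [0.167, 0.167, 0.667]
idx, w = per_sample([1.0, 1.0, 4.0], batch_size=8, rng=random.Random(0))
```

A practical buffer would keep priorities in a sum tree so that sampling costs O(log N) instead of recomputing every probability on each draw; this version favors readability.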

Visualize the training curve:

The above scripts are integrated with TensorBoard to visualize the training curve, as shown on the right. To enable it, set write to True, e.g.

python train_PER_DDQN_vector.py --write True

The training curve will be saved in the runs folder. For more details about how to install and use TensorBoard, please click here.

Play with trained model:

During training, the model will be saved in the model folder automatically (e.g. model/467k.pth). After training, you can play with it via:

python train_DDQN_vector.py --render True --Loadmodel True --ModelIdex 467 # 467 means using '467k.pth'

You can set --map_address and --ri to play the model with different maps. For more details, please see the following section.

Dive into Sparrow

Create your first env:

Before instantiating Sparrow-V1.1, specify its parameters to customize your own env:

from SparrowV1_1.core import Sparrow, str2bool
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--map_address', type=str, default='train_maps', help='map address: train_maps / test_maps')
parser.add_argument('--device', type=str, default='cuda', help='running device of Sparrow: cuda / cpu')
parser.add_argument('--ld_num', type=int, default=27, help='number of lidar streams in each world')
parser.add_argument('--ld_GN', type=int, default=3, help='how many lidar streams are grouped in each group')
parser.add_argument('--ri', type=int, default=0, help='render index: the index of the world to be rendered')
parser.add_argument('--render_mode', type=str, default='human', help='human / rgb_array / None')
parser.add_argument('--render_speed', type=str, default='fast', help='real / fast / slow')
parser.add_argument('--max_ep_steps', type=int, default=1000, help='maximum episodic steps')
parser.add_argument('--AWARD', type=float, default=80, help='reward of reaching target area')
parser.add_argument('--PUNISH', type=float, default=-10, help='reward when collision happens')
parser.add_argument('--STEP', type=float, default=0.0, help='reward of each step')
parser.add_argument('--normalization', type=str2bool, default=False, help='whether to normalize the observations')
parser.add_argument('--flip', type=str2bool, default=False, help='whether to expand training maps with flipped maps')
parser.add_argument('--noise', type=str2bool, default=False, help='whether to add noise to the observations')
parser.add_argument('--DR', type=str2bool, default=False, help='whether to use Domain Randomization')
parser.add_argument('--DR_freq', type=int, default=int(5e3), help='frequency of Domain Randomization, in total steps')
parser.add_argument('--compile', type=str2bool, default=False, help='whether to use torch.compile to boost simulation speed')
opt = parser.parse_args()
opt.grouped_ld_num = int(opt.ld_num/opt.ld_GN)
opt.state_dim = 5+opt.grouped_ld_num # [dx,dy,orientation,v_linear,v_angular] + [lidar result]

Afterward, you can create the Sparrow-V1.1 environment via:

envs = Sparrow(**vars(opt))

The above command will instantiate a Sparrow-V1.1 environment with standard Gym API, and you can interact with it via:

import torch
device = torch.device(opt.device)

s, info = envs.reset()  # reset only once; AutoReset handles finished copies
while True:
    # 5 action choices (0~4); envs.N is the number of vectorized envs
    a = torch.randint(0, 5, (envs.N,), device=device)
    s_next, r, terminated, truncated, info = envs.step(a)
    s = s_next

Note that Sparrow-V1.1 runs in a vectorized manner, so s, a, r, terminated, and truncated have shapes (N, state_dim), (N,), (N,), (N,), and (N,) respectively, where N is the number of vectorized environments (here, N=16) and state_dim = opt.state_dim (32 when ld_GN=1; 14 with the default ld_num=27, ld_GN=3 above). In addition, Sparrow-V1.1 has its own AutoReset mechanism, so users only need to reset the envs once at the beginning.

Number of vectorized environments:

The number of vectorized environments equals the number of maps currently in use. For example, here we set --map_address to train_maps. Because there are 16 maps in SparrowV1/train_maps, there are 16 environment copies by default. You can create more environment copies by putting more maps into the folder.
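The one-copy-per-map rule can be checked before launching training. The helper below is hypothetical and assumes that every .png file in the folder is loaded as one map; the actual loader in SparrowV1_1/core.py may rely on the mapX.png naming instead.

```python
import tempfile
from pathlib import Path


def count_map_copies(map_dir: str) -> int:
    """Hypothetical helper: number of .png maps, i.e. the expected number of copies."""
    return sum(1 for p in Path(map_dir).iterdir()
               if p.is_file() and p.suffix.lower() == ".png")


# Demo on a throwaway folder with three (empty) map files and one non-map file
with tempfile.TemporaryDirectory() as tmp:
    for name in ("map0.png", "map1.png", "map2.png", "notes.txt"):
        (Path(tmp) / name).touch()
    demo_count = count_map_copies(tmp)
print(demo_count)  # 3
```

For the bundled folder, count_map_copies("train_maps") would be expected to return 16.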

Coordinate Frames:

There are two coordinate frames: World Coordinate Frame and Grid Coordinate Frame as illustrated below:

The state of the robot is represented in World Coordinate Frame, while the Grid Coordinate Frame is for visualization and LiDAR scanning.

Basic Robot Information:

The LiDAR perception range is 100cm×270°, with an accuracy of 3 cm. The radius of the robot is 9 cm, and its collision threshold is 14 cm.

The maximum linear and angular velocities of the robot are 18 cm/s and 1 rad/s, respectively. The control frequency of the robot is 10 Hz. We use a simple but useful model to describe the kinematics of the robot:

$$[V^{i+1}_{linear},\ V^{i+1}_{angular}] = K·[V^{i}_{linear},\ V^{i}_{angular}]+(1-K)·[V^{target}_{linear},\ V^{target}_{angular}]$$

$$[dx^{i+1},dy^{i+1},\theta^{i+1}] = [dx^{i},dy^{i},\theta^{i}] + [V^{i+1}_{linear},V^{i+1}_{linear},\ V^{i+1}_{angular}]·\Delta t · [\cos(\theta^{i}), -\sin(\theta^{i}), 1]$$

Here, K is a hyperparameter in (0,1) that describes the combined effect of inertia, friction, and the underlying velocity control algorithm (default: 0.6). The parameters mentioned in this section can be found in the Robot initialization and Lidar initialization parts of SparrowV1_1/core.py and can be customized for your own scenario.
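To make the update order concrete, here is a plain-Python sketch of one control step of the two equations above. It assumes the default K = 0.6, Δt = 0.1 s (from the 10 Hz control frequency), positions in cm, and the document's -sin sign for dy; `kinematic_step` is a hypothetical name, not a function from core.py.

```python
import math
from typing import Tuple

K = 0.6    # default combined inertia/friction/controller factor from the text
DT = 0.1   # control interval in seconds (10 Hz control frequency)


def kinematic_step(dx: float, dy: float, theta: float,
                   v_lin: float, v_ang: float,
                   v_lin_target: float, v_ang_target: float,
                   k: float = K, dt: float = DT
                   ) -> Tuple[float, float, float, float, float]:
    """One control step of the kinematic model (illustrative sketch)."""
    # First equation: blend the current velocity with the target velocity
    v_lin_new = k * v_lin + (1.0 - k) * v_lin_target
    v_ang_new = k * v_ang + (1.0 - k) * v_ang_target
    # Second equation: integrate the pose with the new velocities and the
    # previous heading theta^i, keeping the document's sign convention (-sin)
    dx_new = dx + v_lin_new * dt * math.cos(theta)
    dy_new = dy - v_lin_new * dt * math.sin(theta)
    theta_new = theta + v_ang_new * dt
    return dx_new, dy_new, theta_new, v_lin_new, v_ang_new


# One step of "Move forward" (18 cm/s, 0 rad/s) from rest:
# v_lin becomes 0.4 * 18 = 7.2 cm/s and dx advances by about 0.72 cm
first = kinematic_step(0.0, 0.0, 0.0, 0.0, 0.0, 18.0, 0.0)

# Five steps of "Move forward" from rest
s = (0.0, 0.0, 0.0, 0.0, 0.0)
for _ in range(5):
    s = kinematic_step(*s, 18.0, 0.0)
```

Unrolling the first equation from rest gives V^n = (1 - K^n)·V^target, so with K = 0.6 the robot reaches about 92% of a commanded speed after 5 steps (0.5 s); a larger K means a more sluggish robot.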

LiDAR Group:

In DRL-based navigation, the dimension of the LiDAR state is commonly far larger than that of the motion state (e.g. location, orientation, and speed). To prevent the LiDAR state from overwhelming the motion state, we provide a LiDAR group option: the raw LiDAR output is split into groups of ld_GN consecutive streams, and each group outputs its minimum distance.

# Example of ld_num=12, ld_GN=2:
raw_lidar_output = [100,100, 40,60, 90,100, 100,100, 100,100, 100,100]

grouped_lidar_output = [100, 40, 90, 100, 100, 100]
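The grouping rule is a simple min-reduction over consecutive chunks. This plain-Python sketch reproduces the example above; `group_lidar` is a hypothetical name, and raising an error when ld_num is not a multiple of ld_GN is our own choice (the argparse setup simply computes int(ld_num/ld_GN)).

```python
from typing import List, Sequence


def group_lidar(raw: Sequence[float], ld_GN: int) -> List[float]:
    """Split raw LiDAR readings into consecutive groups of ld_GN; keep each group's minimum."""
    if ld_GN <= 0 or len(raw) % ld_GN != 0:
        raise ValueError("ld_num must be a positive multiple of ld_GN")
    return [min(raw[i:i + ld_GN]) for i in range(0, len(raw), ld_GN)]


raw_lidar_output = [100, 100, 40, 60, 90, 100, 100, 100, 100, 100, 100, 100]
grouped = group_lidar(raw_lidar_output, ld_GN=2)
print(grouped)  # [100, 40, 90, 100, 100, 100]
```

On a batched tensor of shape (N, ld_num), the same reduction could be written as `lidar.reshape(N, -1, ld_GN).amin(dim=-1)`; whether core.py does it exactly this way is not verified here.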

RL representation:

The basic task in Sparrow is to drive the robot from the start point to the end point as fast as possible without colliding with obstacles. To this end, the following sub-sections define several basic components of the Markov Decision Process.

State:

The state of the robot is a vector of length 5 + ld_num/ld_GN, i.e. 32 with the default 27 LiDAR streams and ld_GN=1. It contains the position (state[0:2] = [dx,dy]), orientation (state[2]=θ), velocity (state[3:5]=[v_linear, v_angular]), and LiDAR scan (state[5:32] = scanning result). If --normalization is set to False when instantiating the env, the env outputs the raw state in the World Coordinate Frame; otherwise, it outputs a normalized state. For more details, please check the _Normalize() function in SparrowV1_1/core.py.
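The index layout of the state can be captured in two small helpers. Both are hypothetical illustrations operating on a single un-normalized state given as a Python list; `state_dim` mirrors the `opt.state_dim` formula from the argparse setup.

```python
from typing import Dict, List, Sequence


def state_dim(ld_num: int = 27, ld_GN: int = 3) -> int:
    """Motion part (5 values) plus one value per LiDAR group, as in opt.state_dim."""
    return 5 + int(ld_num / ld_GN)


def unpack_state(s: Sequence[float]) -> Dict[str, List[float]]:
    """Split one state vector into named parts (hypothetical helper)."""
    return {
        "position": list(s[0:2]),      # [dx, dy]
        "orientation": [s[2]],         # [theta]
        "velocity": list(s[3:5]),      # [v_linear, v_angular]
        "lidar": list(s[5:]),          # scanning result (grouped if ld_GN > 1)
    }


# A dummy 32-dimensional state (ld_num=27, ld_GN=1) holding 0.0, 1.0, ..., 31.0
parts = unpack_state([float(i) for i in range(state_dim(27, 1))])
```

For a batched torch.tensor of shape (N, state_dim), the same slices apply along the last dimension, e.g. s[:, 5:] for the LiDAR part.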

Action:

There are 6 discrete actions in Sparrow, controlling the target velocity of the robot:

  • Turn Left: [ 3.6 cm/s, 1 rad/s ]
  • Turn Left + Move forward: [ 18 cm/s, 1 rad/s ]
  • Move forward: [ 18 cm/s, 0 rad/s ]
  • Turn Right + Move forward: [ 18 cm/s, -1 rad/s ]
  • Turn Right: [ 3.6 cm/s, -1 rad/s ]
  • Stop: [ 0 cm/s, 0 rad/s ]

We strongly suggest not using the Stop action when training an RL model, because it may cause the robot to stand still and generate low-quality data. You may also have noticed that the turning actions still include a small linear velocity (3.6 cm/s); this helps the robot escape from deadlocks.
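For quick reference, the action list can be encoded as a lookup table. The integer indices are an assumption (they follow the order of the list, with Stop last); check the action mapping in SparrowV1_1/core.py before relying on them.

```python
from typing import Dict, List, Tuple

# Target velocities (linear cm/s, angular rad/s) for each discrete action.
# ASSUMPTION: indices follow the order of the action list, with Stop last.
ACTIONS: Dict[int, Tuple[float, float]] = {
    0: (3.6, 1.0),    # Turn Left
    1: (18.0, 1.0),   # Turn Left + Move forward
    2: (18.0, 0.0),   # Move forward
    3: (18.0, -1.0),  # Turn Right + Move forward
    4: (3.6, -1.0),   # Turn Right
    5: (0.0, 0.0),    # Stop
}
STOP = 5

# Actions left for training when Stop is excluded
TRAIN_ACTIONS: List[int] = [a for a in ACTIONS if a != STOP]


def target_velocity(action: int) -> Tuple[float, float]:
    """Look up the (linear, angular) target velocity of a discrete action."""
    return ACTIONS[action]
```

Under this assumed ordering, the random policy in the interaction example, torch.randint(0, 5, ...), samples only indices 0~4 and therefore never selects Stop. The 3.6 cm/s used by the pure turning actions is 20% of the 18 cm/s maximum.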

Reward:

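In Sparrow-V1.1, we only provide a naive sparse reward function: R = 80 when the robot reaches the target area, R = -10 when it collides, and R = 0 otherwise. These values correspond to the defaults of --AWARD, --PUNISH, and --STEP.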
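The reward rule fits in a few lines. The sketch below uses the --AWARD, --PUNISH, and --STEP defaults from the argparse setup; the function name and the tie-break when both events happen in one step are assumptions.

```python
def sparse_reward(reached_target: bool, collided: bool,
                  AWARD: float = 80.0, PUNISH: float = -10.0,
                  STEP: float = 0.0) -> float:
    """Sparse reward mirroring the --AWARD / --PUNISH / --STEP defaults.

    If both flags were ever set in the same step, this sketch lets the
    collision win; that tie-break is our assumption, not taken from core.py.
    """
    if collided:
        return PUNISH
    if reached_target:
        return AWARD
    return STEP
```

Setting a small negative STEP (e.g. --STEP -0.1) would turn this into a time penalty that encourages shorter paths, while keeping the same terminal rewards.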

Termination:

An episode is terminated only when the robot collides with an obstacle or reaches the target area.

Truncation:

An episode is truncated only when its number of steps exceeds --max_ep_steps.
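Keeping the two episode-end signals separate matters for learning: a terminated transition has no future value, while a truncated one was only cut off by the step limit. The hypothetical helper below computes both flags for one environment copy, reading "exceed" literally as `>`; the real check may differ by one step.

```python
from typing import Tuple


def episode_flags(collided: bool, reached_target: bool, ep_step: int,
                  max_ep_steps: int = 1000) -> Tuple[bool, bool]:
    """Return (terminated, truncated) for one environment copy (sketch)."""
    terminated = collided or reached_target   # true end of the task
    truncated = ep_step > max_ep_steps        # time limit only
    return terminated, truncated
```

This is why the DDQN loop stores the terminated flag (dw2), not done2, as the bootstrapping mask: a truncated step should still bootstrap from its next state.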

Random initialization:

At the beginning of every episode, the robot will be randomly initialized in the lower right corner of the map with different orientations to avoid overfitting.

Render:

If render_mode=None or render_mode="rgb_array", Sparrow runs at its maximum simulation speed (depending on the hardware). However, if render_mode="human", there are three options for the simulation speed:

  • render_speed == 'fast': render the Sparrow in a pygame window with maximum FPS
  • render_speed == 'slow': render the Sparrow in a pygame window with 5 FPS. Might be useful when debugging.
  • render_speed == 'real': render the Sparrow in a pygame window with 1/ctrl_interval FPS, in accordance with the real world speed.
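The three render options map onto a target frame rate. The sketch below assumes ctrl_interval = 0.1 s, derived from the 10 Hz control frequency (Domain Randomization may change the interval at run time), and returns None for "uncapped"; `render_fps` is a hypothetical helper, not part of Sparrow.

```python
from typing import Optional


def render_fps(render_mode: Optional[str], render_speed: str,
               ctrl_interval: float = 0.1) -> Optional[float]:
    """Target frames per second for the given render settings (None = uncapped)."""
    if render_mode != "human":
        return None                   # None / "rgb_array": maximum simulation speed
    if render_speed == "fast":
        return None                   # pygame window, maximum FPS
    if render_speed == "slow":
        return 5.0                    # handy for debugging
    if render_speed == "real":
        return 1.0 / ctrl_interval    # one frame per control step (real-world speed)
    raise ValueError(f"unknown render_speed: {render_speed!r}")
```

A value like this could be passed to pygame.time.Clock.tick(), which delays the loop so that it runs at most that many frames per second; calling tick() with no argument does not cap the frame rate.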

You can configure different --map_address and --ri to render on different maps.

Customize your own maps:

Sparrow takes .png images as its maps, e.g. map0.png~map15.png in SparrowV1/train_maps/. Therefore, you can easily draw your own maps with any image-editing software, as long as they satisfy the following requirements:

  • saved in .png format
  • resolution (namely the map size) of 366×366
  • obstacles in black (0,0,0) and free space in white (255,255,255)
  • a fence surrounding the map so that the robot cannot run out of it
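Before dropping a new drawing into train_maps, a quick sanity check can save a confusing training run. The hypothetical checker below works on a plain nested list of (R, G, B) tuples indexed as pixels[row][col], and it approximates the fence requirement by checking that the outermost ring of pixels is black.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
BLACK: Pixel = (0, 0, 0)        # obstacle
WHITE: Pixel = (255, 255, 255)  # free space


def check_map(pixels: Sequence[Sequence[Pixel]], size: int = 366) -> List[str]:
    """Return the list of violated requirements (empty if the map looks valid)."""
    if len(pixels) != size or any(len(row) != size for row in pixels):
        return [f"map must be {size}x{size} pixels"]
    problems: List[str] = []
    if any(p not in (BLACK, WHITE) for row in pixels for p in row):
        problems.append("pixels must be pure black or pure white")
    # Simplified fence test: the outermost ring of pixels must be black
    ring = (list(pixels[0]) + list(pixels[-1])
            + [row[0] for row in pixels] + [row[-1] for row in pixels])
    if any(p != BLACK for p in ring):
        problems.append("map border should be a black fence")
    return problems


def fenced_map(n: int) -> List[List[Pixel]]:
    """Demo map: black one-pixel fence around a white interior."""
    return [[BLACK if r in (0, n - 1) or c in (0, n - 1) else WHITE
             for c in range(n)] for r in range(n)]


demo_ok = check_map(fenced_map(6), size=6)                       # []
demo_bad = check_map([[WHITE] * 6 for _ in range(6)], size=6)    # fence missing
```

To check a real file, one option is to load it with pygame.image.load(path) and read pixels with Surface.get_at((col, row)), which returns an RGBA pygame.Color; keep its first three channels. Anti-aliased brushes in drawing tools tend to leave gray pixels, which the color check flags.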

AutoReset:

The environment copies inside the vectorized environment may be done (terminated or truncated) at different timesteps. Consequently, calling env.reset() to reset all copies whenever one copy is done would be inefficient or even improper, which necessitates the AutoReset mechanism illustrated below:

Note:

a) the AutoReset mechanism of Sparrow-V1.1 is different from Sparrow-V0

b) different environment copies are reset independently

c) the interaction in train_DDQN_vector.py runs in the following way:

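S1, info = envs.reset()
ct1 = torch.ones(envs.N, dtype=torch.bool, device=device)  # every copy starts consistent
while True:
    A1 = model.select_action(S1)
    S2, R2, dw2, tr2, info2 = envs.step(A1)
    buffer.add(S1, A1, R2, dw2, ct1)  # ct1=False flags a transition that crosses an AutoReset
    done2 = dw2 | tr2                 # terminated or truncated
    ct2 = ~done2                      # the next transition starts from a finished episode
    S1, ct1 = S2, ct2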
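To see why the consistency flag (ct) is needed, the toy below assumes the "next-step" AutoReset semantics implied by the loop above: the step that ends an episode returns the terminal state, and the following step() call resets that copy and returns its initial state. `ToyEnv` and `collect` are hypothetical; Sparrow itself works on batched tensors.

```python
from typing import List, Tuple


class ToyEnv:
    """Hypothetical 1-D env with next-step AutoReset (assumed semantics).

    Each step moves the agent +1; the episode terminates at position 3.
    The step *after* a terminal step ignores the action and returns the
    reset observation.
    """

    def __init__(self) -> None:
        self.pos = 0
        self.needs_reset = False

    def step(self, action: int) -> Tuple[int, float, bool, bool]:
        if self.needs_reset:                       # AutoReset happens on this call
            self.pos, self.needs_reset = 0, False
            return self.pos, 0.0, False, False
        self.pos += 1
        dw = self.pos >= 3
        self.needs_reset = dw
        return self.pos, (80.0 if dw else 0.0), dw, False


def collect(env: ToyEnv, steps: int) -> List[Tuple[int, int, float, bool, bool]]:
    """Mirror the interaction loop; returns (s1, s2, r2, dw2, ct1) tuples."""
    buffer = []
    s1, ct1 = env.pos, True
    for _ in range(steps):
        s2, r2, dw2, tr2 = env.step(action=0)
        buffer.append((s1, s2, r2, dw2, ct1))
        ct1 = not (dw2 or tr2)    # next transition starts from a finished episode
        s1 = s2
    return buffer


transitions = collect(ToyEnv(), steps=5)
valid = [t for t in transitions if t[4]]   # drop the transition that crosses the reset
```

Here the fourth stored tuple is (3, 0, 0.0, False, False): it links the terminal state 3 to the fresh start 0, which is not a real transition, and ct1=False lets training skip it.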

The Sparrow families

  • Sparrow-V1: Single Robot, Static environments
  • Sparrow-V2: Single Robot, Dynamic/Static environments
  • Sparrow-V3: Multiple/Single Robot, Dynamic/Static environments

Important Differences from Sparrow-V0:

Some features from Sparrow-V0 are modified in Sparrow-V1.1. They are:

  • Numpy data flow: Discarded
  • Map0.png with randomly generated obstacles: Discarded
  • Random initialization: train_maps_startpoints is Discarded
  • Reward Function: Replaced with sparse reward function
  • Coordinate system: Modified
  • AutoReset mechanism: Modified

Important Differences from Sparrow-V1.0:

  • Sensor Noise: supported in V1.1
  • Domain Randomization: supported in V1.1
  • LiDAR Group: supported in V1.1

Citing the Project

To cite this repository in publications:

@article{Color2025XJH,
title = {Train a real-world local path planner in one hour via partially decoupled reinforcement learning and vectorized diversity},
journal = {Engineering Applications of Artificial Intelligence},
volume = {141},
pages = {109726},
year = {2025},
issn = {0952-1976},
doi = {https://doi.org/10.1016/j.engappai.2024.109726},
}

Writing in the end

The name "Sparrow" comes from an old Chinese saying, “麻雀虽小,五脏俱全” (the sparrow may be small, but it has all the vital organs), i.e. small but complete.

Hope you enjoy using Sparrow!

Additionally, we have commented the source code (SparrowV1_1/core.py) in detail so that you can modify Sparrow to fit your own problem. Note that this is permitted for non-commercial purposes only, and all rights are reserved by Jinghao Xin.
