- Change the loss function
- Analyze the performance of predictor training
- Unify the coordinates of the MuJoCo and ROS Fetch simulations
- Joint training
- Reduce RL training steps
- Baseline training
- Decide whether to use fine-tuning for predictor training
- Decide whether to smooth the training process (two datasets)
- Try different predictor network sizes
- Predict only the end-effector
- GUI (@xuanz)
- python3.6
- tensorflow==1.12
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-396
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/$YOUR_HOME_DIR/.mujoco/mjpro150/bin
# export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-396/libGL.so
- copy gym files to your gym directory
cp gym_file/jointvel.xml $GYM_PATH/gym/envs/robotics/assets/fetch/
cp gym_file/shared_xz.xml $GYM_PATH/gym/envs/robotics/assets/fetch/
- install baselines
cd baselines
pip install -e .
- if baselines.logger cannot be imported, remove the old baselines package and reinstall it
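- to verify the installation, the following sanity check (not part of the original instructions) should print the path of the logger module:
python -c "from baselines import logger; print(logger.__file__)"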
- Download the pretrained model
- Jointly train the RL policy with the seq2seq predictor
bash train_cycle.sh ${ITER_STEP} ${PRED_WEIGHT}
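For example (hypothetical values; ITER_STEP is assumed to be the joint-training iteration count and PRED_WEIGHT the weight given to the predictor-based reward):
bash train_cycle.sh 10 0.5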
- visualize the RL training process
python results_plotter.py --log_num=${ITER_STEP}
- Env code
python env_test.py
- RL code
cd baselines/baselines/ppo2
python run.py
For training a policy, please set
--train=True
--display=False
--load=False
For sampling a dataset, please set
--train=False
--display=False
--load=True
--point="$YOUR_CHECKPOINT_NUMBER"
For displaying performance, please set
--train=False
--display=True
--load=True
--point="$YOUR_CHECKPOINT_NUMBER"
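For example, assuming run.py parses these as command-line flags, the three modes look like:
python run.py --train=True --display=False --load=False
python run.py --train=False --display=False --load=True --point="$YOUR_CHECKPOINT_NUMBER"
python run.py --train=False --display=True --load=True --point="$YOUR_CHECKPOINT_NUMBER"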
- LSTM training code
python predictor_new.py
python predictor_new.py --test
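- Example rollout that reads the original observation through env.origin_obs (a minimal sketch; env and actor come from the project's environment and policy code):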
obs = env.reset()
origin_obs = env.origin_obs            # the env also exposes its original observation
done = False
while not done:
    act = actor.act(obs)               # query the policy for an action
    obs, rew, done, _ = env.step(act)  # step the environment
    origin_obs = env.origin_obs        # read the original observation again after the step
- 0.1.0
- complete environment test
- 0.2.0
- complete reward function for env
- complete reset function for env
- 0.3.0
- add reinforcement learning code to train Fetch
- complete training with no predictable reward
- 0.3.5
- add visualization of obs in ppo2.py (example at lines 389 to 402)
- 0.3.6
- change prediction to sequence-to-sequence mode
- use the new TensorFlow seq2seq API
- 0.4.0
- add a script for training
- finish the two reward frameworks
- 0.5.0
- joint training
- 0.6.0
- smooth the training process (two datasets)
- reset entropy for RL training