Trained policy export to ONNX via PyTorch #922
This should be applicable.
I recommend learning more about how it works for continuous actions ;) See the resources in the doc: https://stable-baselines.readthedocs.io/en/master/guide/rl.html (especially Spinning Up). In the case of continuous actions, a Gaussian distribution is usually used, so the network will output a mean (the deterministic action) and a standard deviation that is used to sample actions.
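To make that concrete, here is a minimal, hypothetical PyTorch sketch of a diagonal-Gaussian policy head (not code from this repository): the network predicts a mean per action dimension plus a learned log standard deviation, and the deterministic action used at prediction time is simply the mean.

```python
import torch
import torch.nn as nn

class GaussianPolicyHead(nn.Module):
    """Minimal sketch of a diagonal-Gaussian policy head for continuous actions.

    The network predicts a mean per action dimension; a learned,
    state-independent log standard deviation is used only for sampling.
    """
    def __init__(self, latent_dim, action_dim):
        super().__init__()
        self.mean = nn.Linear(latent_dim, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, latent, deterministic=True):
        mu = self.mean(latent)
        if deterministic:
            # At prediction/export time the deterministic action is the mean.
            return mu
        std = self.log_std.exp()
        return mu + std * torch.randn_like(mu)
```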
Pinging @pstansell, the member of our project with the deepest understanding of RL (I am more of a code monkey in this area).
Your best option now would be to use Stable-Baselines3 (which is written directly in PyTorch).
We did actually get this to work, and without PyTorch. In our case we wanted to export directly to a .mat file, and we ended up with something like this:
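The snippet referred to above is not preserved in this thread, but a minimal sketch of a direct .mat export might look like the following. It assumes a Stable-Baselines (TF1) model whose `get_parameters()` returns a dict of parameter names to NumPy arrays; the model and file names are placeholders.

```python
import scipy.io
from stable_baselines import PPO2

# Placeholder model path for illustration.
model = PPO2.load("ppo2_cartpole")

# get_parameters() returns an ordered dict mapping TF variable names
# (e.g. 'model/pi_fc0/w:0') to NumPy arrays.
params = model.get_parameters()

# MATLAB variable names cannot contain '/' or ':', so sanitise the keys.
mat_dict = {name.replace("/", "_").replace(":", "_"): value
            for name, value in params.items()}

scipy.io.savemat("policy_weights.mat", mat_dict)
```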
I am attempting to export a trained policy to the ONNX common interchange format for use in prediction only. I found a very useful discussion in issue #372, which describes how to convert a model to an equivalent PyTorch model; in turn, PyTorch supports export to ONNX. Using the code in the Colab notebook linked to in #372, I was able to create a script which trained a cartpole model, converted it to PyTorch, then exported to ONNX, as shown below.
I used the `PyTorchCnnPolicy` class and `copy_cnn_weights` function from the linked notebook. The ONNX network produced can be visualised with Netron, and is shown below. The results of this were good, or at least produced something that had the same numbers in it as the Stable-Baselines policy. However, what I would like to know is how applicable this is to other environments trained using PPO1 and PPO2 with the MlpPolicy, particularly for the case where the action space is not discrete but continuous. Are modifications required in this case?
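For the MlpPolicy case asked about above, a hedged sketch of the same idea (mirror the network in PyTorch, copy the weights, then call `torch.onnx.export`) might look like this. The layer sizes (two tanh hidden layers of 64 units) and the TF parameter names (`model/pi_fc0/...`, etc.) are assumptions about the default MlpPolicy and may need adjusting after inspecting `model.get_parameters().keys()`. Note that TensorFlow stores dense weights as (in, out) while `torch.nn.Linear` expects (out, in), hence the transpose. For a continuous action space the final layer simply outputs the action mean (with a separate log-std parameter used only for sampling), so the same export works for deterministic prediction.

```python
import torch
import torch.nn as nn

class MlpPolicyTorch(nn.Module):
    """Rough PyTorch mirror of the default stable-baselines MlpPolicy:
    two tanh hidden layers of 64 units plus a linear action head."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),  # logits (discrete) or mean (continuous)
        )

    def forward(self, obs):
        return self.net(obs)

def copy_mlp_weights(sb_params, torch_policy):
    """Copy weights from a stable-baselines parameter dict into the torch mirror.

    The parameter names below are assumptions about the default MlpPolicy;
    adjust them if print(sb_params.keys()) shows something different.
    TF dense weights are (in, out), so they are transposed for nn.Linear.
    """
    mapping = [
        ("model/pi_fc0", torch_policy.net[0]),
        ("model/pi_fc1", torch_policy.net[2]),
        ("model/pi",     torch_policy.net[4]),
    ]
    with torch.no_grad():
        for tf_name, layer in mapping:
            layer.weight.copy_(torch.tensor(sb_params[tf_name + "/w:0"].T))
            layer.bias.copy_(torch.tensor(sb_params[tf_name + "/b:0"]))
    return torch_policy

# Usage sketch (obs_dim=4, act_dim=2 are CartPole-like placeholders):
# model = PPO2.load("ppo2_cartpole")
# torch_policy = copy_mlp_weights(model.get_parameters(), MlpPolicyTorch(4, 2))
# dummy = torch.zeros(1, 4)
# torch.onnx.export(torch_policy, dummy, "policy.onnx",
#                   input_names=["obs"], output_names=["action"])
```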