Trained policy export to ONNX via PyTorch #922

Closed · crobarcro opened this issue on Jul 5, 2020 · 4 comments
Labels: question (Further information is requested)

crobarcro commented on Jul 5, 2020

I am attempting to export a trained policy to the ONNX interchange format for use in prediction only. I found a very useful discussion in issue #372, which describes how to convert a Stable Baselines model to an equivalent PyTorch model; PyTorch in turn supports export to ONNX. Using the code in the Colab notebook linked from #372, I was able to create a script that trains a CartPole model, converts it to PyTorch, and then exports it to ONNX, as shown below.

import os
import gym
import torch
import torch.onnx

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO2

from ceorl_stable_baselines.onnx_export import PyTorchMlp, copy_mlp_weights

env = gym.make('CartPole-v1')
# Optional: PPO2 requires a vectorized environment to run;
# the env is wrapped automatically when passed to the constructor
# env = DummyVecEnv([lambda: env])

model = PPO2(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=1000)

# Inspect the parameter names and shapes of the trained policy
for key, value in model.get_parameters().items():
    print(key, value.shape)

containing_dir = os.path.dirname(os.path.realpath(__file__))
path = os.path.join(containing_dir, 'export_model_test.onnx')

# Remove any previous export (os.remove, not shutil.rmtree, since this is a file)
if os.path.exists(path):
    os.remove(path)

# Copy the trained stable-baselines weights into an equivalent PyTorch module
th_model = copy_mlp_weights(model)

obs = env.reset()

# Input to the model: CartPole-v1 has a 4-dimensional observation
batch_size = 1    # just a random number
x = torch.randn(batch_size, 1, 1, 4, requires_grad=True)
torch_out = th_model(x)

# Export the model
torch.onnx.export(th_model,                  # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  path,                      # where to save the model (file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=9,           # the ONNX opset version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['input'],     # the model's input names
                  output_names=['output'],   # the model's output names
                  dynamic_axes={'input': {0: 'batch_size'},      # variable-length axes
                                'output': {0: 'batch_size'}})
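
For readers without access to the linked notebook, here is a minimal sketch of what a PyTorchMlp / copy_mlp_weights pair can look like. This is an assumption-laden reconstruction, not the notebook's exact code: it assumes the default MlpPolicy architecture (two 64-unit tanh hidden layers) and the stable-baselines parameter naming convention (keys like model/pi_fc0/w:0).

import torch
import torch.nn as nn

class PyTorchMlp(nn.Module):
    """Mirrors the assumed default stable-baselines MlpPolicy network."""
    def __init__(self, n_inputs=4, n_hidden=64, n_actions=2):
        super().__init__()
        self.fc1 = nn.Linear(n_inputs, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_hidden)
        self.fc3 = nn.Linear(n_hidden, n_actions)
        self.activ_fn = nn.Tanh()

    def forward(self, x):
        x = self.activ_fn(self.fc1(x))
        x = self.activ_fn(self.fc2(x))
        return self.fc3(x)  # raw logits (discrete) or action means (continuous)

def copy_mlp_weights(baselines_model, n_inputs=4, n_actions=2):
    torch_mlp = PyTorchMlp(n_inputs=n_inputs, n_actions=n_actions)
    model_params = baselines_model.get_parameters()
    # Keep only the policy-side ("pi") parameters, in network order
    policy_keys = [key for key in model_params.keys() if "pi" in key]
    for torch_param, key in zip(torch_mlp.parameters(), policy_keys):
        param = model_params[key]
        # TensorFlow dense weights are (in, out); PyTorch expects (out, in)
        tensor = torch.tensor(param.T if param.ndim == 2 else param,
                              dtype=torch.float32)
        torch_param.data.copy_(tensor)
    return torch_mlp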

I adapted the PyTorchCnnPolicy class and copy_cnn_weights function from the linked notebook (as the PyTorchMlp class and copy_mlp_weights function imported above). The ONNX network produced can be visualised with Netron, and is shown below:

[Figure: export_onnx_test — Netron visualisation of the exported ONNX graph]

The results of this were good, or at least the export produced the same numbers as the stable-baselines policy. What I would like to know is how applicable this is to other environments trained using PPO1 and PPO2 with the MlpPolicy, particularly where the action space is continuous rather than discrete. Are modifications required in that case?
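
A quick way to check that equivalence programmatically is to run the exported graph through onnxruntime and compare against the PyTorch output (a sketch, assuming onnxruntime is installed; the tolerances are illustrative):

import numpy as np
import onnxruntime as ort

# Run the exported graph on the same input used for the torch model above
session = ort.InferenceSession(path)
onnx_out = session.run(None, {'input': x.detach().numpy()})[0]

# Confirm the ONNX output matches the PyTorch output to numerical tolerance
np.testing.assert_allclose(torch_out.detach().numpy(), onnx_out,
                           rtol=1e-4, atol=1e-5)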

araffin added the question label on Jul 6, 2020

araffin (Collaborator) commented on Jul 6, 2020

> However, what I would like to know is how applicable this is to other environments trained using PPO1 and PPO2 with the MlpPolicy

This should be applicable.

> Particularly for the case where the action space is not discrete, but rather continuous. Are modifications required in this case?

I recommend learning more about how it works for continuous actions ;)

See resources in the doc: https://stable-baselines.readthedocs.io/en/master/guide/rl.html (especially Spinning Up)

In the case of continuous actions, a Gaussian distribution is usually used, so the network will output a mean (the deterministic action) and a standard deviation that is used to sample actions.
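
As an illustration of that last point (a hypothetical sketch, not stable-baselines internals; policy_net and the state-independent log_std are stand-ins):

import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2
policy_net = nn.Linear(obs_dim, act_dim)       # stand-in for the MLP trunk
log_std = nn.Parameter(torch.zeros(act_dim))   # state-independent, as in PPO2

obs = torch.randn(1, obs_dim)
mean = policy_net(obs)                          # the deterministic action
std = log_std.exp()
action = mean + std * torch.randn_like(mean)    # stochastic action used in training
# For inference-only ONNX export, typically only `mean` is needed.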

crobarcro (Author) commented

Pinging @pstansell, the member of our project with the deeper understanding of RL (I am more of a code monkey in this area).

araffin (Collaborator) commented on Oct 24, 2020

The best option for you now would be to use Stable-Baselines3, which is written directly in PyTorch.
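
With Stable-Baselines3 the policy is already a torch.nn.Module, so the conversion step disappears entirely. A sketch along the lines of the pattern in the SB3 export documentation (the class name and wiring here are illustrative):

import torch
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=1000)

class OnnxablePolicy(torch.nn.Module):
    """Wraps the SB3 policy pieces needed for inference-only export."""
    def __init__(self, extractor, action_net, value_net):
        super().__init__()
        self.extractor = extractor
        self.action_net = action_net
        self.value_net = value_net

    def forward(self, observation):
        action_hidden, value_hidden = self.extractor(observation)
        return self.action_net(action_hidden), self.value_net(value_hidden)

onnxable = OnnxablePolicy(model.policy.mlp_extractor,
                          model.policy.action_net,
                          model.policy.value_net)
dummy_input = torch.randn(1, 4)  # CartPole observation size
torch.onnx.export(onnxable, dummy_input, "ppo_cartpole.onnx",
                  opset_version=9, input_names=["input"])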

araffin closed this as completed on Oct 24, 2020
crobarcro (Author) commented

We did actually get this to work, and without PyTorch. In our case we wanted to export directly to a .mat file, and we ended up with something like this:

import numpy as np
from scipy.io import savemat

def write_mlp_weights_to_mat(baselines_model, filename):
    """Export the policy-side MLP weights of a stable-baselines model to a .mat file."""
    model_params = baselines_model.get_parameters()

    # Keep only the policy ("pi") and shared layers, in network order
    policy_keys = [key for key in model_params.keys() if "pi" in key or "shared" in key]
    policy_params = [model_params[key] for key in policy_keys]

    mat_export = {}
    layernum = 0
    for key, policy_param in zip(policy_keys, policy_params):
        if "/w" in key:
            layernum += 1
            # Transpose so the weights are stored as (outputs x inputs)
            mat_export[f"Layer{layernum}Weights"] = np.atleast_2d(policy_param).transpose()
            mat_export[f"Layer{layernum}Scale"] = 1.0
            mat_export[f"Layer{layernum}Offset"] = 0.0
        elif "/b" in key:
            mat_export[f"Layer{layernum}Bias"] = np.atleast_2d(policy_param).transpose()

    savemat(filename, mat_export)
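
For completeness, a hedged usage sketch showing how the exported layers could be evaluated in NumPy to spot-check the forward pass. It assumes the default two-hidden-layer tanh MlpPolicy, so three weight layers end up in the file; the filename is illustrative:

import numpy as np
from scipy.io import loadmat

write_mlp_weights_to_mat(model, "cartpole_policy.mat")
mat = loadmat("cartpole_policy.mat")

def forward(obs):
    # Hidden layers use tanh in the default MlpPolicy
    h = np.tanh(mat["Layer1Weights"] @ obs + mat["Layer1Bias"])
    h = np.tanh(mat["Layer2Weights"] @ h + mat["Layer2Bias"])
    return mat["Layer3Weights"] @ h + mat["Layer3Bias"]  # logits / action means

obs = np.atleast_2d(env.reset()).T  # column vector, shape (4, 1)
print(forward(obs))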
