It is recommended to give an example of off policy using the feature extractor #982

Zero1366166516 · 2022-07-25T01:54:36Z

🐛 Bug

I want to customize the feature extractor. Following the example in the documentation, I get the errors below. I have seen issue #425 ("too many errors when customizing policy, a full example for off policy algorithms should be added in user guide"), which mentions that the off-policy network should also use the feature extractor. I recommend adding an example of an off-policy algorithm that uses a custom feature extractor. Thank you!
class CustomCombinedExtractor(BaseFeaturesExtractor):
    def __init__(self, observation_space: gym.spaces.Dict):
        # We do not know features-dim here before going over all the items,
        # so put something dummy for now. PyTorch requires calling
        # nn.Module.__init__ before adding modules
        super(CustomCombinedExtractor, self).__init__(observation_space, features_dim=1)

        extractors = {}

        total_concat_size = 0
        print(observation_space)
        #print(observation_space.items(0))

        print(observation_space.spaces.items())
        exit()
        # We need to know size of the output of this extractor,
        # so go over all the spaces and compute output feature sizes
        for key, subspace in observation_space.spaces.items():
            if key == "image":
                # We will just downsample one channel of the image by 4x4 and flatten.
                # Assume the image is single-channel (subspace.shape[0] == 0)
                extractors[key] = nn.Sequential(nn.MaxPool2d(4), nn.Flatten())
                total_concat_size += subspace.shape[1] // 4 * subspace.shape[2] // 4
            elif key == "vector":
                # Run through a simple MLP
                extractors[key] = nn.Linear(subspace.shape[0], 16)
                total_concat_size += 16

        self.extractors = nn.ModuleDict(extractors)

        # Update the features dim manually
        self._features_dim = total_concat_size

    def forward(self, observations) -> th.Tensor:
        encoded_tensor_list = []

        # self.extractors contain nn.Modules that do all the processing.
        for key, extractor in self.extractors.items():
            encoded_tensor_list.append(extractor(observations[key]))
        # Return a (B, self._features_dim) PyTorch tensor, where B is batch dimension.
        return th.cat(encoded_tensor_list, dim=1)


policy_kwargs = dict(
    features_extractor_class=CustomCombinedExtractor,
    share_features_extractor=False,
    features_extractor_kwargs=dict(features_dim=128))
#policy_kwargs = dict(activation_fn=th.nn.ReLU,
#                     net_arch=[dict(pi=[32, 32], vf=[32, 32])])
def get_model(
    self,
    model_name: str,
    #policy: str = "MlpPolicy",  
    policy: str = "MultiInputPolicy",
    policy_kwargs: dict = policy_kwargs,
    model_kwargs: dict = None,
    verbose: int = 1
) -> Any:
   
    print("set Debug!")

    if model_name not in MODELS:
        raise NotImplementedError("NotImplementedError")
    
    if model_kwargs is None:
        model_kwargs = MODEL_KWARGS[model_name]

    if "action_noise" in model_kwargs:
        n_actions = self.env.action_space.shape[-1]                         
        model_kwargs["action_noise"] = NOISE[model_kwargs["action_noise"]](
            mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions)
        )
    print(model_kwargs)    
    print(policy, self.env)
    print(model_name)
    model = MODELS[model_name](         
        policy=policy,
        env=self.env,
        tensorboard_log="{}/{}".format(config.TENSORBOARD_LOG_DIR, model_name),
        verbose=verbose,
        policy_kwargs=policy_kwargs,
        **model_kwargs
    )

Traceback (most recent call last):
  File "C:/Users/Administrator/PycharmProjects/demo/utils/models.py", line 419, in <module>
    model_sac = agent.get_model("sac", model_kwargs=SAC_PARAMS)
  File "C:/Users/Administrator/PycharmProjects/demo/utils/models.py", line 328, in get_model
    model = MODELS[model_name](
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\sac.py", line 144, in __init__
    self._setup_model()
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\sac.py", line 147, in _setup_model
    super(SAC, self)._setup_model()
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py", line 216, in _setup_model
    self.policy = self.policy_class( # pytype:disable=not-instantiable
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\policies.py", line 498, in __init__
    super(MultiInputPolicy, self).__init__(
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\policies.py", line 292, in __init__
    self._build(lr_schedule)
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\policies.py", line 295, in _build
    self.actor = self.make_actor()
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\policies.py", line 348, in make_actor
    actor_kwargs = self._update_features_extractor(self.actor_kwargs, features_extractor)
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\common\policies.py", line 112, in _update_features_extractor
    features_extractor = self.make_features_extractor()
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\common\policies.py", line 118, in make_features_extractor
    return self.features_extractor_class(self.observation_space, **self.features_extractor_kwargs)
TypeError: __init__() got an unexpected keyword argument 'features_dim'


qgallouedec · 2022-07-25T04:45:29Z

Next time, please help us to help you by taking the time to fill in the issue template.

As the error suggests, your feature extractor does not take the feature dimension as an argument. Try

def __init__(self, observation_space, features_dim):

araffin · 2022-07-25T07:50:26Z

Next time, please help us to help you by taking the time to fill in the issue template.

As the error suggests, your feature extractor does not take the feature dimension as an argument. Try
def __init__(self, observation_space, features_dim):

The features extractor for off-policy algorithms is the same as for on-policy ones and is already documented: https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html#custom-feature-extractor

As @qgallouedec wrote, the error occurs because you pass an argument to the features extractor (features_extractor_kwargs=dict(features_dim=128)) but your class does not accept that parameter (features_dim) in its __init__.
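The mechanism behind this TypeError can be reproduced without Stable-Baselines3. The sketch below is only an illustration: the two extractor classes and the make_features_extractor helper are hypothetical stand-ins that mimic the call shown in the last frame of the traceback, where the extractor class receives the observation space plus the unpacked features_extractor_kwargs.

```python
class ExtractorWithoutFeaturesDim:
    """Hypothetical stand-in: __init__ takes only the observation space."""

    def __init__(self, observation_space):
        self.observation_space = observation_space


class ExtractorWithFeaturesDim:
    """Hypothetical stand-in: __init__ also accepts features_dim."""

    def __init__(self, observation_space, features_dim=128):
        self.observation_space = observation_space
        self.features_dim = features_dim


def make_features_extractor(extractor_class, observation_space, extractor_kwargs):
    # Same call pattern as the last frame of the traceback:
    # the kwargs dict is unpacked into the extractor's __init__.
    return extractor_class(observation_space, **extractor_kwargs)


extractor_kwargs = dict(features_dim=128)

try:
    make_features_extractor(ExtractorWithoutFeaturesDim, "obs-space", extractor_kwargs)
    error_message = None
except TypeError as exc:
    # The class does not declare features_dim, so the call is rejected.
    error_message = str(exc)

working = make_features_extractor(ExtractorWithFeaturesDim, "obs-space", extractor_kwargs)
print(error_message)
print(working.features_dim)
```

Either option removes the error: add features_dim to the extractor's __init__, or drop features_extractor_kwargs from policy_kwargs if the class computes its own output size.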

Zero1366166516 · 2022-07-26T22:44:30Z

First of all, thank you very much for your help.

I modified the class CustomCNN as follows:
MODELS = {"a2c": A2C, "ddpg": DDPG, "td3": TD3, "sac": SAC, "ppo": PPO}
MODEL_KWARGS = {x: config.__dict__["{}_PARAMS".format(x.upper())] for x in MODELS.keys()}

NOISE = {
    "normal": NormalActionNoise,
    "ornstein_uhlenbeck": OrnsteinUhlenbeckActionNoise
}

class CustomCNN(BaseFeaturesExtractor):
    """
    :param observation_space: (gym.Space)
    :param features_dim: (int) Number of features extracted.
        This corresponds to the number of unit for the last layer.
    """

    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 1):
        super(CustomCNN, self).__init__(observation_space, features_dim)
        # We assume CxHxW images (channels first)
        # Re-ordering will be done by pre-preprocessing or wrapper
        n_input_channels = observation_space.shape[0]

        print("n_input_channels", observation_space.shape[0])
        #print("features = ", features_dim)
        #observation_space = observation_space.T
        self.cnn = nn.Sequential(
            nn.Conv1d(1, n_input_channels, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.Conv1d(n_input_channels, 1, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.Flatten(),
        )
        print(self.cnn.type)

        # Compute shape by doing one forward pass
        with th.no_grad():

            n_flatten = self.cnn(
                th.as_tensor(observation_space.sample()[None]).float()
            ).shape[1]
            print(observation_space, observation_space.sample()[None])
            print(self.cnn)

        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.Tanh())
        print(self.linear)
        #self.linear = th.as_tensor(self.linear)
        #exit()

    def forward(self, observations: th.Tensor) -> th.Tensor:
        print("go to the forward:", observations)
        return self.linear(self.cnn(observations))


policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    share_features_extractor=False,
    features_extractor_kwargs=dict(features_dim=1),
    net_arch=[dict(pi=[32, 32], qf=[64, 64])]
)
#policy_kwargs = dict(activation_fn=th.nn.ReLU,
#                     net_arch=[dict(pi=[32, 32], vf=[32, 32])])
def get_model(
    self,
    model_name: str,
    policy: str = "MlpPolicy",     
    #policy: str = "MultiInputPolicy",
    policy_kwargs: dict = policy_kwargs,
    model_kwargs: dict = None,
    verbose: int = 1
) -> Any:
   
    print("set Debug!")

    if model_name not in MODELS:
        raise NotImplementedError("NotImplementedError")
    
    if model_kwargs is None:
        model_kwargs = MODEL_KWARGS[model_name]

    if "action_noise" in model_kwargs:
        n_actions = self.env.action_space.shape[-1]                          
        model_kwargs["action_noise"] = NOISE[model_kwargs["action_noise"]](
            mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions)
        )
    print(model_kwargs)    
    print(policy, self.env)
    print(model_name)
    print(observation)
    print(self.env.observation_space)
    print("dispaly: observation_space")
    model = MODELS[model_name](         
        policy=policy,
        env=self.env,
        tensorboard_log="{}/{}".format(config.TENSORBOARD_LOG_DIR, model_name),
        verbose=verbose,
        policy_kwargs=policy_kwargs,
        **model_kwargs
    )
    print("model display: ", model)
    exit()
    return model

I defined it before: self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(self.state_space,))

However, there is another error.

  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:

(tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
(tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

I'm sorry to ask so many questions. I just want to use a CNN feature extractor to extract features for the MlpPolicy policy network. I think this error comes from Stable-Baselines3. There is no complete example for off-policy algorithms such as DDPG, TD3 and SAC.
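One plausible cause of this TypeError, not confirmed in the thread, is the net_arch value. The policy_kwargs above pass net_arch=[dict(pi=[32, 32], qf=[64, 64])], a list wrapping a dict, whereas the documented form for SAC, TD3 and DDPG is a plain dict with pi and qf keys. If the list is read as a list of layer sizes, its first "size" is a dict, and nn.Linear cannot allocate a weight from it. A minimal sketch, assuming only PyTorch is installed:

```python
import torch.nn as nn

# Documented off-policy form: a dict with separate actor (pi) and critic (qf) sizes.
good_arch = dict(pi=[32, 32], qf=[64, 64])

# Form used in the post above: the same dict wrapped in a list.
bad_arch = [dict(pi=[32, 32], qf=[64, 64])]

# With the dict form, each layer size is an int and the layer builds normally.
first_actor_layer = nn.Linear(1, good_arch["pi"][0])

# If the list is treated as a list of layer sizes, its first element is a dict.
try:
    nn.Linear(1, bad_arch[0])
    raised_type_error = False
except TypeError as exc:
    raised_type_error = True
    print(type(exc).__name__)

print(first_actor_layer)
```

Under that assumption, changing the argument to net_arch=dict(pi=[32, 32], qf=[64, 64]) would be the first thing to try.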

qgallouedec · 2022-07-27T04:56:01Z

Can you try to provide a minimal and functional code example to reproduce the error? (Remove all your print statements, use a single agent, ...) Please also use markdown code blocks for code. It will be easier for us to help you.

Zero1366166516 · 2022-07-27T14:53:48Z

Thank you sincerely for your help. The off-policy network problem has been bothering me for several days.
Following the example, I created a class (CustomCNN) as the feature extractor and defined
policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    net_arch=dict(qf=[256, 256], pi=[256, 256])
)
A CNN is used as the feature extractor, and the code is as follows:

···

from typing import Any
import pandas as pd
import numpy as np
import time
from stable_baselines3 import DDPG
from stable_baselines3 import A2C
from stable_baselines3 import PPO
from stable_baselines3 import TD3
from stable_baselines3 import SAC
from stable_baselines3.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise
from stable_baselines3.common.policies import register_policy, ActorCriticPolicy
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor
import gym
from gym import spaces
import torch as th
import torch.nn as nn
from typing import Callable, Dict, List, Optional, Tuple, Type, Union
from utils import config
from utils.preprocessors import split_data
from utils.env import StockLearningEnv
···

MODELS = {"a2c": A2C, "ddpg": DDPG, "td3": TD3, "sac": SAC, "ppo": PPO}
MODEL_KWARGS = {x: config.__dict__["{}_PARAMS".format(x.upper())] for x in MODELS.keys()}

NOISE = {
    "normal": NormalActionNoise,
    "ornstein_uhlenbeck": OrnsteinUhlenbeckActionNoise
}
···
This is the CustomCNN class I defined. Because I want to analyze time-series data, I use a one-dimensional convolution (Conv1d). The input data has 13 columns, and the number of input rows is either 1 or 128, which is random.

class CustomCNN(BaseFeaturesExtractor):
    """
    :param observation_space: (gym.Space)
    :param features_dim: (int) Number of features extracted.
        This corresponds to the number of unit for the last layer.
    """

    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 1):
        super(CustomCNN, self).__init__(observation_space, features_dim)
        # We assume CxHxW images (channels first)
        # Re-ordering will be done by pre-preprocessing or wrapper
        n_input_channels = observation_space.shape[0]

        self.cnn = nn.Sequential(
            nn.Conv1d(self.features_dim, n_input_channels, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.Conv1d(n_input_channels, self.features_dim, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.Flatten(),
        )
        with th.no_grad():
            n_flatten = self.cnn(
                th.as_tensor(observation_space.sample()[None]).float()
            ).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, self.features_dim), nn.Tanh())

···
When observations has shape [1,1, 13], the referenced example works normally, but when the data is randomly sampled with shape [1128,13], nn.Sequential needs to be redefined. I modified the code of the example, but it still reports an error.

def forward(self, observations: th.Tensor) -> th.Tensor:
    n_flatten = np.array(observations).shape[1]
    features_dim = np.array(observations).shape[0]
    print(features_dim, n_flatten)
    if features_dim != 1:
        self.cnn = nn.Sequential(
            nn.Conv1d(features_dim, n_flatten, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.Conv1d(n_flatten, features_dim, kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.Tanh())
                
    return self.linear(self.cnn(observations))

·····

policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    net_arch=dict(qf=[256, 256], pi=[256, 256])
)

····
Here is the definition of the get_model function:

def get_model(
    self,
    model_name: str,
    policy: str = "MlpPolicy",     
    #policy: str = "MultiInputPolicy",
    policy_kwargs: dict = policy_kwargs,
    model_kwargs: dict = None,
    verbose: int = 1
) -> Any:
    if model_name not in MODELS:
        raise NotImplementedError("NotImplementedError")
    if model_kwargs is None:
        model_kwargs = MODEL_KWARGS[model_name]
    if "action_noise" in model_kwargs:
        n_actions = self.env.action_space.shape[-1]                          
        model_kwargs["action_noise"] = NOISE[model_kwargs["action_noise"]](
            mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions)
        )
    model = MODELS[model_name](          
        policy=policy,
        env=self.env,
        tensorboard_log="{}/{}".format(config.TENSORBOARD_LOG_DIR, model_name),
        verbose=verbose,
        policy_kwargs=policy_kwargs,
        **model_kwargs
    )
    return model

···
This is train_model; the error occurs here, in model.learn:

def train_model(
    self, model: Any, tb_log_name: str, total_timesteps: int = 5000
    ) -> Any:
    """train model"""
    model = model.learn(total_timesteps=total_timesteps, tb_log_name=tb_log_name)
    return model

Start testing here, input data, initial environment and model.

if __name__ == "__main__":
    from pull_data import Pull_data
    from preprocessors import FeatureEngineer, split_data
    from utils import config
    import time

    # pull data
    #df = Pull_data(config.SSE_50[:2], save_data=False).pull_data()
    df = Pull_data(config.SSE_50[:2]).pull_data()
    df = FeatureEngineer().preprocess_data(df)
    df = split_data(df, '2009-01-01', '2019-01-01')
    print(df.head())

    # 
    stock_dimension = len(df.tic.unique()) # 2
    state_space = 1 + 2*stock_dimension + \
        len(config.TECHNICAL_INDICATORS_LIST)*stock_dimension # 23 
    print("stock_dimension: {}, state_space: {}".format(stock_dimension, state_space))
    env_kwargs = {
        #"stock_dim": stock_dimension,
        "hmax": 100, 
        "initial_amount": 1e6, 
        "buy_cost_pct": 0.001,
        "sell_cost_pct": 0.001,
        #"reward_scaling": 1e-4,
        #"state_space": state_space,
        #"action_space": stock_dimension,
        #"tech_indicator_list": config.TECHNICAL_INDICATORS_LIST
    }

    # test env
    e_train_gym = StockLearningEnv(df=df, **env_kwargs)
    ## mulpt test
    observation = e_train_gym.reset()      
    count = 0
    for t in range(10):
        action = e_train_gym.action_space.sample()  
        observation, reward, done, info = e_train_gym.step(action)  

        if done:
            break
        count+=1
        time.sleep(0.2)      
    print("observation: ", observation)
    print("action: ", action)
    print("reward: {}, done: {},info: {}".format(reward, done, info))

    # test model
    env_train, _ = e_train_gym.get_sb_env()
    print(type(env_train))

    ##register_policy('CustomPolicy', CustomPolicy)
    ##register_policy('CustomActorCriticPolicy', CustomActorCriticPolicy)
    agent = DRL_Agent(env= env_train)
    SAC_PARAMS = {
        "batch_size": 128,
        "buffer_size": 1000000,
        "learning_rate": 0.0001,
        "learning_starts": 100,
        "ent_coef": "auto_0.1"
    }
    model_sac = agent.get_model("sac", model_kwargs=SAC_PARAMS)
    trained_sac = agent.train_model(
        model=model_sac,
        tb_log_name='sac',
        total_timesteps=50000
    )
···
The error message is as follows:
Traceback (most recent call last):
  File "C:/Users/Administrator/PycharmProjects/demo/utils/models.py", line 477, in <module>
    trained_sac = agent.train_model(
  File "C:/Users/Administrator/PycharmProjects/demo/utils/models.py", line 401, in train_model
    model = model.learn(total_timesteps=total_timesteps, tb_log_name=tb_log_name)
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\sac.py", line 292, in learn
    return super(SAC, self).learn(
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py", line 366, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\sac.py", line 206, in train
    actions_pi, log_prob = self.actor.action_log_prob(replay_data.observations)
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\policies.py", line 180, in action_log_prob
    mean_actions, log_std, kwargs = self.get_action_dist_params(obs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\stable_baselines3\sac\policies.py", line 163, in get_action_dist_params
    latent_pi = self.latent_pi(features)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x128 and 1x256)

···
Thank you again for your help. I edited the question again according to your request. I am building on sunnyswag's code, modifying the feature extractor to see whether SAC, DDPG and TD3 can perform better on a stock portfolio. This problem has bothered me for several days. Thank you again!!!

qgallouedec · 2022-07-27T16:57:28Z

I would really like to help you, but you should at least take into consideration the remarks I give you. I need:

A minimal and functional code.
For example, this code is not minimal because one line can be deleted without removing the error:

import numpy as np

a = np.ones(2)
b = np.ones(2)
c = a / 0

and this code is not functional because the imports are missing:

a = np.ones(2)
c = a / 0

Your code is neither minimal nor functional. So I am not able to reproduce your error.

A properly formatted code so that I can understand it. For that, I send you again the link of my previous message: format markdown code blocks

From what I can see, it appears to be a shape-related error. You may have made a mistake in the network specification.
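A sketch of a feature extractor whose output width does not depend on the batch size, written in plain PyTorch so that it runs on its own. The class name Conv1dExtractor, the channel count 8 and features_dim=64 are illustrative assumptions, and the observation length 13 comes from the description earlier in the thread. In Stable-Baselines3 the same layers would go in a BaseFeaturesExtractor subclass that passes features_dim to super().__init__, as CustomCNN does. The shapes in the error (128x128 and 1x256) are consistent with an extractor that returned 128 features for a batch of 128 while declaring features_dim=1, so the first actor layer was built for a single input.

```python
import torch as th
import torch.nn as nn


class Conv1dExtractor(nn.Module):
    """Hypothetical extractor for flat observations of length n_obs."""

    def __init__(self, n_obs: int, features_dim: int = 64):
        super().__init__()
        # Declared output width; downstream layers are sized from this value.
        self.features_dim = features_dim
        # Treat each observation as a single channel of length n_obs.
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened size once, from a dummy batch of one observation.
        with th.no_grad():
            n_flatten = self.cnn(th.zeros(1, 1, n_obs)).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        # (B, n_obs) -> (B, 1, n_obs): the batch axis stays first and is
        # never used as a channel count. No layer is rebuilt here.
        return self.linear(self.cnn(observations.unsqueeze(1)))


extractor = Conv1dExtractor(n_obs=13, features_dim=64)
single = extractor(th.zeros(1, 13))    # one observation, as during rollouts
batch = extractor(th.zeros(128, 13))   # a replay-buffer batch of 128
print(single.shape, batch.shape)
```

The layers are created once in __init__ and only applied in forward, so the output is always (B, features_dim) whether B is 1 or 128.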

Zero1366166516 · 2022-07-28T10:50:08Z

Thank you again for your help. I edited the question again according to your request. I am building on sunnyswag's code, modifying the feature extractor to see whether SAC, DDPG and TD3 can perform better on a stock portfolio. This problem has bothered me for several days.
Thank you again!!!

araffin · 2022-07-28T12:17:02Z

Closing as the basic rules for asking for help were not followed despite being asked multiple times (#982 (comment))

Zero1366166516 added the bug Something isn't working label Jul 25, 2022

Zero1366166516 changed the title ~~[Bug] bug title~~ It is recommended to give an example of off policy using the feature extractor Jul 25, 2022

araffin removed the bug Something isn't working label Jul 28, 2022

araffin mentioned this issue Jul 28, 2022

Value error after trying fix of pip install ale-py==0.7.4 to fix Attribute error during evaluate policy #986

Closed

araffin added the more information needed label Jul 28, 2022

araffin closed this as completed Jul 28, 2022

qgallouedec mentioned this issue Jul 29, 2022

I have seen the issues#425 problem, but I have new questions about the off policy algorithm. #988

Closed

2 tasks

This was referenced Aug 26, 2022

My Custom env training go for ever #1032

Closed

Model does not get updated when using DDPG and TD3 #1034

Closed

araffin mentioned this issue Oct 31, 2022

Collecting rollout buffer on environment reset rather than after n_steps #1147

Closed

5 tasks

araffin mentioned this issue Jan 13, 2023

[Bug Report] AssertionError: The algorithm (PPO) only supports Discrete as action spaces but Box was provided [bug] #1274

Closed

4 tasks

araffin mentioned this issue Apr 21, 2023

Why agent training gets longer in each loop? #1456

Closed

4 tasks

zapcity mentioned this issue Jul 3, 2023

[Bug]: PPO.load() causes AssertionError #1589

Closed

Harishu1998 mentioned this issue Jul 3, 2023

Why is my model not learning for A2C but working with PPO? #1590

Closed

4 tasks

akane0314 mentioned this issue Jul 4, 2023

[Question] Why does unscaling action behaves differently in training and eval #1592

Closed

4 tasks

anirudhs001 mentioned this issue Jul 5, 2023

log_std filled with NaNs when using PPO with use_sde=True #1593

Closed

5 tasks

zapcity mentioned this issue Jul 5, 2023

[Bug]: AttributeError: 'tuple' object has no attribute 'shape' #1594

Closed

5 tasks

GithubLZI mentioned this issue Jul 5, 2023

[Bug]: bug title About EvalCallback #1595

Closed

5 tasks

koliber31 mentioned this issue Jul 6, 2023

Error while using MaskablePPO in sb3_contrib #1596

Closed

5 tasks

george-adams1 mentioned this issue Jul 6, 2023

[Question] Why does FPS tend to decrease during training? #1597

Closed

4 tasks

bras-p mentioned this issue Jul 7, 2023

Problem with logs and verbose in Windows #1598

Closed

5 tasks

DavidLudl mentioned this issue Oct 22, 2024

[Question] Batch Size Selection for a Finite MDP #2024

Closed

4 tasks

chrisgao99 mentioned this issue Oct 22, 2024

[Bug]: How to avoid saving an external LLM model while saving a cutomized dqn policy #2025

Closed

5 tasks

didu11 mentioned this issue Oct 25, 2024

[Bug]: When using SubprocVecEnv for parallel training of agents, the ep_rew_mean is no longer being recorded. #2027

Closed

5 tasks

jim-rothrock mentioned this issue Oct 25, 2024

[Question] Do I need to have gym installed in addition to gymnasium in order to run the lunar lander example? DLR-RM/rl-baselines3-zoo#473

Closed

5 tasks

ghost mentioned this issue Oct 26, 2024

[Question] How to customize the loss calculation for PPO #2028

Closed

4 tasks

Ssstirm mentioned this issue Oct 29, 2024

[Question] Can I load a RL model with Mac trained on windows platform? #2029

Closed

4 tasks

SummerDiver mentioned this issue Nov 1, 2024

[Question] Can a model be used in environments with different observation_space sizes? #2031

Closed

4 tasks

CAI23sbP mentioned this issue Nov 4, 2024

How can i change a Distribution? #2032

Closed

4 tasks

bajramienes mentioned this issue Nov 4, 2024

[Question] Error Installing stable-baselines3[extra] on Windows #2033

Closed

4 tasks

Feelfeel20088 mentioned this issue Nov 5, 2024

[Bug] importing stable baselines 3 on linux and windows directory issue #2034

Closed

5 tasks

felix-basiliskroko mentioned this issue Nov 7, 2024

[Question] Cannot reproduce results of "EvalCallback" gathered during training. #2036

Closed

4 tasks

tesla-cat mentioned this issue Nov 9, 2024

[Bug]: unable to learn MountainCarContinuous-v0 #2038

Closed

5 tasks

hsaseendran mentioned this issue Nov 18, 2024

[Question] Question on scaling: I am using GCN as a FE to train on a small observation space and test it on a larger observation space #2042

Closed

4 tasks

SachinVashisth mentioned this issue Nov 20, 2024

Issue in forward(....) function of class ActorCriticPolicy while working on Custom Gym Environment. #2043

Closed

5 tasks

abhinavj98 mentioned this issue Nov 21, 2024

[Question] Not updating lstm states during training Stable-Baselines-Team/stable-baselines3-contrib#265

Open

4 tasks

suargi mentioned this issue Nov 29, 2024

[Question] Modify max_episode_steps DLR-RM/rl-baselines3-zoo#478

Closed

5 tasks

JoshuaBluem mentioned this issue Dec 3, 2024

[Bug]: Using the start value of Discrete spaces has no effect #2052

Closed

5 tasks

itwasabhi mentioned this issue Dec 3, 2024

[Bug]: Error on utils.get_system_info #2053

Closed

5 tasks

OliverUrbann mentioned this issue Dec 13, 2024

[Bug]: Video upload to wandb broken since 2.4.0 #2055

Open

5 tasks

JaMueDFKI mentioned this issue Dec 13, 2024

[Bug]: Last step in environment will not be saved to replay buffer, when using a callback #2056

Closed

5 tasks

DebanganMandal mentioned this issue Dec 16, 2024

[Bug]: Action Space mismatch, in vec_env.step(action), but action.shape outputs required shape #2057

Closed

5 tasks

suargi mentioned this issue Dec 16, 2024

[Bug]: FrameStack and VecNormalize DLR-RM/rl-baselines3-zoo#480

Closed

5 tasks

curtiscjohnson mentioned this issue Dec 18, 2024

[Bug]: VecVideoRecorder overwrites previous video at each save #2061

Closed

5 tasks

jmSNU mentioned this issue Dec 19, 2024

[Bug]: Assistance Required with Bottleneck in NatureCNN.forward() on GPU #2062

Closed

5 tasks

sunweice mentioned this issue Dec 23, 2024

[Bug]: DDPG seems unable to solve the MountainCarContinuous-v0 problem. DLR-RM/rl-baselines3-zoo#482

Closed

5 tasks

luchungi mentioned this issue Jan 5, 2025

[Question] Vectorized custom environments that output (num_envs, obs_size) without stacking #2066

Closed

4 tasks

chrisgao99 mentioned this issue Jan 13, 2025

[Bug]: NameNotFound: Environment PongNoFrameskip doesn't exist. #2070

Closed

5 tasks

drulye mentioned this issue Jan 15, 2025

[Bug]: in "RecurrentPPO" not work "model.policy.evaluate_actions()" Stable-Baselines-Team/stable-baselines3-contrib#270

Open

lesonglam mentioned this issue Jan 26, 2025

How to reset the Agent before the rollout ? #2073

Open

4 tasks

finnoshea mentioned this issue Feb 1, 2025

[Question] Tensorboard stops recording early with no error/reason given #2078

Open

4 tasks
