Is actions_prob calculation correct? #7

sandwriter · 2023-05-08T05:38:19Z

The following code takes the maximum softmax probability over the action logits, i.e. the probability of the most likely action, and then compares it against the log prob of the action from the previous actor model iteration. Should we instead index the probabilities with old_actions rather than taking the max, so that we compare the probability of the same action across the two iterations?

                # get action log prob
                actions_prob = (
                    torch.softmax(actions_logits, dim=-1).max(dim=-1).values
                )
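
A minimal sketch of the alternative being asked about, not a confirmed fix for the repository. It assumes `actions_logits` has shape `(batch, seq_len, vocab_size)` and `old_actions` holds the sampled token ids with shape `(batch, seq_len)`; the helper name `selected_action_log_probs` and the toy tensors are hypothetical.

```python
import torch


def selected_action_log_probs(
    actions_logits: torch.Tensor, old_actions: torch.Tensor
) -> torch.Tensor:
    """Log prob of each previously sampled action under the current logits.

    Assumed shapes (not confirmed by the source):
      actions_logits: (batch, seq_len, vocab_size)
      old_actions:    (batch, seq_len), integer action/token ids
    """
    log_probs = torch.log_softmax(actions_logits, dim=-1)
    # Pick the entry for the action that was actually taken, not the argmax.
    index = old_actions.unsqueeze(-1)
    return log_probs.gather(dim=-1, index=index).squeeze(-1)


# Toy data: at position 0 the sampled action (1) is not the argmax (0);
# at position 1 the sampled action (1) is also the argmax.
actions_logits = torch.tensor([[[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]]])
old_actions = torch.tensor([[1, 1]])

new_log_probs = selected_action_log_probs(actions_logits, old_actions)

# What the quoted snippet computes: the max probability at each position.
max_probs = torch.softmax(actions_logits, dim=-1).max(dim=-1).values
```

The two quantities agree only where the sampled action happens to be the most likely one. Returning log probs also keeps the result in the same space as the stored log prob from the previous iteration, so a ratio can be formed as `exp(new - old)`.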
