-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for Continuous action #24
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for long wait, last few weeks have been very busy for me!
Other than the smaller comments, there's two main changes I would like to see:
- The environment should not be aware of the int64 representation and just receive floats from specified interval
- There should be a test case for continuous actions in
test_environment.py
Defines one continuous action that can be taken by multiple entities. | ||
""" | ||
|
||
index_to_label: List[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is there a discrete set of labels?
"""mapping from action indices to human readable labels""" | ||
|
||
@property | ||
def labels(self) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably doesn't make sense for continuous action unless i'm missing something?
actors: Sequence[EntityID] | ||
"""the ids of the entities that chose the actions""" | ||
|
||
values: npt.NDArray[np.int64] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably have the action converted to a float here
@@ -20,6 +20,19 @@ def __len__(self) -> int: | |||
return len(self.index_to_label) | |||
|
|||
|
|||
@dataclass | |||
class ContinuousActionSpace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this have a min/max value?
@@ -45,4 +49,5 @@ | |||
"MineSweeper", | |||
"RockPaperScissors", | |||
"TreasureHunt", | |||
"Slider", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing "ContinuousSlider"?
@dataclass | ||
class ContinuousSlider(Environment): | ||
""" | ||
On each timestep, there is either a generic "Object" entity with a `is_hotdog` property, or a "Hotdog" object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comment
|
||
a = actions["set"] | ||
assert isinstance(a, ContinuousAction), f"{a} is not a CategoricalAction" | ||
# divide index by max int64 to get a float between 0 and 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
misleading comment? this should also be done automatically by the framework
class Slider(Environment): | ||
""" | ||
On each timestep, there is either a generic "Object" entity with a `is_hotdog` property, or a "Hotdog" object. | ||
The "Player" entity is always present, and has an action to classify the other entity as hotdog or not hotdog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comment
""" | ||
On each timestep, there is either a generic "Object" entity with a `is_hotdog` property, or a "Hotdog" object. | ||
The "Player" entity is always present, and has an action to classify the other entity as hotdog or not hotdog. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could probably unify the slider and continuous slider environments by passing in a bool that determines whether we use continuous or discrete action
Implements support for ContiuousAction head. It also includes a new testing environment for Continuous actions called a continuous slider.