Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Continuous action #24

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Bene94
Copy link

@Bene94 Bene94 commented Jan 16, 2024

Implements support for ContiuousAction head. It also includes a new testing environment for Continuous actions called a continuous slider.

Copy link
Contributor

@cswinter cswinter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for long wait, last few weeks have been very busy for me!

Other than the smaller comments, there's two main changes I would like to see:

  • The environment should not be aware of the int64 representation and just receive floats from specified interval
  • There should be a test case for continuous actions in test_environment.py

Defines one continuous action that can be taken by multiple entities.
"""

index_to_label: List[str]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there a discrete set of labels?

"""mapping from action indices to human readable labels"""

@property
def labels(self) -> List[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably doesn't make sense for continuous action unless i'm missing something?

actors: Sequence[EntityID]
"""the ids of the entities that chose the actions"""

values: npt.NDArray[np.int64]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably have the action converted to a float here

@@ -20,6 +20,19 @@ def __len__(self) -> int:
return len(self.index_to_label)


@dataclass
class ContinuousActionSpace:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this have a min/max value?

@@ -45,4 +49,5 @@
"MineSweeper",
"RockPaperScissors",
"TreasureHunt",
"Slider",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing "ContinuousSlider"?

@dataclass
class ContinuousSlider(Environment):
"""
On each timestep, there is either a generic "Object" entity with a `is_hotdog` property, or a "Hotdog" object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update comment


a = actions["set"]
assert isinstance(a, ContinuousAction), f"{a} is not a CategoricalAction"
# divide index by max int64 to get a float between 0 and 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

misleading comment? this should also be done automatically by the framework

class Slider(Environment):
"""
On each timestep, there is either a generic "Object" entity with a `is_hotdog` property, or a "Hotdog" object.
The "Player" entity is always present, and has an action to classify the other entity as hotdog or not hotdog.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update comment

"""
On each timestep, there is either a generic "Object" entity with a `is_hotdog` property, or a "Hotdog" object.
The "Player" entity is always present, and has an action to classify the other entity as hotdog or not hotdog.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could probably unify the slider and continuous slider environments by passing in a bool that determines whether we use continuous or discrete action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants