-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
integration of webcanvas evaluation and mind2web-live test set into browsergym #180
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @han032206 for this first draft! I have one main concern about validation, which I think can be done much more robustly using task-specific JS event handlers (see my detailed comments). Can you investigate this?
browsergym/webcanvas/src/browsergym/webcanvas/semantic_match/openai.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not needed for the environment to pass the last_action
to the task. Also there is no need to patch the BrowserEnv class, everything can be done from within the WebCanvasTask class. See my detailed comments.
@@ -264,6 +265,14 @@ def override_property(task, env, property): | |||
self.context.expose_binding( | |||
"browsergym_page_activated", lambda source: self._activate_page_from_js(source["page"]) | |||
) | |||
|
|||
self.context.expose_binding( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to modify the BrowserEnv class. This can be done within task.setup(page)
, using
page.context.expose_binding()
This one could also be useful, so that your JS code is executed on all pages visited by the agent (not just the page available at the time of setup())
page.context.add_init_script()
@@ -390,6 +399,10 @@ def report_infeasible_instructions(reason: str): | |||
self.chat.add_message(role="infeasible", msg=reason) | |||
self.infeasible_message_received = True | |||
|
|||
if hasattr(self.task, "webcanvas"): | |||
logger.debug(f"Initiating webcanvas task event listen") | |||
self._event_listener() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this can be done from within the GenericWebCanvasTask class, during task.setup(page)
.
@@ -584,3 +601,103 @@ def _get_obs(self): | |||
} | |||
|
|||
return obs | |||
|
|||
def _event_listener(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, should be moved to GenericWebCanvasTask
return True | ||
|
||
def validate( | ||
self, page: playwright.sync_api.Page, chat_messages: list[str], action: str = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for action
here.
reward, done, user_message, info = self.task.validate(self.page, self.chat.messages) | ||
|
||
reward, done, user_message, info = self.task.validate( | ||
self.page, self.chat.messages, self.last_action |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for this. The idea of using Javascript to register actions executed in the browser is so that there is no need any more for the environment to give the executed last_action
to the task. The BrowserGym last action could be bid-based or coord-based, which does not match the XPath-based validation of WebCanvas, as far as I understand. By using an event listener + playwright callback instead, the task can keep track of all the JS events and validate that they correspond to the task's objectives.
self.page, self.chat.messages, self.last_action | |
self.page, self.chat.messages |
No description provided.