Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integration of webcanvas evaluation and mind2web-live test set into browsergym #180

Closed
wants to merge 0 commits into from

Conversation

han032206
Copy link

No description provided.

Copy link
Collaborator

@gasse gasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @han032206 for this first draft! I have one main concern about validation, which I think can be done much more robustly using task-specific JS event handlers (see my detailed comments). Can you investigate this?

browsergym/core/src/browsergym/core/env.py Outdated Show resolved Hide resolved
browsergym/core/src/browsergym/core/env.py Outdated Show resolved Hide resolved
browsergym/webcanvas/pyproject.toml Outdated Show resolved Hide resolved
browsergym/webcanvas/requirements.txt Outdated Show resolved Hide resolved
browsergym/webcanvas/src/browsergym/webcanvas/__init__.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@gasse gasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not needed for the environment to pass the last_action to the task. Also there is no need to patch the BrowserEnv class, everything can be done from within the WebCanvasTask class. See my detailed comments.

@@ -264,6 +265,14 @@ def override_property(task, env, property):
self.context.expose_binding(
"browsergym_page_activated", lambda source: self._activate_page_from_js(source["page"])
)

self.context.expose_binding(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to modify the BrowserEnv class. This can be done within task.setup(page), using

page.context.expose_binding()

This one could also be useful, so that your JS code is executed on all pages visited by the agent (not just the page available at the time of setup())

page.context.add_init_script()

@@ -390,6 +399,10 @@ def report_infeasible_instructions(reason: str):
self.chat.add_message(role="infeasible", msg=reason)
self.infeasible_message_received = True

if hasattr(self.task, "webcanvas"):
logger.debug(f"Initiating webcanvas task event listen")
self._event_listener()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this can be done from within the GenericWebCanvasTask class, during task.setup(page).

@@ -584,3 +601,103 @@ def _get_obs(self):
}

return obs

def _event_listener(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, should be moved to GenericWebCanvasTask

return True

def validate(
self, page: playwright.sync_api.Page, chat_messages: list[str], action: str = ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for action here.

reward, done, user_message, info = self.task.validate(self.page, self.chat.messages)

reward, done, user_message, info = self.task.validate(
self.page, self.chat.messages, self.last_action
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for this. The idea of using Javascript to register actions executed in the browser is so that there is no need any more for the environment to give the executed last_action to the task. The BrowserGym last action could be bid-based or coord-based, which does not match the XPath-based validation of WebCanvas, as far as I understand. By using an event listener + playwright callback instead, the task can keep track of all the JS events and validate that they correspond to the task's objectives.

Suggested change
self.page, self.chat.messages, self.last_action
self.page, self.chat.messages

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants