integration of webcanvas evaluation and mind2web-live test set into browsergym #180

han032206 · 2024-10-13T14:37:28Z

No description provided.

gasse

Thanks @han032206 for this first draft! I have one main concern about validation, which I think can be done much more robustly using task-specific JS event handlers (see my detailed comments). Can you investigate this?

browsergym/webcanvas/src/browsergym/webcanvas/semantic_match/openai.py

browsergym/core/src/browsergym/core/env.py

browsergym/webcanvas/pyproject.toml

browsergym/webcanvas/requirements.txt

browsergym/webcanvas/src/browsergym/webcanvas/__init__.py

gasse

There is no need for the environment to pass the last_action to the task. There is also no need to patch the BrowserEnv class; everything can be done from within the WebCanvasTask class. See my detailed comments.

gasse · 2024-12-12T20:18:11Z

browsergym/core/src/browsergym/core/env.py

@@ -264,6 +265,14 @@ def override_property(task, env, property):
        self.context.expose_binding(
            "browsergym_page_activated", lambda source: self._activate_page_from_js(source["page"])
        )
+
+        self.context.expose_binding(


There is no need to modify the BrowserEnv class. This can be done within task.setup(page), using

page.context.expose_binding()

This one could also be useful, so that your JS code is executed on all pages visited by the agent (not just the page available at the time of setup())

page.context.add_init_script()
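As a rough illustration of this suggestion, here is a minimal sketch of a task that registers both the binding and the init script from its own setup(page). The class name EventRecordingTask, the binding name browsergym_webcanvas_event, and the recorded fields (type, xpath, url) are all hypothetical and only serve the example; the two Playwright calls are the ones named above. The XPath builder in the script is a simplistic positional one and may not match the XPath format WebCanvas expects.

```python
# Sketch only: EventRecordingTask, browsergym_webcanvas_event and the event
# fields below are hypothetical names, not part of BrowserGym or WebCanvas.

# JS injected into every page of the context. It reports click and change
# events, with a simple positional XPath of the target, to the Python side.
INIT_SCRIPT = """
(() => {
  const xpathOf = (el) => {
    const parts = [];
    for (; el && el.nodeType === Node.ELEMENT_NODE; el = el.parentNode) {
      let index = 1;
      for (let sib = el.previousElementSibling; sib; sib = sib.previousElementSibling) {
        if (sib.tagName === el.tagName) index += 1;
      }
      parts.unshift(`${el.tagName.toLowerCase()}[${index}]`);
    }
    return "/" + parts.join("/");
  };
  for (const type of ["click", "change"]) {
    document.addEventListener(type, (event) => {
      window.browsergym_webcanvas_event({
        type: type,
        xpath: xpathOf(event.target),
        url: window.location.href,
      });
    }, true);  // capture phase, so page handlers cannot swallow the event
  }
})();
"""


class EventRecordingTask:
    """Keeps a log of browser events without touching BrowserEnv."""

    def __init__(self):
        self.events = []

    def _on_event(self, source, event):
        # Playwright passes a `source` dict (context, page, frame) first,
        # followed by the arguments given to the JS function.
        self.events.append(event)

    def setup(self, page):
        # Make the Python callback callable from JS in every page of the context.
        page.context.expose_binding("browsergym_webcanvas_event", self._on_event)
        # Install the listeners on every document loaded from now on.
        page.context.add_init_script(script=INIT_SCRIPT)
```

Because the init script runs when a document is created, the listeners are installed on pages the agent navigates to after setup(), which is the behaviour described above.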

gasse · 2024-12-12T20:18:53Z

browsergym/core/src/browsergym/core/env.py

@@ -390,6 +399,10 @@ def report_infeasible_instructions(reason: str):
            self.chat.add_message(role="infeasible", msg=reason)
            self.infeasible_message_received = True

+        if hasattr(self.task, "webcanvas"):
+            logger.debug(f"Initiating  webcanvas task event listen")
+            self._event_listener()


Same here, this can be done from within the GenericWebCanvasTask class, during task.setup(page).

gasse · 2024-12-12T20:24:03Z

browsergym/core/src/browsergym/core/env.py

@@ -584,3 +601,103 @@ def _get_obs(self):
        }

        return obs
+
+    def _event_listener(self):


Same here, should be moved to GenericWebCanvasTask

gasse · 2024-12-12T20:27:45Z

browsergym/webcanvas/src/browsergym/webcanvas/task.py

+        return True
+
+    def validate(
+        self, page: playwright.sync_api.Page, chat_messages: list[str], action: str = ""


No need for action here.

gasse · 2024-12-12T20:29:36Z

browsergym/core/src/browsergym/core/env.py

-        reward, done, user_message, info = self.task.validate(self.page, self.chat.messages)
-
+        reward, done, user_message, info = self.task.validate(
+            self.page, self.chat.messages, self.last_action


There is no need for this. The point of using JavaScript to record the actions executed in the browser is that the environment no longer needs to pass the executed last_action to the task. The BrowserGym last action could be bid-based or coord-based, which does not match the XPath-based validation of WebCanvas, as far as I understand. By using an event listener + Playwright callback instead, the task can keep track of all the JS events and validate that they correspond to the task's objectives.

Suggested change

self.page, self.chat.messages, self.last_action

self.page, self.chat.messages
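To make the idea concrete, here is a minimal sketch of a validate(page, chat_messages) that relies only on recorded JS events and never sees last_action. The class EventValidator, the key_steps format, and the in-order matching rule are assumptions made for illustration; they are not the WebCanvas evaluation logic. The return tuple follows the (reward, done, user_message, info) shape used in the diff above.

```python
# Sketch only: EventValidator and the key_steps format are hypothetical and
# do not reproduce the actual WebCanvas key-node evaluation.

class EventValidator:
    def __init__(self, key_steps):
        # Expected steps, in order, e.g. {"type": "click", "xpath": "/html[1]/..."}.
        self.key_steps = key_steps
        self.events = []

    def record(self, source, event):
        # Intended as the Playwright binding callback: `source` describes the
        # calling page, `event` is the object sent from the JS listener.
        self.events.append(event)

    def matched_steps(self):
        # Count how many key steps appear, in order, in the event log.
        matched = 0
        for event in self.events:
            if matched == len(self.key_steps):
                break
            step = self.key_steps[matched]
            if event.get("type") == step["type"] and event.get("xpath") == step["xpath"]:
                matched += 1
        return matched

    def validate(self, page, chat_messages):
        # No `action` argument: the decision is based on recorded events only.
        matched = self.matched_steps()
        total = len(self.key_steps)
        done = matched == total
        reward = 1.0 if done else matched / total
        info = {"matched_steps": matched, "total_steps": total}
        return reward, done, "", info
```

With this shape, the environment can keep calling self.task.validate(self.page, self.chat.messages) unchanged, whether the agent's actions are bid-based or coord-based.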

gasse requested changes Oct 23, 2024

gasse force-pushed the main branch from 6f9872a to 761f21a Compare December 9, 2024 20:56

gasse requested changes Dec 12, 2024

han032206 closed this Dec 19, 2024

han032206 force-pushed the main branch from d4cb996 to a77a47e Compare December 19, 2024 16:43
