
Sub task benchmark #2 (Draft)

wants to merge 3 commits into base: main

Conversation

chc012

@chc012 chc012 commented Feb 27, 2025

No description provided.

@chc012 chc012 requested a review from sanjari-orb February 27, 2025 20:57
@chc012 chc012 self-assigned this Feb 27, 2025
@@ -144,7 +144,7 @@ def __init__(


Minor nit: can we undo the changes to the other files, so that merging back into browsergym is easier in the future?

@@ -69,7 +69,7 @@ def _normalize_number(text: str) -> str:


def _answer_to_bags(


ditto

@@ -237,15 +237,15 @@ def override_property(task, env, property):
browser_args = []


ditto

from subtaskbench import Evaluator

self.evaluator = Evaluator(
start_url=self.task_config["start_url"],


Hmm, so the start URL field might be a problem for this benchmark, because it will be ephemeral and thus cannot be read from a pre-defined config. One thing we can do is let this be an environment ID and pass it to a handler in this file which can return the correct start URL (can be left empty for now).
The same applies to the creation of the playwright.sync_api.Page as well.
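A minimal sketch of the handler idea described above, assuming a registry keyed by environment ID; all names here (`register_start_url_handler`, `resolve_start_url`, `"demo-env"`) are illustrative, not part of this PR's API:

```python
# Hypothetical sketch: resolve an ephemeral start URL from an environment ID
# at runtime instead of reading it from a pre-defined config.
from typing import Callable, Dict

_START_URL_HANDLERS: Dict[str, Callable[[], str]] = {}


def register_start_url_handler(env_id: str, handler: Callable[[], str]) -> None:
    """Register a callable that returns the current start URL for env_id."""
    _START_URL_HANDLERS[env_id] = handler


def resolve_start_url(env_id: str) -> str:
    """Look up the start URL for an environment; empty string if unknown."""
    handler = _START_URL_HANDLERS.get(env_id)
    return handler() if handler else ""


# Example: an ephemeral environment whose URL is only known at runtime.
register_start_url_handler("demo-env", lambda: "http://localhost:8080/")
```

The same registry pattern could hand back a `playwright.sync_api.Page` factory per environment, keeping the task config free of ephemeral values.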

start_url=self.task_config["start_url"],
goal=self.task_config["goal"],
evaluation_script=self.task_config["evaluation_script"],
expected_output=self.task_config["expected_output"],


What is the expected_output field expected to look like? Is this for information-seeking tasks (e.g., string_match, fuzzy-match type evals)?
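For reference, a minimal sketch of the two eval styles the question mentions; the function names and the 0.8 threshold are assumptions for illustration, not the PR's actual evaluator:

```python
# Hypothetical sketch of string_match / fuzzy-match evals for
# information-seeking tasks. Not the PR's API; names are illustrative.
from difflib import SequenceMatcher


def string_match(answer: str, expected: str) -> bool:
    """Exact match after trimming whitespace and case-folding."""
    return answer.strip().lower() == expected.strip().lower()


def fuzzy_match(answer: str, expected: str, threshold: float = 0.8) -> bool:
    """Similarity-ratio match using stdlib difflib's SequenceMatcher."""
    ratio = SequenceMatcher(
        None, answer.strip().lower(), expected.strip().lower()
    ).ratio()
    return ratio >= threshold
```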
