Sub task benchmark #2
base: main
@@ -144,7 +144,7 @@ def __init__(
Minor nit: can we undo the changes to other files, so that merging back into browsergym is easier in the future?
@@ -69,7 +69,7 @@ def _normalize_number(text: str) -> str:

def _answer_to_bags(
ditto
@@ -237,15 +237,15 @@ def override_property(task, env, property):
    browser_args = []
ditto
from subtaskbench import Evaluator

self.evaluator = Evaluator(
    start_url=self.task_config["start_url"],
Hmm, the start URL field might be a problem for this benchmark: it will be ephemeral, so it cannot be read from a pre-defined config. One thing we can do is let this be an environment id and pass it to a handler in this file that returns the correct start URL (the handler can be left empty for now).
The same applies to the creation of the playwright.sync_api.Page.
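A rough sketch of what such a handler could look like, in case it helps the discussion. Everything here is hypothetical: the names `register_start_url_resolver` and `resolve_start_url`, the registry dict, and the idea that the task config carries an `environment_id` key are assumptions, not part of this PR or of browsergym.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: environment id -> callable returning the live start URL.
# Nothing here exists in the PR; it only illustrates the suggestion above.
_START_URL_RESOLVERS: Dict[str, Callable[[], str]] = {}


def register_start_url_resolver(environment_id: str, resolver: Callable[[], str]) -> None:
    """Associate an environment id with a function that looks up its current URL."""
    _START_URL_RESOLVERS[environment_id] = resolver


def resolve_start_url(environment_id: str) -> Optional[str]:
    """Return the ephemeral start URL for an environment, or None if unknown.

    Returning None keeps the handler effectively empty until real
    resolvers are registered.
    """
    resolver = _START_URL_RESOLVERS.get(environment_id)
    if resolver is None:
        return None
    return resolver()


# Example: a resolver that would normally query whatever spins up the environment.
register_start_url_resolver("demo-env", lambda: "http://localhost:8080/")
```

The task could then call `resolve_start_url(self.task_config["environment_id"])` at reset time instead of reading a fixed `start_url`, and a similar lookup could hand back the Page for that environment.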
    start_url=self.task_config["start_url"],
    goal=self.task_config["goal"],
    evaluation_script=self.task_config["evaluation_script"],
    expected_output=self.task_config["expected_output"],
What is the expected_output field expected to look like? Is it for information-seeking tasks (e.g., string_match or fuzzy-match type evals)?
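To make the question concrete, here is one shape the field could take, sketched under assumptions. The dict layout (`type`, `value`, `threshold`), the function names, and the 0.8 default are all hypothetical and not taken from subtaskbench. The fuzzy branch uses the standard library's `difflib.SequenceMatcher`.

```python
import difflib
from typing import Any, Dict


def _normalize(text: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(text.lower().split())


def matches_expected(answer: str, expected_output: Dict[str, Any]) -> bool:
    """Compare an agent's answer against a hypothetical expected_output spec.

    Assumed spec shape: {"type": "string_match" | "fuzzy_match",
                         "value": str, "threshold": float (optional)}.
    """
    kind = expected_output["type"]
    target = _normalize(expected_output["value"])
    candidate = _normalize(answer)

    if kind == "string_match":
        return candidate == target
    if kind == "fuzzy_match":
        threshold = expected_output.get("threshold", 0.8)
        ratio = difflib.SequenceMatcher(None, candidate, target).ratio()
        return ratio >= threshold
    raise ValueError(f"Unknown expected_output type: {kind!r}")
```

If the field is instead meant to be compared by the evaluation_script itself, a plain string would be enough and none of this dispatch would be needed.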
No description provided.