Goal refactor to allow for local image files #110

Merged
merged 42 commits into from
Oct 8, 2024

@ljang0 (Collaborator) commented on Aug 9, 2024

@ThibaultLSDC can you check this?

@ljang0 changed the title from "test code" to "Downloading Images to TMP file when Task is Instantiated" on Aug 9, 2024
browsergym/core/src/browsergym/core/env.py (outdated, resolved)
@gasse changed the title from "Downloading Images to TMP file when Task is Instantiated" to "Goal refactor to allow for local image files" on Sep 19, 2024
@gasse requested a review from @ThibaultLSDC on September 19, 2024 19:59
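The PR's goal is to let task goals reference local image files rather than only remote image URLs. One common way to hand a local image to an OpenAI-style chat API is to inline it as a base64 data URL. The sketch below assumes that approach; the helper name `image_file_to_data_url` is hypothetical and is not BrowserGym's actual API.

```python
from __future__ import annotations

import base64
import mimetypes
from pathlib import Path


def image_file_to_data_url(path: str | Path) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper).

    The MIME type is inferred from the file extension; non-image files are rejected.
    """
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {path}")
    # Read the raw bytes and base64-encode them into an ASCII string.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, `image_file_to_data_url("goal.png")` yields a string starting with `data:image/png;base64,`, which can then be passed wherever an image URL is expected.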
@recursix (Collaborator) left a comment

Great job, but I'm really disappointed in the number of unit tests :p You will regret it.

browsergym/experiments/src/browsergym/experiments/loop.py (outdated, resolved)
browsergym/core/src/browsergym/core/env.py (outdated, resolved)
browsergym/experiments/src/browsergym/experiments/loop.py (outdated, resolved)
@recursix previously approved these changes on Sep 25, 2024
}
)
elif msg["role"] == "user_image":
system_msgs.append({"type": "image_url", "image_url": msg["message"]})
A collaborator commented:

Suggested change:
-system_msgs.append({"type": "image_url", "image_url": msg["message"]})
+user_msgs.append({"type": "image_url", "image_url": msg["message"]})

?
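The suggestion above routes `user_image` messages into the user content instead of the system content. A minimal sketch of that routing, assuming each goal message is a dict with a `"role"` key and a `"message"` key: the `"system"` and `"user"` roles, the text-part format, and the function name `split_goal_messages` are illustrative assumptions, not BrowserGym's actual code. The `image_url` payload is passed through unchanged, as in the reviewed snippet.

```python
from __future__ import annotations

from typing import Any


def split_goal_messages(
    goal: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Route goal messages into system and user content lists (hypothetical helper).

    Assumes roles "system", "user", or "user_image"; any other role is rejected.
    """
    system_msgs: list[dict[str, Any]] = []
    user_msgs: list[dict[str, Any]] = []
    for msg in goal:
        if msg["role"] == "system":
            system_msgs.append({"type": "text", "text": msg["message"]})
        elif msg["role"] == "user":
            user_msgs.append({"type": "text", "text": msg["message"]})
        elif msg["role"] == "user_image":
            # Per the review suggestion: user-provided images go with the user content.
            user_msgs.append({"type": "image_url", "image_url": msg["message"]})
        else:
            raise ValueError(f"Unknown message role: {msg['role']!r}")
    return system_msgs, user_msgs
```

Keeping images in the user turn keeps the system prompt text-only, which matches how the reviewer framed the change.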

@gasse merged commit 9dd62dd into main on Oct 8, 2024 (9 of 11 checks passed)
@gasse deleted the image_download branch on October 8, 2024 15:12
qipeng pushed a commit to orby-ai-engineering/BrowserGym that referenced this pull request on Nov 20, 2024
---------

Co-authored-by: Thibault Le Sellier de Chezelles <[email protected]>
Co-authored-by: Maxime Gasse <[email protected]>
qipeng added a commit to orby-ai-engineering/BrowserGym that referenced this pull request on Jan 18, 2025
* Patch VWA task IDs

* Add BLIP2 evaluator; patch timeout

* Actually add the captioning_fn into evaluator constructor

* downgrading ubuntu version for github tests (ServiceNow#179)

* making webarena tests not run on PRs (ServiceNow#181)

* making webarena tests not run on PRs

* making visualwebarena tests not run on PRs

* SoM bugfix (ServiceNow#185)

* version bump v0.8.1

* workflow image downgrade: ubuntu-latest -> ubuntu-22.04

* support custom observation

* add user data dir

* Benchmarks (ServiceNow#173)

* new ControlOrMeta key modifier (ServiceNow#187)

* Multi-tab fix (ServiceNow#188)

* Global demo_mode flag (ServiceNow#177)

* HighLevelActionSetArgs default value (ServiceNow#191)

* version bump v0.9.0

* Reverting workarena_l1 benchmark to original seed sampling (ServiceNow#198)

* Benchmarks update (ServiceNow#197)

* Miniwob number of seeds 10 -> 5

* remove most benchmark variants

---------

Co-authored-by: Maxime Gasse <[email protected]>

* New benchmark AssistantBench (ServiceNow#186)


---------

Co-authored-by: Maxime Gasse <[email protected]>

* Default `browsergym_split` metadata for every benchmark (ServiceNow#190)


---------

Co-authored-by: Xing Han Lu <[email protected]>
Co-authored-by: ljang0 <[email protected]>
Co-authored-by: Megh Thakkar <[email protected]>

* Fixing logging with multiple jobs (ServiceNow#182)

* Benchmark updates (ServiceNow#199)


---------

Co-authored-by: Maxime Gasse <[email protected]>

* version bump 0.10.0

* README update (ServiceNow#200)

* Train / test splits for workarena-l2/l3 (ServiceNow#203)

* Fine-grained benchmark action sets (ServiceNow#202)

* version bump v0.10.1

* Update README.md

* Update README.md

* Benchmark.prepare_backend() (ServiceNow#204)

* save_step_info bugfix (obs=None) (ServiceNow#207)

* version bump v0.10.2

* full_reset fixes (ServiceNow#209)

* Hide all bids from obs (ServiceNow#212)

* Adding weblinx config to DEFAULT_BENCHMARKS (ServiceNow#208)


---------

Co-authored-by: Maxime Gasse <[email protected]>

* Leaner Unicode() gym space (ServiceNow#218)

* a method to get the status of an experiment (ServiceNow#219)

* version bump v0.11.0

* Rename benchmark after subset_from_split() (ServiceNow#221)

* exp_dir sanitization (ServiceNow#222)

* get_step_info() bugfix (ServiceNow#223)

* Set webarena / visualwebarena max steps to 30 (ServiceNow#214)

* Benchmark dependencies (ServiceNow#220)

* Include nltk.download() in benchmark.prepare_backend() for webarena / visualwebarena (ServiceNow#224)

* version bump v0.11.1

* ExpResult.status minor fix (ServiceNow#225)

* version bump 0.11.2

* Fix duplicate depends_on in webarena metadata (ServiceNow#228)

* Duplicate webarena dependencies fix (ServiceNow#229)

* nltk.download() during import for webarena and visualwebarena (ServiceNow#227)

* Refactor full_reset() for webarena / visualwebarena (ServiceNow#230)

* webarena_tiny (ServiceNow#232)

* Set ExpArgs.exp_id at post-init time (ServiceNow#231)

* Remove ARIA extraction warnings (ServiceNow#233)

* Update README.md

* Update README.md

* Update README.md

* version bump v0.11.3

* ci tests fix (ServiceNow#234)

* Benchmark update for weblinx (ServiceNow#235)

* Refactor ExpArgs.exp_id generation (ServiceNow#236)

* VisualWebArena task dependencies (ServiceNow#237)

* VWA dependencies fix (ServiceNow#239)

* VWA evaluator fix, missing captioning_fn (ServiceNow#240)

* version bump v0.12.0

* Update README.md

* VWA hide huggingface progress bar (ServiceNow#241)

* WebLINX pre-download data in prepare_backend() (ServiceNow#226)

* AssistantBench + WebLINX fixes (ServiceNow#244)

* Increase assistantbench max_steps to 30

* Setting AssistantBench locale and timezone

* Dedicated AssistantBench action set

* small fix

* missing change

* Lenient frame marking in last retry (ServiceNow#245)

* WA / VWA default action set update (ServiceNow#247)

* version bump v0.13.0

* visualwebarena massage (ServiceNow#248)

* Minor fix (ServiceNow#250)

* Remove gym warnings "obs not within observation space" (ServiceNow#251)

* Lower trace level info -> debug (ServiceNow#252)

* Make env.close() usable after failure (finally block) (ServiceNow#253)

* add init script support

* VWA / WA updates (ServiceNow#254)

* Minor refactors (ServiceNow#255)

* Optional method AbstractBrowserTask.teardown()

* browsergym registration refactor

* Deal with problematic frame unmarking (ServiceNow#256)

* Add missing property exception to _get_obs() retry (ServiceNow#258)

* Bump libwebarena / libvisualwebarena dependencies (ServiceNow#257)

* Massage WebArena instance (ServiceNow#259)

* Refactor AssistantBench output directories (ServiceNow#242)


---------

Co-authored-by: Maxime Gasse <[email protected]>

* version bump v0.13.1

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Authors update (ServiceNow#260)

* TapeAgents export for experiment results (ServiceNow#238)

* Update README.md

* Cleanup

* Add weblinx_browsergym as a dependency (ServiceNow#261)

* Typo fix (ServiceNow#264)

* Update requirements.txt to latest libvisualwebarena package that includes local hosting (ServiceNow#165)

* adding AgentInfo to __init__ for convenience (ServiceNow#166)

* libvisualwebarena==0.0.14 (ServiceNow#171)

fixed the jsons file!

* Leaner traces (ServiceNow#169)

* images aren't saved in pkl files anymore, and are stuffed back in at load time

* added kwargs to control img/som saving

* saving as png, adding screenshots back into obs

* retrocompatibility for image loading

* making get_screenshots work for png and jpg

* fixing image types and closing files

* Goal refactor to allow for local image files (ServiceNow#110)


---------

Co-authored-by: Thibault Le Sellier de Chezelles <[email protected]>
Co-authored-by: Maxime Gasse <[email protected]>

* version bump 0.8.0

* Integrate AgentLab tests (ServiceNow#176)

* downgrading ubuntu version for github tests (ServiceNow#179)

* making webarena tests not run on PRs (ServiceNow#181)

* making webarena tests not run on PRs

* making visualwebarena tests not run on PRs

* SoM bugfix (ServiceNow#185)

* version bump v0.8.1

* workflow image downgrade: ubuntu-latest -> ubuntu-22.04

* Benchmarks (ServiceNow#173)

* new ControlOrMeta key modifier (ServiceNow#187)

* Multi-tab fix (ServiceNow#188)

* Global demo_mode flag (ServiceNow#177)

* HighLevelActionSetArgs default value (ServiceNow#191)

* version bump v0.9.0

* Reverting workarena_l1 benchmark to original seed sampling (ServiceNow#198)

* Benchmarks update (ServiceNow#197)

* Miniwob number of seeds 10 -> 5

* remove most benchmark variants

---------

Co-authored-by: Maxime Gasse <[email protected]>

* New benchmark AssistantBench (ServiceNow#186)


---------

Co-authored-by: Maxime Gasse <[email protected]>

* Default `browsergym_split` metadata for every benchmark (ServiceNow#190)


---------

Co-authored-by: Xing Han Lu <[email protected]>
Co-authored-by: ljang0 <[email protected]>
Co-authored-by: Megh Thakkar <[email protected]>

* Fixing logging with multiple jobs (ServiceNow#182)

* Benchmark updates (ServiceNow#199)


---------

Co-authored-by: Maxime Gasse <[email protected]>

* version bump 0.10.0

* README update (ServiceNow#200)

* Train / test splits for workarena-l2/l3 (ServiceNow#203)

* Fine-grained benchmark action sets (ServiceNow#202)

* version bump v0.10.1

* Update README.md

* Update README.md

* Benchmark.prepare_backend() (ServiceNow#204)

* save_step_info bugfix (obs=None) (ServiceNow#207)

* version bump v0.10.2

* full_reset fixes (ServiceNow#209)

* Hide all bids from obs (ServiceNow#212)

* Adding weblinx config to DEFAULT_BENCHMARKS (ServiceNow#208)


---------

Co-authored-by: Maxime Gasse <[email protected]>

* Leaner Unicode() gym space (ServiceNow#218)

* a method to get the status of an experiment (ServiceNow#219)

* version bump v0.11.0

* Rename benchmark after subset_from_split() (ServiceNow#221)

* exp_dir sanitization (ServiceNow#222)

* get_step_info() bugfix (ServiceNow#223)

* Set webarena / visualwebarena max steps to 30 (ServiceNow#214)

* Benchmark dependencies (ServiceNow#220)

* Include nltk.download() in benchmark.prepare_backend() for webarena / visualwebarena (ServiceNow#224)

* version bump v0.11.1

* ExpResult.status minor fix (ServiceNow#225)

* version bump 0.11.2

* Fix duplicate depends_on in webarena metadata (ServiceNow#228)

* Duplicate webarena dependencies fix (ServiceNow#229)

* nltk.download() during import for webarena and visualwebarena (ServiceNow#227)

* Refactor full_reset() for webarena / visualwebarena (ServiceNow#230)

* webarena_tiny (ServiceNow#232)

* Set ExpArgs.exp_id at post-init time (ServiceNow#231)

* Remove ARIA extraction warnings (ServiceNow#233)

* Update README.md

* Update README.md

* Update README.md

* version bump v0.11.3

* ci tests fix (ServiceNow#234)

* Benchmark update for weblinx (ServiceNow#235)

* Refactor ExpArgs.exp_id generation (ServiceNow#236)

* VisualWebArena task dependencies (ServiceNow#237)

* VWA dependencies fix (ServiceNow#239)

* VWA evaluator fix, missing captioning_fn (ServiceNow#240)

* version bump v0.12.0

* Update README.md

* VWA hide huggingface progress bar (ServiceNow#241)

* WebLINX pre-download data in prepare_backend() (ServiceNow#226)

* AssistantBench + WebLINX fixes (ServiceNow#244)

* Increase assistantbench max_steps to 30

* Setting AssistantBench locale and timezone

* Dedicated AssistantBench action set

* small fix

* missing change

* Lenient frame marking in last retry (ServiceNow#245)

* WA / VWA default action set update (ServiceNow#247)

* version bump v0.13.0

* visualwebarena massage (ServiceNow#248)

* Minor fix (ServiceNow#250)

* Remove gym warnings "obs not within observation space" (ServiceNow#251)

* Lower trace level info -> debug (ServiceNow#252)

* Make env.close() usable after failure (finally block) (ServiceNow#253)

* VWA / WA updates (ServiceNow#254)

* Minor refactors (ServiceNow#255)

* Optional method AbstractBrowserTask.teardown()

* browsergym registration refactor

* Deal with problematic frame unmarking (ServiceNow#256)

* Add missing property exception to _get_obs() retry (ServiceNow#258)

* Bump libwebarena / libvisualwebarena dependencies (ServiceNow#257)

* Massage WebArena instance (ServiceNow#259)

* Refactor AssistantBench output directories (ServiceNow#242)


---------

Co-authored-by: Maxime Gasse <[email protected]>

* version bump v0.13.1

* Fix broken links

* Update README.md

* fix merging issues

* Update README.md (ServiceNow#270)

* Update README.md

* README update

* More permissive WA/VWA instance reset (ServiceNow#272)

* New debug benchmark visualwebarena_tiny (ServiceNow#271)

* Version bump v0.13.2

* Update README.md

* Metadata column fix (ServiceNow#278)

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Shunt WA / VWA unit tests

* README update

* Minor fixes (ServiceNow#281)

* version bump v0.13.3

* remove unused fluff

* revert more unintended changes

---------

Co-authored-by: Peng Qi <[email protected]>
Co-authored-by: Thibault LSDC <[email protected]>
Co-authored-by: Maxime Gasse <[email protected]>
Co-authored-by: Yanan Xie <[email protected]>
Co-authored-by: Alexandre Lacoste <[email protected]>
Co-authored-by: oriyor <[email protected]>
Co-authored-by: Xing Han Lu <[email protected]>
Co-authored-by: ljang0 <[email protected]>
Co-authored-by: Megh Thakkar <[email protected]>
Co-authored-by: Imene Kerboua <[email protected]>
Co-authored-by: Oleh Shliazhko <[email protected]>
Co-authored-by: Thibault Le Sellier de Chezelles <[email protected]>
Labels: None yet
Projects: None yet
4 participants