Minor fix #250

gasse · 2024-11-08T15:55:09Z

No description provided.

* Patch VWA task IDs
* Add BLIP2 evaluator; patch timeout
* Actually add the captioning_fn into evaluator constructor
* downgrading ubuntu version for github tests (ServiceNow#179)
* making webarena tests not run on PRs (ServiceNow#181)
* making webarena tests not run on PRs
* making visualwebarena tests not run on PRs
* SoM bugfix (ServiceNow#185)
* version bump v0.8.1
* workflow image downgrade: ubuntu-latest -> ubuntu-22.04
* support custom observation
* add user data dir
* Benchmarks (ServiceNow#173)
* new ControlOrMeta key modifier (ServiceNow#187)
* Multi-tab fix (ServiceNow#188)
* Global demo_mode flag (ServiceNow#177)
* HighLevelActionSetArgs default value (ServiceNow#191)
* version bump v0.9.0
* Reverting workarena_l1 benchmark to original seed sampling (ServiceNow#198)
* Benchmarks update (ServiceNow#197)
* Miniwob number of seeds 10 -> 5
* remove most benchmark variants
---------
Co-authored-by: Maxime Gasse <[email protected]>
* New benchmark AssistantBench (ServiceNow#186)
---------
Co-authored-by: Maxime Gasse <[email protected]>
* Default `browsergym_split` metadata for every benchmark (ServiceNow#190)
---------
Co-authored-by: Xing Han Lu <[email protected]>
Co-authored-by: ljang0 <[email protected]>
Co-authored-by: Megh Thakkar <[email protected]>
* Fixing logging with multiple jobs (ServiceNow#182)
* Benchmark updates (ServiceNow#199)
---------
Co-authored-by: Maxime Gasse <[email protected]>
* version bump 0.10.0
* README update (ServiceNow#200)
* Train / test splits for workarena-l2/l3 (ServiceNow#203)
* Fine-grained benchmark action sets (ServiceNow#202)
* version bump v0.10.1
* Update README.md
* Update README.md
* Benchmark.prepare_backend() (ServiceNow#204)
* save_step_info bugfix (obs=None) (ServiceNow#207)
* version bump v0.10.2
* full_reset fixes (ServiceNow#209)
* Hide all bids from obs (ServiceNow#212)
* Adding weblinx config to DEFAULT_BENCHMARKS (ServiceNow#208)
---------
Co-authored-by: Maxime Gasse <[email protected]>
* Leaner Unicode() gym space (ServiceNow#218)
* a method to get the status of an experiment (ServiceNow#219)
* version bump v0.11.0
* Rename benchmark after subset_from_split() (ServiceNow#221)
* exp_dir sanitization (ServiceNow#222)
* get_step_info() bugfix (ServiceNow#223)
* Set webarena / visualwebarena max steps to 30 (ServiceNow#214)
* Benchmark dependencies (ServiceNow#220)
* Include nltk.download() in benchmark.prepare_backend() for webarena / visualwebarena (ServiceNow#224)
* version bump v0.11.1
* ExpResult.status minor fix (ServiceNow#225)
* version bump 0.11.2
* Fix duplicate depends_on in webarena metadata (ServiceNow#228)
* Duplicate webarena dependencies fix (ServiceNow#229)
* nltk.download() during import for webarena and visualwebarena (ServiceNow#227)
* Refactor full_reset() for webarena / visualwebarena (ServiceNow#230)
* webarena_tiny (ServiceNow#232)
* Set ExpArgs.exp_id at post-init time (ServiceNow#231)
* Remove ARIA extraction warnings (ServiceNow#233)
* Update README.md
* Update README.md
* Update README.md
* version bump v0.11.3
* ci tests fix (ServiceNow#234)
* Benchmark update for weblinx (ServiceNow#235)
* Refactor ExpArgs.exp_id generation (ServiceNow#236)
* VisualWebArena task dependencies (ServiceNow#237)
* VWA dependencies fix (ServiceNow#239)
* VWA evaluator fix, missing captioning_fn (ServiceNow#240)
* version bump v0.12.0
* Update README.md
* VWA hide huggingface progress bar (ServiceNow#241)
* WebLINX pre-download data in prepare_backend() (ServiceNow#226)
* AssistantBench + WebLINX fixes (ServiceNow#244)
* Increase assistantbench max_steps to 30
* Setting AssistantBench locale and timezone
* Dedicated AssistantBench action set
* small fix
* missing change
* Lenient frame marking in last retry (ServiceNow#245)
* WA / VWA default action set update (ServiceNow#247)
* version bump v0.13.0
* visualwebarena massage (ServiceNow#248)
* Minor fix (ServiceNow#250)
* Remove gym warnings "obs not within observation space" (ServiceNow#251)
* Lower trace level info -> debug (ServiceNow#252)
* Make env.close() usable after failure (finally block) (ServiceNow#253)
* add init script support
* VWA / WA updates (ServiceNow#254)
* Minor refactors (ServiceNow#255)
* Optional method AbstractBrowserTask.teardown()
* browsergym registration refactor
* Deal with problematic frame unmarking (ServiceNow#256)
* Add missing property exception to _get_obs() retry (ServiceNow#258)
* Bump libwebarena / libvisualwebarena dependencies (ServiceNow#257)
* Massage WebArena instance (ServiceNow#259)
* Refactor AssistantBench output directories (ServiceNow#242)
---------
Co-authored-by: Maxime Gasse <[email protected]>
* version bump v0.13.1
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Authors update (ServiceNow#260)
* TapeAgents export for experiment results (ServiceNow#238)
* Update README.md
* Cleanup
* Add weblinx_browsergym as a dependency (ServiceNow#261)
* Typo fix (ServiceNow#264)
* Update requirements.txt to latest libvisualwebarena package that includes local hosting (ServiceNow#165)
* adding AgentInfo to __init__ for convenience (ServiceNow#166)
* libvisualwebarena==0.0.14 (ServiceNow#171) fixed the jsons file!
* Leaner traces (ServiceNow#169)
* images aren't saved in pkl files anymore, and are stuffed back in at load time
* added kwargs to control img/som saving
* saving as png, adding screenshots back into obs
* retrocompatibility for image loading
* making get_screenshots work for png and jpg
* fixing image types and closing files
* Goal refactor to allow for local image files (ServiceNow#110)
---------
Co-authored-by: Thibault Le Sellier de Chezelles <[email protected]>
Co-authored-by: Maxime Gasse <[email protected]>
* version bump 0.8.0
* Integrate AgentLab tests (ServiceNow#176)
* downgrading ubuntu version for github tests (ServiceNow#179)
* making webarena tests not run on PRs (ServiceNow#181)
* making webarena tests not run on PRs
* making visualwebarena tests not run on PRs
* SoM bugfix (ServiceNow#185)
* version bump v0.8.1
* workflow image downgrade: ubuntu-latest -> ubuntu-22.04
* Benchmarks (ServiceNow#173)
* new ControlOrMeta key modifier (ServiceNow#187)
* Multi-tab fix (ServiceNow#188)
* Global demo_mode flag (ServiceNow#177)
* HighLevelActionSetArgs default value (ServiceNow#191)
* version bump v0.9.0
* Reverting workarena_l1 benchmark to original seed sampling (ServiceNow#198)
* Benchmarks update (ServiceNow#197)
* Miniwob number of seeds 10 -> 5
* remove most benchmark variants
---------
Co-authored-by: Maxime Gasse <[email protected]>
* New benchmark AssistantBench (ServiceNow#186)
---------
Co-authored-by: Maxime Gasse <[email protected]>
* Default `browsergym_split` metadata for every benchmark (ServiceNow#190)
---------
Co-authored-by: Xing Han Lu <[email protected]>
Co-authored-by: ljang0 <[email protected]>
Co-authored-by: Megh Thakkar <[email protected]>
* Fixing logging with multiple jobs (ServiceNow#182)
* Benchmark updates (ServiceNow#199)
---------
Co-authored-by: Maxime Gasse <[email protected]>
* version bump 0.10.0
* README update (ServiceNow#200)
* Train / test splits for workarena-l2/l3 (ServiceNow#203)
* Fine-grained benchmark action sets (ServiceNow#202)
* version bump v0.10.1
* Update README.md
* Update README.md
* Benchmark.prepare_backend() (ServiceNow#204)
* save_step_info bugfix (obs=None) (ServiceNow#207)
* version bump v0.10.2
* full_reset fixes (ServiceNow#209)
* Hide all bids from obs (ServiceNow#212)
* Adding weblinx config to DEFAULT_BENCHMARKS (ServiceNow#208)
---------
Co-authored-by: Maxime Gasse <[email protected]>
* Leaner Unicode() gym space (ServiceNow#218)
* a method to get the status of an experiment (ServiceNow#219)
* version bump v0.11.0
* Rename benchmark after subset_from_split() (ServiceNow#221)
* exp_dir sanitization (ServiceNow#222)
* get_step_info() bugfix (ServiceNow#223)
* Set webarena / visualwebarena max steps to 30 (ServiceNow#214)
* Benchmark dependencies (ServiceNow#220)
* Include nltk.download() in benchmark.prepare_backend() for webarena / visualwebarena (ServiceNow#224)
* version bump v0.11.1
* ExpResult.status minor fix (ServiceNow#225)
* version bump 0.11.2
* Fix duplicate depends_on in webarena metadata (ServiceNow#228)
* Duplicate webarena dependencies fix (ServiceNow#229)
* nltk.download() during import for webarena and visualwebarena (ServiceNow#227)
* Refactor full_reset() for webarena / visualwebarena (ServiceNow#230)
* webarena_tiny (ServiceNow#232)
* Set ExpArgs.exp_id at post-init time (ServiceNow#231)
* Remove ARIA extraction warnings (ServiceNow#233)
* Update README.md
* Update README.md
* Update README.md
* version bump v0.11.3
* ci tests fix (ServiceNow#234)
* Benchmark update for weblinx (ServiceNow#235)
* Refactor ExpArgs.exp_id generation (ServiceNow#236)
* VisualWebArena task dependencies (ServiceNow#237)
* VWA dependencies fix (ServiceNow#239)
* VWA evaluator fix, missing captioning_fn (ServiceNow#240)
* version bump v0.12.0
* Update README.md
* VWA hide huggingface progress bar (ServiceNow#241)
* WebLINX pre-download data in prepare_backend() (ServiceNow#226)
* AssistantBench + WebLINX fixes (ServiceNow#244)
* Increase assistantbench max_steps to 30
* Setting AssistantBench locale and timezone
* Dedicated AssistantBench action set
* small fix
* missing change
* Lenient frame marking in last retry (ServiceNow#245)
* WA / VWA default action set update (ServiceNow#247)
* version bump v0.13.0
* visualwebarena massage (ServiceNow#248)
* Minor fix (ServiceNow#250)
* Remove gym warnings "obs not within observation space" (ServiceNow#251)
* Lower trace level info -> debug (ServiceNow#252)
* Make env.close() usable after failure (finally block) (ServiceNow#253)
* VWA / WA updates (ServiceNow#254)
* Minor refactors (ServiceNow#255)
* Optional method AbstractBrowserTask.teardown()
* browsergym registration refactor
* Deal with problematic frame unmarking (ServiceNow#256)
* Add missing property exception to _get_obs() retry (ServiceNow#258)
* Bump libwebarena / libvisualwebarena dependencies (ServiceNow#257)
* Massage WebArena instance (ServiceNow#259)
* Refactor AssistantBench output directories (ServiceNow#242)
---------
Co-authored-by: Maxime Gasse <[email protected]>
* version bump v0.13.1
* Fix broken links
* Update README.md
* fix merging issues
* Update README.md (ServiceNow#270)
* Update README.md
* README update
* More permissive WA/VWA instance reset (ServiceNow#272)
* New debug benchmark visualwebarena_tiny (ServiceNow#271)
* Version bump v0.13.2
* Update README.md
* Metadata column fix (ServiceNow#278)
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Shunt WA / VWA unit tests
* README update
* Minor fixes (ServiceNow#281)
* version bump v0.13.3
* remove unused fluff
* revert more unintended changes
---------
Co-authored-by: Peng Qi <[email protected]>
Co-authored-by: Thibault LSDC <[email protected]>
Co-authored-by: Maxime Gasse <[email protected]>
Co-authored-by: Yanan Xie <[email protected]>
Co-authored-by: Alexandre Lacoste <[email protected]>
Co-authored-by: oriyor <[email protected]>
Co-authored-by: Xing Han Lu <[email protected]>
Co-authored-by: ljang0 <[email protected]>
Co-authored-by: Megh Thakkar <[email protected]>
Co-authored-by: Imene Kerboua <[email protected]>
Co-authored-by: Oleh Shliazhko <[email protected]>
Co-authored-by: Thibault Le Sellier de Chezelles <[email protected]>

minor log fix

9727f48

gasse force-pushed the vwa_massage branch from edc2096 to 9727f48 November 8, 2024 15:56

gasse added 2 commits November 8, 2024 13:09

trace message

8a6e20f

vwa massage update

aaca74d

gasse merged commit 3d6c37b into main Nov 11, 2024
12 of 13 checks passed

gasse deleted the vwa_massage branch November 11, 2024 13:46

qipeng pushed a commit to orby-ai-engineering/BrowserGym that referenced this pull request Nov 20, 2024

Minor fix (ServiceNow#250)

ebd74fd
