[Feature Request] Web toolkit #1406

Open
1 of 2 tasks
Wendong-Fan opened this issue Jan 7, 2025 · 2 comments · May be fixed by #1471
Labels: New Feature, P0 (task with high-level priority)

Comments

@Wendong-Fan
Member

Wendong-Fan commented Jan 7, 2025

Required prerequisites

Motivation

A toolkit that can interact with rendered webpages to perform web-based tasks, e.g. opening a given URL, clicking elements, scrolling pages, taking screenshots, and using an MLLM (multimodal LLM) to understand the webpage content.
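To make the requested actions concrete, here is a minimal sketch of what such a toolkit's surface could look like. Every name in it (`BrowserBackend`, `RecordingBackend`, `WebToolkit`, and the method names) is hypothetical and not an existing API; the backend below only records actions and drives no real browser. A real backend would wrap a browser automation library instead.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class BrowserBackend(Protocol):
    """Hypothetical interface that a real browser driver would implement."""

    def goto(self, url: str) -> None: ...
    def click(self, selector: str) -> None: ...
    def scroll(self, dy: int) -> None: ...
    def screenshot(self) -> bytes: ...


@dataclass
class RecordingBackend:
    """Stand-in backend that only logs actions; it drives no real browser."""

    log: List[str] = field(default_factory=list)

    def goto(self, url: str) -> None:
        self.log.append(f"goto {url}")

    def click(self, selector: str) -> None:
        self.log.append(f"click {selector}")

    def scroll(self, dy: int) -> None:
        self.log.append(f"scroll {dy}")

    def screenshot(self) -> bytes:
        self.log.append("screenshot")
        return b""  # a real backend would return image bytes here


class WebToolkit:
    """Hypothetical toolkit exposing each browser action as a callable tool."""

    def __init__(self, backend: BrowserBackend) -> None:
        self.backend = backend

    def open_url(self, url: str) -> str:
        self.backend.goto(url)
        return f"opened {url}"

    def click_element(self, selector: str) -> str:
        self.backend.click(selector)
        return f"clicked {selector}"

    def scroll_page(self, dy: int = 600) -> str:
        self.backend.scroll(dy)
        return f"scrolled by {dy}px"

    def take_screenshot(self) -> bytes:
        # The returned bytes would be handed to an MLLM for page understanding.
        return self.backend.screenshot()


backend = RecordingBackend()
toolkit = WebToolkit(backend)
toolkit.open_url("https://example.com")
toolkit.click_element("a#about")
toolkit.scroll_page()
toolkit.take_screenshot()
print(backend.log)
```

Keeping the backend behind a small interface is one possible design choice: it would let the same tool methods run against different drivers, and against a recording stub in tests.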

Example task:

{
    "Question": "Eva Draconis has a personal website which can be accessed on her YouTube page. What is the meaning of the only symbol seen in the top banner that has a curved line that isn't a circle or a portion of a circle? Answer without punctuation.",
    "Final answer": "War is not here this is a land of peace",
    "Annotation Metadata": {
        "Steps": "1. By googling Eva Draconis youtube, you can find her channel.\n2. In her about section, she has written her website URL, orionmindproject.com.\n3. Entering this website, you can see a series of symbols at the top, and the text \"> see what the symbols mean here\" below it.\n4. Reading through the entries, you can see a short description of some of the symbols.\n5. The only symbol with a curved line that isn't a circle or a portion of a circle is the last one.\n6. Note that the symbol supposedly means \"War is not here, this is a land of peace.\"",
        "Number of steps": "6",
        "How long did this take?": "30 minutes.",
        "Tools": "1. A web browser.\n2. A search engine.\n3. Access to YouTube\n4. Image recognition tools",
        "Number of tools": "4"
    }
}

Solution

Study existing solutions such as:
https://www.browserbase.com/ (https://docs.stagehand.dev/get_started/introduction)
https://github.com/steel-dev/steel-browser
https://pptr.dev/guides/getting-started
https://playwright.dev/

Alternatives

No response

Additional context

No response

@Aaron617
Collaborator

Aaron617 commented Jan 8, 2025

I studied these solutions:

  1. stagehand: unlike the other libraries, stagehand provides natural language APIs (act, extract, and observe) on top of Playwright. Its key feature is a lightweight, model-agnostic framework for executing atomic web tasks from natural language instructions (e.g., "Click the link to the quickstart").
  2. steel-browser: from my perspective, its key features are 1) post-processing of page data (easily extract page data as cleaned HTML, markdown, PDFs, or screenshots), 2) bypassing anti-bot measures, and 3) optimizing data formats to reduce LLM token usage.
  3. Puppeteer, Selenium, and Playwright are similar to one another.
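As a rough illustration of why post-processing page data can reduce LLM token usage, the sketch below strips scripts and styles from raw HTML and keeps only the visible text. It uses only Python's standard library; it is not how steel-browser does it, and the names `VisibleTextExtractor` and `html_to_text`, as well as the choice of block-level tags, are assumptions made for this example.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text outside <script>, <style>, and <noscript> elements."""

    SKIPPED = {"script", "style", "noscript"}
    # Tags treated as line breaks (an illustrative, incomplete set).
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = (
    "<html><head><style>p{color:red}</style><script>var x = 1;</script></head>"
    "<body><h1>Docs</h1><p>Click the link to the <a href='/q'>quickstart</a>.</p>"
    "</body></html>"
)
cleaned = html_to_text(page)
print(cleaned)
print(f"{len(page)} characters of HTML reduced to {len(cleaned)} characters of text")
```

A production toolkit would likely also need to keep link targets and element identifiers so that the model can still refer to clickable elements; this sketch drops them for brevity.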

@Wendong-Fan
Member Author

Lead: @X-TRON404; support & review: @koch3092, @Asher-hss, @Aaron617

X-TRON404 linked a pull request on Jan 20, 2025 that will close this issue (12 tasks)
Wendong-Fan linked a pull request on Jan 21, 2025 that will close this issue (12 tasks)