New Features, Examples Refactoring and Bug Fix #879

Merged: 17 commits, Jan 12, 2025

30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,33 @@
## [1.36.0-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.35.1-beta.1...v1.36.0-beta.1) (2025-01-12)


### Features

* add example of collab ([1fad118](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/1fad1181a6b2d654c4eb996348907940b1d8a7af))


### Bug Fixes

* updated ollama structured output ([3b95911](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/3b9591156d96ac7266055703e7ffb354e90b01f0))


### Docs

* improved readme + fix csv scraper imports ([14b4b19](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/14b4b19f60e33c855bee4eea0a1a6fcc01a98c1a))
* refactoring of the doc ([5ca325c](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/5ca325c7257b71fc4cd12ee26bde3e992ade5756))

## [1.35.1-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.35.0...v1.35.1-beta.1) (2025-01-12)


### Bug Fixes

* ollama tokenizer limited to 1024 tokens + ollama structured output + fix browser backend ([ad693b2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/ad693b2bb201b4d9280139e70a2930358e779366))


### Docs

* ✨ code quality badge update ([02022cc](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/02022cc5db39fede1a1d920d17e18ba0d05328ba))

## [1.35.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.2...v1.35.0) (2025-01-06)


9 changes: 0 additions & 9 deletions cookbook/README.md

This file was deleted.

12 changes: 6 additions & 6 deletions docs/source/getting_started/installation.rst
@@ -25,18 +25,18 @@ The library is available on PyPI, so it can be installed using the following com

It is highly recommended to install the library in a virtual environment (conda, venv, etc.).

If you clone the repository, it is recommended to use a package manager like `rye <https://rye.astral.sh/>`_.
To install the library using rye, you can run the following commands:
If you clone the repository, it is recommended to use a package manager like `uv <https://github.com/astral-sh/uv>`_.
To install the library using uv, you can run the following commands:

.. code-block:: bash

   rye pin 3.10
   rye sync
   rye build
   uv python pin 3.10
   uv sync
   uv build

.. caution::

   **Rye** must be installed first by following the instructions on the `official website <https://rye.astral.sh/>`_.
   **uv** must be installed first by following the instructions on the `official website <https://github.com/astral-sh/uv>`_.

Additionally on Windows when using WSL
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
79 changes: 70 additions & 9 deletions docs/source/introduction/overview.rst
@@ -30,37 +30,93 @@ ScrapGraphAI supports a wide range of AI models from various providers. Each mod
OpenAI Models
-------------
- GPT-3.5 Turbo (16,385 tokens)
- GPT-4 (8,192 tokens)
- GPT-3.5 (4,096 tokens)
- GPT-3.5 Turbo Instruct (4,096 tokens)
- GPT-4 Turbo Preview (128,000 tokens)
- GPT-4o (128,000 tokens)
- GPT-4o-mini (128,000 tokens)
- GPT-4 Vision Preview (128,000 tokens)
- GPT-4 (8,192 tokens)
- GPT-4 32k (32,768 tokens)
- GPT-4o (128,000 tokens)
- O1 Preview (128,000 tokens)
- O1 Mini (128,000 tokens)

Azure OpenAI Models
-------------------
- GPT-3.5 Turbo (16,385 tokens)
- GPT-4 (8,192 tokens)
- GPT-3.5 (4,096 tokens)
- GPT-4 Turbo Preview (128,000 tokens)
- GPT-4o (128,000 tokens)
- GPT-4o-mini (128,000 tokens)
- GPT-4 (8,192 tokens)
- GPT-4 32k (32,768 tokens)
- GPT-4o (128,000 tokens)
- O1 Preview (128,000 tokens)
- O1 Mini (128,000 tokens)

Google AI Models
----------------
- Gemini Pro (128,000 tokens)
- Gemini 1.5 Flash (128,000 tokens)
- Gemini 1.5 Pro (128,000 tokens)
- Gemini 1.0 Pro (128,000 tokens)

Anthropic Models
----------------
- Claude Instant (100,000 tokens)
- Claude 2 (200,000 tokens)
- Claude 2 (9,000 tokens)
- Claude 2.1 (200,000 tokens)
- Claude 3 (200,000 tokens)
- Claude 3.5 (200,000 tokens)
- Claude 3 Opus (200,000 tokens)
- Claude 3 Sonnet (200,000 tokens)
- Claude 3 Haiku (200,000 tokens)

Mistral AI Models
-----------------
- Mistral Large (128,000 tokens)
- Mistral Large Latest (128,000 tokens)
- Open Mistral Nemo (128,000 tokens)
- Codestral Latest (32,000 tokens)
- Open Mistral 7B (32,000 tokens)
- Open Mixtral 8x7B (32,000 tokens)
- Open Mixtral 8x22B (64,000 tokens)
- Open Codestral Mamba (256,000 tokens)

For a complete list of supported models and their token limits, please refer to the API documentation.
Ollama Models
-------------
- Command-R (12,800 tokens)
- CodeLlama (16,000 tokens)
- DBRX (32,768 tokens)
- DeepSeek Coder 33B (16,000 tokens)
- Llama2 Series (4,096 tokens)
- Llama3 Series (8,192-128,000 tokens)
- Mistral Models (32,000-128,000 tokens)
- Mixtral 8x22B Instruct (65,536 tokens)
- Phi3 Series (12,800-128,000 tokens)
- Qwen Series (32,000 tokens)

Hugging Face Models
-------------------
- Grok-1 (8,192 tokens)
- Meta Llama 3 Series (8,192 tokens)
- Google Gemma Series (8,192 tokens)
- Microsoft Phi Series (2,048-131,072 tokens)
- GPT-2 Series (1,024 tokens)
- DeepSeek V2 Series (131,072 tokens)

Bedrock Models
--------------
- Claude 3 Series (200,000 tokens)
- Llama2 & Llama3 Series (4,096-8,192 tokens)
- Mistral Series (32,768 tokens)
- Titan Embed Text (8,000 tokens)
- Cohere Embed (512 tokens)

Fireworks Models
----------------
- Llama V2 7B (4,096 tokens)
- Mixtral 8x7B Instruct (4,096 tokens)
- Llama 3.1 Series (131,072 tokens)
- Mixtral MoE Series (65,536 tokens)

For a complete and up-to-date list of supported models and their token limits, please refer to the API documentation.

Understanding token limits is crucial for optimizing your scraping tasks. Larger token limits allow for processing more text in a single API call, which can be beneficial for scraping lengthy web pages or documents.
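
As a rough illustration of why these limits matter, the sketch below estimates whether a page's text fits a model's context window and splits it into chunks when it does not. The `TOKEN_LIMITS` dictionary copies a few values from the lists above under made-up keys; the function names, the `reserve` parameter, and the four-characters-per-token heuristic are all assumptions for illustration, not part of the library.

```python
# Illustrative only: a few limits copied from the lists above.
# The keys are hypothetical and need not match the library's model names.
TOKEN_LIMITS = {
    "gpt-4": 8_192,
    "gpt-4o": 128_000,
    "claude-3-haiku": 200_000,
}

CHARS_PER_TOKEN = 4  # crude heuristic (assumption); real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def split_for_model(text: str, model: str, reserve: int = 1_000) -> list[str]:
    """Split text into pieces that each fit the model's limit.

    `reserve` tokens are held back for the prompt and the answer.
    """
    budget = TOKEN_LIMITS[model] - reserve
    if budget <= 0:
        raise ValueError("reserve exceeds the model's token limit")
    max_chars = budget * CHARS_PER_TOKEN
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


page = "x" * 100_000  # stand-in for scraped page text
print(estimate_tokens(page))                  # 25000
print(len(split_for_model(page, "gpt-4")))    # 4
print(len(split_for_model(page, "gpt-4o")))   # 1
```

With these assumed numbers, the same page needs four calls on an 8,192-token model but only one on a 128,000-token model, which is the trade-off the paragraph above describes.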

@@ -139,3 +195,8 @@ Sponsors
   :width: 15%
   :alt: Stat Proxies
   :target: https://dashboard.statproxies.com/?refferal=scrapegraph

.. image:: ../../assets/scrapedo.png
   :width: 11%
   :alt: Scrapedo
   :target: https://scrape.do
@@ -19,7 +19,7 @@ Example usage:
print(f"GPT-4 token limit: {gpt4_limit}")

# Check the token limit for a specific model
model_name = "gpt-3.5-turbo"
model_name = "gpt-4o-mini"
if model_name in models_tokens['openai']:
    print(f"{model_name} token limit: {models_tokens['openai'][model_name]}")
else:
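
The hunk stops at the `else:` branch. A self-contained sketch of the same lookup pattern is given here; it uses a small stand-in dictionary rather than the library's real `models_tokens`, whose import is not shown in this hunk. The `get_token_limit` helper and the "not found" message are assumptions added for illustration.

```python
from typing import Optional

# Stand-in for the library's models_tokens mapping (provider -> model -> limit).
# Values come from the overview lists in this PR; the real mapping may differ.
models_tokens = {
    "openai": {"gpt-4": 8192, "gpt-4o": 128000},
}


def get_token_limit(provider: str, model_name: str) -> Optional[int]:
    """Return the token limit, or None if the provider or model is unknown."""
    return models_tokens.get(provider, {}).get(model_name)


for name in ("gpt-4o", "not-a-real-model"):
    limit = get_token_limit("openai", name)
    if limit is not None:
        print(f"{name} token limit: {limit}")
    else:
        print(f"{name} not found in the models list")
```

Chaining `.get()` calls avoids a `KeyError` for an unknown provider as well as an unknown model, which the membership test in the original example only covers for the model name.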
23 changes: 0 additions & 23 deletions docs/source/scrapers/benchmarks.rst

This file was deleted.

915 changes: 915 additions & 0 deletions examples/ScrapegraphAI_cookbook.ipynb

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion examples/anthropic/.env.example

This file was deleted.

59 changes: 0 additions & 59 deletions examples/anthropic/code_generator_graph_anthropic.py

This file was deleted.

56 changes: 0 additions & 56 deletions examples/anthropic/csv_scraper_anthropic.py

This file was deleted.

50 changes: 0 additions & 50 deletions examples/anthropic/csv_scraper_graph_multi_anthropic.py

This file was deleted.
