Merge branch 'master' into DOCS-1217
J2-D2-3PO authored Jan 27, 2025
2 parents bfd6df3 + ce57a2e commit e054a26
Showing 31 changed files with 914 additions and 109 deletions.
1 change: 1 addition & 0 deletions .github/workflows/test.yaml
@@ -180,6 +180,7 @@ jobs:
yarn tslint
yarn prettier
yarn run tsc
yarn testci
# ==== Trace Jobs ====
lint:
68 changes: 68 additions & 0 deletions docs/docs/guides/core-types/evaluations.md
@@ -186,3 +186,71 @@ def function_to_evaluate(question: str):

asyncio.run(evaluation.evaluate(function_to_evaluate))
```

## Advanced evaluation usage

### Using `preprocess_model_input` to format dataset rows before evaluating

The `preprocess_model_input` parameter allows you to transform your dataset examples before they are passed to your evaluation function. This is useful when you need to:
- Rename fields to match your model's expected input
- Transform data into the correct format
- Add or remove fields
- Load additional data for each example

Here's a simple example that shows how to use `preprocess_model_input` to rename fields:

```python
import weave
from weave import Evaluation
import asyncio

# Our dataset has "input_text" but our model expects "question"
examples = [
{"input_text": "What is the capital of France?", "expected": "Paris"},
{"input_text": "Who wrote 'To Kill a Mockingbird'?", "expected": "Harper Lee"},
{"input_text": "What is the square root of 64?", "expected": "8"},
]

@weave.op()
def preprocess_example(example):
# Rename input_text to question
return {
"question": example["input_text"]
}

@weave.op()
def match_score(expected: str, model_output: dict) -> dict:
return {'match': expected == model_output['generated_text']}

@weave.op()
def function_to_evaluate(question: str):
return {'generated_text': f'Answer to: {question}'}

# Create evaluation with preprocessing
evaluation = Evaluation(
dataset=examples,
scorers=[match_score],
preprocess_model_input=preprocess_example
)

# Run the evaluation
weave.init('preprocessing-example')
asyncio.run(evaluation.evaluate(function_to_evaluate))
```

In this example, our dataset contains examples with an `input_text` field, but our evaluation function expects a `question` parameter. The `preprocess_example` function transforms each example by renaming the field, allowing the evaluation to work correctly.

The preprocessing function:
1. Receives the raw example from your dataset
2. Returns a dictionary with the fields your model expects
3. Is applied to each example before it's passed to your evaluation function

This is particularly useful when working with external datasets that may have different field names or structures than what your model expects.

### Using HuggingFace Datasets with evaluations

We are continuously improving our integrations with third-party services and libraries.

While we work on building more seamless integrations, you can use `preprocess_model_input` as a temporary workaround for using HuggingFace Datasets in Weave evaluations.

See our [Using HuggingFace Datasets in evaluations cookbook](/reference/gen_notebooks/hf_dataset_evals) for the current approach.
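
The core idea is to keep each Weave dataset row lightweight and resolve the full HuggingFace row inside `preprocess_model_input`. The sketch below illustrates that pattern under some assumptions: the `squad` dataset, project name, model, and scorer are placeholders rather than the cookbook's exact code.

```python
import asyncio

import weave
from weave import Evaluation
from datasets import load_dataset  # pip install datasets

# Load a small slice of an arbitrary HuggingFace dataset (placeholder choice)
hf_rows = load_dataset("squad", split="validation[:3]")

# Keep the Weave dataset rows small: store an index plus the expected answer
examples = [
    {"hf_index": i, "expected": hf_rows[i]["answers"]["text"][0]}
    for i in range(len(hf_rows))
]

@weave.op()
def preprocess_example(example):
    # Resolve the full HuggingFace row and map it to the model's expected input
    row = hf_rows[example["hf_index"]]
    return {"question": row["question"]}

@weave.op()
def match_score(expected: str, model_output: dict) -> dict:
    return {"match": expected == model_output["generated_text"]}

@weave.op()
def function_to_evaluate(question: str):
    # Placeholder model
    return {"generated_text": f"Answer to: {question}"}

evaluation = Evaluation(
    dataset=examples,
    scorers=[match_score],
    preprocess_model_input=preprocess_example,
)

weave.init("hf-dataset-example")  # placeholder project name
asyncio.run(evaluation.evaluate(function_to_evaluate))
```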
4 changes: 4 additions & 0 deletions docs/docs/guides/integrations/anthropic.md
@@ -2,6 +2,10 @@

Weave automatically tracks and logs LLM calls made via the [Anthropic Python library](https://github.com/anthropics/anthropic-sdk-python), after `weave.init()` is called.

:::note
Do you want to experiment with Anthropic models on Weave without any setup? Try the [LLM Playground](../tools/playground.md).
:::

## Traces

It’s important to store traces of LLM applications in a central database, both during development and in production. You’ll use these traces for debugging, and as a dataset that will help you improve your application.
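
Below is a minimal sketch of what a traced call can look like; the project name and prompt are placeholders, and the client reads `ANTHROPIC_API_KEY` from the environment:

```python
import weave
from anthropic import Anthropic

weave.init("anthropic-example")  # placeholder project name

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.content[0].text)  # the call appears as a trace in Weave
```

Because the client is patched automatically after `weave.init()`, no extra decorators should be needed for this basic case.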
4 changes: 4 additions & 0 deletions docs/docs/guides/integrations/azure.md
@@ -2,6 +2,10 @@

Weights & Biases integrates with Microsoft Azure OpenAI services, helping teams to manage, debug, and optimize their Azure AI workflows at scale. This guide introduces the W&B integration, what it means for Weave users, its key features, and how to get started.

:::tip
For the latest tutorials, visit [Weights & Biases on Microsoft Azure](https://wandb.ai/site/partners/azure).
:::

## Key features

- **LLM evaluations**: Evaluate and monitor LLM-powered applications using Weave, optimized for Azure infrastructure.
8 changes: 8 additions & 0 deletions docs/docs/guides/integrations/bedrock.md
@@ -2,6 +2,14 @@

Weave automatically tracks and logs LLM calls made via Amazon Bedrock, AWS's fully managed service that offers foundation models from leading AI companies through a unified API.

:::tip
For the latest tutorials, visit [Weights & Biases on Amazon Web Services](https://wandb.ai/site/partners/aws/).
:::

:::note
Do you want to experiment with Amazon Bedrock models on Weave without any setup? Try the [LLM Playground](../tools/playground.md).
:::

## Traces

Weave will automatically capture traces for Bedrock API calls. You can use the Bedrock client as usual after initializing Weave and patching the client:
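
As a rough sketch of that flow, assuming a patch helper along the lines of `patch_client` (the import path, project name, and model ID below are assumptions and may differ in your Weave version):

```python
import json

import boto3
import weave
from weave.integrations.bedrock.bedrock_sdk import patch_client  # assumed import path

weave.init("bedrock-example")  # placeholder project name

client = boto3.client("bedrock-runtime")
patch_client(client)  # patch the client so Weave can capture Bedrock calls

response = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
    contentType="application/json",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    }),
)
print(json.loads(response["body"].read()))
```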
8 changes: 8 additions & 0 deletions docs/docs/guides/integrations/google-gemini.md
@@ -1,5 +1,13 @@
# Google Gemini

:::tip
For the latest tutorials, visit [Weights & Biases on Google Cloud](https://wandb.ai/site/partners/googlecloud/).
:::

:::note
Do you want to experiment with Google Gemini models on Weave without any setup? Try the [LLM Playground](../tools/playground.md).
:::

Google offers two ways of calling Gemini via API:

1. Via the [Vertex APIs](https://cloud.google.com/vertex-ai/docs).
4 changes: 4 additions & 0 deletions docs/docs/guides/integrations/groq.md
@@ -1,5 +1,9 @@
# Groq

:::note
Do you want to experiment with Groq models on Weave without any setup? Try the [LLM Playground](../tools/playground.md).
:::

[Groq](https://groq.com/) is the AI infrastructure company that delivers fast AI inference. The LPU™ Inference Engine by Groq is a hardware and software platform that delivers exceptional compute speed, quality, and energy efficiency. Weave automatically tracks and logs Groq chat completion calls.
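
A minimal sketch of a traced Groq call might look like the following; the project name and model are placeholders, and the client reads `GROQ_API_KEY` from the environment:

```python
import weave
from groq import Groq

weave.init("groq-example")  # placeholder project name

client = Groq()  # reads GROQ_API_KEY from the environment
chat_completion = client.chat.completions.create(
    model="llama3-8b-8192",  # placeholder model
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(chat_completion.choices[0].message.content)  # traced automatically by Weave
```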

## Tracing
13 changes: 6 additions & 7 deletions docs/docs/guides/integrations/index.md
@@ -10,20 +10,19 @@ Weave provides automatic logging integrations for popular LLM providers and orch

LLM providers are the vendors that offer access to large language models for generating predictions. Weave integrates with these providers to log and trace the interactions with their APIs:

- **[Amazon Bedrock](/guides/integrations/bedrock)**
- **[Anthropic](/guides/integrations/anthropic)**
- **[Cerebras](/guides/integrations/cerebras)**
- **[Cohere](/guides/integrations/cohere)**
- **[Google Gemini](/guides/integrations/google-gemini)**
- **[Groq](/guides/integrations/groq)**
- **[LiteLLM](/guides/integrations/litellm)**
- **[Microsoft Azure](/guides/integrations/azure)**
- **[MistralAI](/guides/integrations/mistral)**
- **[NVIDIA NIM](/guides/integrations/nvidia_nim)**
- **[OpenAI](/guides/integrations/openai)**
- **[Open Router](/guides/integrations/openrouter)**
- **[Together AI](/guides/integrations/together_ai)**

**[Local Models](/guides/integrations/local_models)**: For when you're running models on your own infrastructure.

4 changes: 4 additions & 0 deletions docs/docs/guides/integrations/nvidia_nim.md
@@ -5,6 +5,10 @@ import TabItem from '@theme/TabItem';

Weave automatically tracks and logs LLM calls made via the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) library, after `weave.init()` is called.

:::tip
For the latest tutorials, visit [Weights & Biases on NVIDIA](https://wandb.ai/site/partners/nvidia).
:::

## Tracing

It’s important to store traces of LLM applications in a central database, both during development and in production. You’ll use these traces for debugging and to help build a dataset of tricky examples to evaluate against while improving your application.
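
As a minimal sketch, a traced ChatNVIDIA call can look roughly like this; the project name and model ID are placeholders, and `NVIDIA_API_KEY` is assumed to be set in the environment:

```python
import weave
from langchain_nvidia_ai_endpoints import ChatNVIDIA  # pip install langchain-nvidia-ai-endpoints

weave.init("nvidia-nim-example")  # placeholder project name

llm = ChatNVIDIA(model="meta/llama3-8b-instruct")  # placeholder model ID
response = llm.invoke("Say hello in one sentence.")
print(response.content)  # the call appears as a trace in Weave
```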
4 changes: 4 additions & 0 deletions docs/docs/guides/integrations/openai.md
@@ -3,6 +3,10 @@ import TabItem from '@theme/TabItem';

# OpenAI

:::note
Do you want to experiment with OpenAI models on Weave without any setup? Try the [LLM Playground](../tools/playground.md).
:::

## Tracing

It’s important to store traces of LLM applications in a central database, both during development and in production. You’ll use these traces for debugging and to help build a dataset of tricky examples to evaluate against while improving your application.
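
For example, a minimal traced call can look like the following sketch; the project name, model, and prompt are placeholders, and the client reads `OPENAI_API_KEY` from the environment:

```python
import weave
from openai import OpenAI

weave.init("openai-example")  # placeholder project name

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)  # traced automatically by Weave
```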
106 changes: 47 additions & 59 deletions docs/docs/guides/tools/playground.md
@@ -2,7 +2,7 @@

> **The LLM Playground is currently in preview.**
Evaluating LLM prompts and responses is challenging. The Weave Playground is designed to simplify the process of iterating on LLM prompts and responses, making it easier to experiment with different models and prompts. With features like prompt editing, message retrying, and model comparison, Playground helps you to quickly test and improve your LLM applications. Playground currently supports OpenAI, Anthropic, Google Gemini, Groq, and Amazon Bedrock models.

## Features

@@ -37,7 +37,7 @@ To use one of the available models, add the appropriate information to your team

- OpenAI: `OPENAI_API_KEY`
- Anthropic: `ANTHROPIC_API_KEY`
- Google Gemini: `GOOGLE_API_KEY`
- Groq: `GEMMA_API_KEY`
- Amazon Bedrock:
- `AWS_ACCESS_KEY_ID`
@@ -60,117 +60,105 @@ There are two ways to access the Playground:

You can switch the LLM using the dropdown menu in the top left. The available models from various providers are listed below:

- [Amazon Bedrock](#amazon-bedrock)
- [Anthropic](#anthropic)
- [Google Gemini](#gemini)
- [Groq](#groq)
- [OpenAI](#openai)
- [X.AI](#xai)

### [Amazon Bedrock](../integrations/bedrock.md)

- ai21.j2-mid-v1
- ai21.j2-ultra-v1
- amazon.nova-lite-v1:0
- amazon.nova-micro-v1:0
- amazon.nova-pro-v1:0
- amazon.titan-text-express-v1
- amazon.titan-text-lite-v1
- anthropic.claude-3-5-sonnet-20240620-v1:0
- anthropic.claude-3-haiku-20240307-v1:0
- anthropic.claude-3-opus-20240229-v1:0
- anthropic.claude-3-sonnet-20240229-v1:0
- anthropic.claude-instant-v1
- anthropic.claude-v2
- anthropic.claude-v2:1
- cohere.command-light-text-v14
- cohere.command-r-plus-v1:0
- cohere.command-r-v1:0
- cohere.command-text-v14
- meta.llama2-13b-chat-v1
- meta.llama2-70b-chat-v1
- meta.llama3-1-405b-instruct-v1:0
- meta.llama3-1-70b-instruct-v1:0
- meta.llama3-1-8b-instruct-v1:0
- meta.llama3-70b-instruct-v1:0
- meta.llama3-8b-instruct-v1:0
- mistral.mistral-7b-instruct-v0:2
- mistral.mistral-large-2402-v1:0
- mistral.mistral-large-2407-v1:0
- mistral.mixtral-8x7b-instruct-v0:1

### [Anthropic](../integrations/anthropic.md)

- claude-3-5-sonnet-20240620
- claude-3-5-sonnet-20241022
- claude-3-haiku-20240307
- claude-3-opus-20240229
- claude-3-sonnet-20240229

### [Google Gemini](../integrations/google-gemini.md)

- gemini/gemini-1.5-flash-001
- gemini/gemini-1.5-flash-002
- gemini/gemini-1.5-flash-8b-exp-0827
- gemini/gemini-1.5-flash-8b-exp-0924
- gemini/gemini-1.5-flash-exp-0827
- gemini/gemini-1.5-flash-latest
- gemini/gemini-1.5-flash
- gemini/gemini-1.5-pro-001
- gemini/gemini-1.5-pro-002
- gemini/gemini-1.5-pro-exp-0801
- gemini/gemini-1.5-pro-exp-0827
- gemini/gemini-1.5-pro-latest
- gemini/gemini-1.5-pro
- gemini/gemini-pro

### [Groq](../integrations/groq.md)

- groq/gemma-7b-it
- groq/gemma2-9b-it
- groq/llama-3.1-70b-versatile
- groq/llama-3.1-8b-instant
- groq/llama3-70b-8192
- groq/llama3-8b-8192
- groq/llama3-groq-70b-8192-tool-use-preview
- groq/llama3-groq-8b-8192-tool-use-preview
- groq/mixtral-8x7b-32768

### [OpenAI](../integrations/openai.md)

- gpt-3.5-turbo-0125
- gpt-3.5-turbo-1106
- gpt-3.5-turbo-16k
- gpt-3.5-turbo
- gpt-4-0125-preview
- gpt-4-0314
- gpt-4-0613
- gpt-4-1106-preview
- gpt-4-32k-0314
- gpt-4-turbo-2024-04-09
- gpt-4-turbo-preview
- gpt-4-turbo
- gpt-4
- gpt-4o-2024-05-13
- gpt-4o-2024-08-06
- gpt-4o-2024-11-20
- gpt-4o-mini-2024-07-18
- gpt-4o-mini
- gpt-4o
- o1-2024-12-17
- o1-mini-2024-09-12
- o1-mini
- o1-preview-2024-09-12
- o1-preview

### X.AI

- xai/grok-beta

## Adjust LLM parameters
