cleanup #30

Merged
merged 20 commits into from
Feb 10, 2024
2 changes: 1 addition & 1 deletion .changeset/config.json
@@ -8,7 +8,7 @@
"baseBranch": "main",
"updateInternalDependencies": "patch",
"ignore": [],
- "bumpVersionsWithWorkspaceProtocolOnly": true,
+ "bumpVersionsWithWorkspaceProtocolOnly": false,
"privatePackages": {
"version": false,
"tag": false
6 changes: 3 additions & 3 deletions .github/workflows/publish.yml
@@ -19,9 +19,9 @@ jobs:
environment: Publish
env:
NPM_TOKEN: ${{ secrets.NPM_TOKEN }}
- # TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
- # TURBO_TEAM: ${{ secrets.TURBO_TEAM }}
- # TURBO_REMOTE_ONLY: true
+ TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
+ TURBO_TEAM: ${{ secrets.TURBO_TEAM }}
+ TURBO_REMOTE_ONLY: true


steps:
6 changes: 3 additions & 3 deletions .github/workflows/release-pr.yml
@@ -21,9 +21,9 @@ jobs:
runs-on: ubuntu-latest
environment: Publish
env:
- # TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
- # TURBO_TEAM: ${{ secrets.TURBO_TEAM }}
- # TURBO_REMOTE_ONLY: true
+ TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
+ TURBO_TEAM: ${{ secrets.TURBO_TEAM }}
+ TURBO_REMOTE_ONLY: true
NPM_TOKEN: ${{ secrets.NPM_TOKEN }}

steps:
6 changes: 3 additions & 3 deletions .github/workflows/test.yml
@@ -15,9 +15,9 @@ jobs:
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
ANYSCALE_API_KEY: ${{ secrets.ANYSCALE_API_KEY }}
- # TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
- # TURBO_TEAM: ${{ secrets.TURBO_TEAM }}
- # TURBO_REMOTE_ONLY: true
+ TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
+ TURBO_TEAM: ${{ secrets.TURBO_TEAM }}
+ TURBO_REMOTE_ONLY: true

steps:
- uses: actions/checkout@v3
7 changes: 7 additions & 0 deletions apps/next-demo/CHANGELOG.md
@@ -1,5 +1,12 @@
# island-next-demos

## 1.0.10

### Patch Changes

- Updated dependencies [[`a79bd11a9caaf4f9d99eebe0e528b04fd4ca811e`](https://github.com/hack-dance/island-ai/commit/a79bd11a9caaf4f9d99eebe0e528b04fd4ca811e)]:
- [email protected]

## 1.0.9

### Patch Changes
2 changes: 1 addition & 1 deletion apps/next-demo/package.json
@@ -1,6 +1,6 @@
{
"name": "island-next-demos",
- "version": "1.0.9",
+ "version": "1.0.10",
"private": true,
"scripts": {
"dev": "next dev --port=4000",
16 changes: 16 additions & 0 deletions apps/www/src/config/docs.ts
@@ -140,5 +140,21 @@ export const docs: Record<string, DocType> = {
        ]
      }
    ]
  },
  "evalz": {
    title: "evalz",
    indexRoute: "evalz/introduction",
    sections: [
      {
        title: "Overview",
        pages: [
          {
            title: "Introduction",
            slug: "evalz/introduction",
            id: "evalz"
          }
        ]
      }
    ]
  }
}
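The shape of a docs entry can be sketched as follows. This is only an illustration: `DocType` itself is not shown in this diff, so the `DocPage`, `DocSection`, and `DocType` definitions here are inferred from the `evalz` entry, and `listSlugs` and `hasValidIndexRoute` are hypothetical helpers rather than anything in the repository.

```typescript
// Hypothetical types inferred from the "evalz" entry; the real DocType may differ.
type DocPage = { title: string; slug: string; id: string }
type DocSection = { title: string; pages: DocPage[] }
type DocType = { title: string; indexRoute: string; sections: DocSection[] }

// The entry added in this diff, reproduced so the sketch is self-contained.
const evalzDocs: DocType = {
  title: "evalz",
  indexRoute: "evalz/introduction",
  sections: [
    {
      title: "Overview",
      pages: [{ title: "Introduction", slug: "evalz/introduction", id: "evalz" }]
    }
  ]
}

// Collect every page slug of one docs entry (hypothetical helper).
function listSlugs(doc: DocType): string[] {
  const slugs: string[] = []
  for (const section of doc.sections) {
    for (const page of section.pages) {
      slugs.push(page.slug)
    }
  }
  return slugs
}

// Check that indexRoute points at a page that actually exists in the entry.
function hasValidIndexRoute(doc: DocType): boolean {
  return listSlugs(doc).indexOf(doc.indexRoute) !== -1
}
```

A check like `hasValidIndexRoute` could catch an entry whose `indexRoute` was renamed without updating the matching page `slug`.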
13 changes: 13 additions & 0 deletions apps/www/src/docs/evalz.mdx
@@ -0,0 +1,13 @@
# evalz

## Introduction

### Overview
**evalz** is a TypeScript package designed to facilitate model-graded evaluations with a focus on structured output. Leveraging Zod schemas, **evalz** streamlines the evaluation of AI-generated responses. It provides a set of tools to assess the quality of responses based on custom criteria such as relevance, fluency, and completeness. The package leverages OpenAI's GPT models to perform evaluations, offering both simple and weighted evaluation mechanisms.

### Key Features
- **Structured Evaluation Models**: Define your evaluation logic using Zod schemas to ensure data integrity throughout your application.
- **Flexible Evaluation Strategies**: Supports various evaluation strategies, including score-based and binary evaluations, with customizable evaluators.
- **Easy Integration**: Designed to integrate seamlessly with existing TypeScript projects, enhancing AI and data processing workflows with minimal setup.
- **Custom Evaluations**: Define evaluation criteria tailored to your specific requirements.
- **Weighted Evaluations**: Combine multiple evaluations with custom weights to calculate a composite score.
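To make the difference between score-based and binary evaluations concrete, here is a minimal sketch of the range check each strategy implies. The `ResultsType` union matches the `score` and `binary` keys used in `prompts.ts` in this pull request; `isValidResult` is a hypothetical helper and is not part of the evalz API.

```typescript
// The two result types named in this PR's prompts: "score" is 0 to 1, "binary" is 0 or 1.
type ResultsType = "score" | "binary"

// Hypothetical helper: decide whether a number returned by the grading model
// is acceptable for the chosen strategy.
function isValidResult(value: number, resultsType: ResultsType): boolean {
  if (isNaN(value)) return false
  if (resultsType === "binary") {
    // Binary evaluations accept exactly 0 or 1.
    return value === 0 || value === 1
  }
  // Score evaluations accept any number in the closed range [0, 1].
  return value >= 0 && value <= 1
}
```

In a real project the same constraint would more likely be written as a Zod schema, in keeping with the package's focus on structured output.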
Binary file modified bun.lockb
Binary file not shown.
1 change: 0 additions & 1 deletion package.json
@@ -27,7 +27,6 @@
"engines": {
"node": ">=18"
},
- "dependencies": {},
"packageManager": "[email protected]",
"workspaces": [
"apps/*",
20 changes: 20 additions & 0 deletions public-packages/evalz/.eslintrc.cjs
@@ -0,0 +1,20 @@
const { resolve } = require("node:path")

const project = resolve(__dirname, "tsconfig.lint.json")

/** @type {import("eslint").Linter.Config} */
module.exports = {
  root: true,
  ignorePatterns: [".eslintrc.cjs"],
  extends: ["@repo/eslint-config/react-internal.js"],
  parser: "@typescript-eslint/parser",
  parserOptions: {
    project
  },
  overrides: [
    {
      extends: ["plugin:@typescript-eslint/disable-type-checked"],
      files: ["./**/*.js", "*.js"]
    }
  ]
}
Empty file.
93 changes: 93 additions & 0 deletions public-packages/evalz/README.md
@@ -0,0 +1,93 @@
# evalz

<div align="center">
<img alt="GitHub issues" src="https://img.shields.io/github/issues/hack-dance/island-ai.svg?style=flat-square&labelColor=000000">
<img alt="NPM version" src="https://img.shields.io/npm/v/evalz.svg?style=flat-square&logo=npm&labelColor=000000&label=evalz">
<img alt="License" src="https://img.shields.io/npm/l/evalz.svg?style=flat-square&labelColor=000000">
</div>

**evalz** is a TypeScript package designed to facilitate model-graded evaluations with a focus on structured output. Leveraging Zod schemas, **evalz** streamlines the evaluation of AI-generated responses. It provides a set of tools to assess the quality of responses based on custom criteria such as relevance, fluency, and completeness. The package leverages OpenAI's GPT models to perform evaluations, offering both simple and weighted evaluation mechanisms.

## Features

- **Structured Evaluation Models**: Define your evaluation logic using Zod schemas to ensure data integrity throughout your application.
- **Flexible Evaluation Strategies**: Supports various evaluation strategies, including score-based and binary evaluations, with customizable evaluators.
- **Easy Integration**: Designed to integrate seamlessly with existing TypeScript projects, enhancing AI and data processing workflows with minimal setup.
- **Custom Evaluations**: Define evaluation criteria tailored to your specific requirements.
- **Weighted Evaluations**: Combine multiple evaluations with custom weights to calculate a composite score.


## Installation

Install `evalz` using your preferred package manager:

```bash
npm install evalz openai zod

bun add evalz openai zod

pnpm add evalz openai zod
```

## Basic Usage

### Creating an Evaluator

First, create an evaluator for assessing a single aspect of a response, such as its relevance:

```typescript
import { createEvaluator } from "evalz";
import OpenAI from "openai";

const oai = new OpenAI({
  apiKey: process.env["OPENAI_API_KEY"],
  organization: process.env["OPENAI_ORG_ID"]
});

function relevanceEval() {
  return createEvaluator({
    client: oai,
    model: "gpt-4-1106-preview",
    evaluationDescription: "Rate the relevance from 0 to 1."
  });
}
```

### Conducting an Evaluation

Evaluate AI-generated content by passing the response data to your evaluator:

```typescript
const evaluator = relevanceEval();

const result = await evaluator({ data: yourResponseData });
console.log(result.scoreResults);
```

### Weighted Evaluation

Combine multiple evaluators with specified weights for a comprehensive assessment:

```typescript
import { createWeightedEvaluator } from "evalz";

const weightedEvaluator = createWeightedEvaluator({
  evaluators: {
    relevance: relevanceEval(),
    // fluencyEval and completenessEval are assumed to be defined like relevanceEval above
    fluency: fluencyEval(),
    completeness: completenessEval()
  },
  weights: {
    relevance: 0.25,
    fluency: 0.25,
    completeness: 0.5
  }
});

const result = await weightedEvaluator({ data: yourResponseData });
console.log(result.scoreResults);
```
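The arithmetic behind a weighted composite can be sketched as a plain weighted sum. This is an assumption about the general technique, not a description of evalz internals: `weightedScore` is a hypothetical helper, and the per-evaluator scores below are made-up inputs.

```typescript
type Scores = Record<string, number>

// Hypothetical helper: multiply each evaluator's score by its weight and add the results.
// Assumes the weights sum to 1, as in the example above, so the composite stays in [0, 1].
function weightedScore(scores: Scores, weights: Scores): number {
  let total = 0
  for (const key of Object.keys(weights)) {
    const score = scores[key]
    const weight = weights[key]
    if (score === undefined || weight === undefined) {
      throw new Error(`Missing score or weight for "${key}"`)
    }
    total += score * weight
  }
  return total
}

// Made-up scores paired with the weights from the example above.
const composite = weightedScore(
  { relevance: 1, fluency: 0.5, completeness: 0.5 },
  { relevance: 0.25, fluency: 0.25, completeness: 0.5 }
)
console.log(composite) // 0.625
```

With these weights, completeness counts twice as much as either relevance or fluency, so a weak completeness score pulls the composite down the most.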

## Contributing

Contributions are welcome! Please submit a pull request or open an issue to propose changes or additions.
67 changes: 67 additions & 0 deletions public-packages/evalz/package.json
@@ -0,0 +1,67 @@
{
  "name": "evalz",
  "version": "0.0.1--alpha.3",
  "description": "Model graded evals with typescript",
  "publishConfig": {
    "access": "public"
  },
  "type": "module",
  "main": "./dist/index.js",
  "module": "./dist/index.js",
  "exports": {
    ".": {
      "require": "./dist/index.cjs",
      "import": "./dist/index.js",
      "default": "./dist/index.js",
      "types": "./dist/index.d.ts"
    }
  },
  "files": [
    "dist/**"
  ],
  "typings": "./dist/index.d.ts",
  "scripts": {
    "test": "bun test --coverage --verbose",
    "build": "tsup",
    "dev": "tsup --watch",
    "lint": "TIMING=1 eslint src/**/*.ts* --fix",
    "clean": "rm -rf .turbo && rm -rf node_modules && rm -rf dist",
    "type-check": "tsc --noEmit"
  },
  "repository": {
    "directory": "public-packages/edna",
    "type": "git",
    "url": "git+https://github.com/hack-dance/island-ai.git"
  },
  "keywords": [
    "llm",
    "structured output",
    "streaming",
    "evals",
    "openai",
    "zod"
  ],
  "license": "MIT",
  "author": "Dimitri Kennedy <[email protected]> (https://hack.dance)",
  "homepage": "https://island.novy.work",
  "dependencies": {
    "zod-stream": "*"
  },
  "peerDependencies": {
    "openai": ">=4.24.1",
    "zod": ">=3.22.4"
  },
  "devDependencies": {
    "@repo/eslint-config": "workspace:*",
    "@repo/typescript-config": "workspace:*",
    "zod-stream": "*",
    "@turbo/gen": "^1.10.12",
    "@types/node": "^20.5.2",
    "@types/eslint": "^8.44.7",
    "eslint": "^8.53.0",
    "tsup": "^8.0.1",
    "typescript": "^5.2.2",
    "ramda": "^0.29.0",
    "zod": "3.22.4"
  }
}
15 changes: 15 additions & 0 deletions public-packages/evalz/src/constants/prompts.ts
@@ -0,0 +1,15 @@
import { ResultsType } from "@/types"

export const CUSTOM_EVALUATOR_IDENTITY =
"You are an AI evaluator tasked with scoring a language model's responses. You'll be presented with a 'prompt:' and 'response:' pair (and optionally an 'expectedResponse') and should evaluate based on the criteria provided in the subsequent system prompts. Provide only a numerical score in the range defined, not a descriptive response and no other prose."

export const RESPONSE_TYPE_EVALUATOR_SCORE =
"Your task is to provide a numerical score ranging from 0 to 1 based on the criteria in the subsequent system prompts. The score should precisely reflect the performance of the language model's response. Do not provide any text explanation or feedback, only the numerical score."

export const RESPONSE_TYPE_EVALUATOR_BINARY =
"Your task is to provide a binary score of either 0 or 1 based on the criteria in the subsequent system prompts. This should precisely reflect the language model's performance. Do not provide any text explanation or feedback, only a singular digit: 1 or 0."

export const RESULTS_TYPE_PROMPT: Record<ResultsType, string> = {
  score: RESPONSE_TYPE_EVALUATOR_SCORE,
  binary: RESPONSE_TYPE_EVALUATOR_BINARY
}
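One way these prompts might be assembled into a chat request is sketched below. The ordering (identity first, then the results-type prompt, then the evaluation criteria as "subsequent system prompts") follows the wording of the constants, but how evalz actually builds its messages is not shown in this diff. `buildEvaluationMessages` and `ChatMessage` are hypothetical names, and the prompt strings are abbreviated stand-ins for the full constants.

```typescript
type ResultsType = "score" | "binary"
type ChatMessage = { role: "system" | "user"; content: string }

// Abbreviated stand-ins for CUSTOM_EVALUATOR_IDENTITY and RESULTS_TYPE_PROMPT.
const IDENTITY = "You are an AI evaluator tasked with scoring a language model's responses."
const RESULTS_TYPE_PROMPT: Record<ResultsType, string> = {
  score: "Provide a numerical score ranging from 0 to 1.",
  binary: "Provide a binary score of either 0 or 1."
}

// Hypothetical helper: identity and scoring rules go in as system messages,
// and the 'prompt:' / 'response:' pair to grade goes in as the user message.
function buildEvaluationMessages(args: {
  resultsType: ResultsType
  evaluationDescription: string
  prompt: string
  response: string
  expectedResponse?: string
}): ChatMessage[] {
  const lines = [`prompt: ${args.prompt}`, `response: ${args.response}`]
  if (args.expectedResponse !== undefined) {
    lines.push(`expectedResponse: ${args.expectedResponse}`)
  }
  return [
    { role: "system", content: IDENTITY },
    { role: "system", content: RESULTS_TYPE_PROMPT[args.resultsType] },
    { role: "system", content: args.evaluationDescription },
    { role: "user", content: lines.join("\n") }
  ]
}

const messages = buildEvaluationMessages({
  resultsType: "score",
  evaluationDescription: "Rate the relevance from 0 to 1.",
  prompt: "What is the capital of France?",
  response: "Paris is the capital of France."
})
console.log(messages.length) // 4
```

Keeping the results-type prompt in a `Record<ResultsType, string>` means adding a new result type forces a matching prompt at compile time.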