
chore(weave): Implement enhanced feedback structure and MVP filter/query layer #2865

Merged: 22 commits into master on Nov 6, 2024

Conversation

@tssweeney tssweeney commented Nov 5, 2024

This PR lays the groundwork for the next leg of feedback types in our system. Specifically, we have two "classes" of feedback: runnables and annotations. "Runnables" are feedback records generated by running a program (think: Op, Configured Action, Scorer), while "Annotations" are feedback records created by humans against specific types (aka human-in-the-loop, aka custom columns, etc.).

There were three problems to solve with this emerging data model:

  1. How do we store additional metadata about these feedback records, linking them to other objects in our system?
  2. How do we group/collect feedback records that belong to the same logical "column" or concept?
  3. Given the above, how can we filter/sort without reading large JSON-dumped columns?

After much iteration and discussion, the solution that seemed most suitable is as follows:

  1. feedback_type now has 2 special prefixes: wandb.runnable and wandb.annotation, where the full type should be wandb.runnable.RUNNABLE_NAME or wandb.annotation.ANNOTATION_NAME. Here, RUNNABLE_NAME or ANNOTATION_NAME is the name (aka object_id) component of the backing Object or Op. This is the most common group key and is already indexed in ClickHouse.
  2. I have added 4 new columns to the feedback table. Note, I originally had these as fields in the payload itself, but that would result in more complex, heavy lookups and a more rigid structure over the payload itself. This approach allows us to put our foreign keys in columns that can be indexed in the future if needed:
    • annotation_ref: The ref pointing to the annotation definition for this feedback.
    • runnable_ref: The ref pointing to the runnable definition for this feedback.
    • call_ref: The ref pointing to the resulting call associated with generating this feedback.
    • trigger_ref: The ref pointing to the trigger definition which resulted in this feedback.
  3. When these types of feedback are entered into the DB, our server now enforces that these ref values are filled out when required and match the correct format. Moreover, the payloads themselves conform to a very simple structure:
class AnnotationPayloadSchema(BaseModel):
    value: Any


class RunnablePayloadSchema(BaseModel):
    output: Any
  4. Finally, I implemented basic (not yet optimized) support for filter and sort on the calls query using the notation feedback.[feedback_type].payload.json.selector. This allows us to specify the feedback type (while supporting dots in the type name) and match our other field-access patterns.
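The selector notation in the last item can be sketched as a small parser. This is a hypothetical helper for illustration only; the PR's actual parsing lives in the calls query builder:

```python
import re

# Hypothetical sketch: split "feedback.[TYPE].payload.a.b" into the feedback
# type and the payload path. Brackets isolate the type, which itself
# contains dots (e.g. "wandb.runnable.my_scorer").
_FEEDBACK_FIELD_RE = re.compile(
    r"^feedback\.\[(?P<ftype>[^\]]+)\]\.payload\.(?P<path>.+)$"
)

def parse_feedback_field(field: str):
    """Return (feedback_type, payload_path_parts), or None if not a feedback field."""
    m = _FEEDBACK_FIELD_RE.match(field)
    if m is None:
        return None
    return m.group("ftype"), m.group("path").split(".")
```

Bracketing the type is what lets a dotted type name coexist with the dotted path selector that follows it.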

With all of this together, we can have code like:

@weave.op
def my_scorer(x: int, output: str) -> dict:
    expected = ["a", "b", "c", "d"][x]
    return {
        "model_output": output,
        "match": output == expected,
    }

@weave.op
def my_model(x: int) -> str:
    return [
        "a",
        "x",  # intentional "mistake"
        "c",
        "y",  # intentional "mistake"
    ][x]

ids = []
for x in range(4):
    _, c = my_model.call(x)
    ids.append(c.id)
    # Note: `_apply_scorer` is not user-facing (yet!) but will be made public during the eval api project.
    c._apply_scorer(my_scorer)

... then query ...

calls = client.server.calls_query_stream(
    tsi.CallsQueryReq(
        project_id=client._project_id(),
        filter=tsi.CallsFilter(op_names=[get_ref(my_model).uri()]),
        # Filter down to just correct matches
        query={
            "$expr": {
                "$eq": [
                    {
                        "$getField": "feedback.[wandb.runnable.my_scorer].payload.output.match"
                    },
                    {"$literal": "true"},
                ]
            }
        },
        # Sort by the model output desc
        sort_by=[
            {
                "field": "feedback.[wandb.runnable.my_scorer].payload.output.model_output",
                "direction": "desc",
            }
        ],
    )
)

This can easily be extended to support different aggregation logic and specific version selectors.
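As a rough illustration of the validation rules in point 3, a minimal sketch might look like the following. All names here are hypothetical; the real enforcement lives in the trace server:

```python
from typing import Optional

# Hypothetical sketch of the server-side checks described above.
RUNNABLE_PREFIX = "wandb.runnable."
ANNOTATION_PREFIX = "wandb.annotation."

def required_ref_column(feedback_type: str) -> Optional[str]:
    """Map a feedback_type to the ref column that must be populated."""
    if feedback_type.startswith(RUNNABLE_PREFIX):
        return "runnable_ref"
    if feedback_type.startswith(ANNOTATION_PREFIX):
        return "annotation_ref"
    return None  # free-form feedback: no structured ref required

def validate_feedback(feedback_type: str, payload: dict, refs: dict) -> None:
    """Raise if a structured feedback row is missing its ref or payload key."""
    col = required_ref_column(feedback_type)
    if col is None:
        return
    if not refs.get(col):
        raise ValueError(f"{feedback_type!r} requires {col} to be set")
    # Payloads conform to the simple schemas above: runnables carry an
    # "output", annotations carry a "value".
    key = "output" if col == "runnable_ref" else "value"
    if key not in payload:
        raise ValueError(f"{feedback_type!r} payload must contain {key!r}")
```

The point of the sketch is that both the ref columns and the payload shape are derivable from the feedback_type prefix alone.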

@tssweeney tssweeney requested a review from a team as a code owner November 5, 2024 03:22
circle-job-mirror bot commented Nov 5, 2024

assert feedback["payload"]["name"] == "score"
assert feedback["payload"]["op_ref"] == get_ref(score).uri()
assert feedback["payload"]["results"] == True
assert feedback["feedback_type"] == "wandb.runnable.score"
tssweeney (Collaborator, Author):

This is ok to change as the UI/query layer does not consume it yet.

@@ -39,9 +38,8 @@ def my_score(input_x: int, model_output: int) -> int:

assert len(calls) == 2
feedback = calls[0].summary["weave"]["feedback"][0]
assert feedback["feedback_type"] == SCORE_TYPE_NAME
assert feedback["feedback_type"] == "wandb.runnable.my_score"
tssweeney (Collaborator, Author):

Again, safe to change now that we have a good format.

# We're using "beta.1" to indicate that this is a pre-release version.
from typing import TypedDict

SCORE_TYPE_NAME = "wandb.score.beta.1"
tssweeney (Collaborator, Author):

we learned from this - no longer needed

@gtarpenning (Member) left a comment:

This new feedback query is going to be spicy in big projects, but looks good. The calls query builder is also feeling... clunky. Generally this makes sense; I wonder how much of the implementation we can abstract away from the user when adding feedback, while still creating an intuitive way for them to get the data out. It's possible that we might want some way of auto-constructing queries client-side; I'm imagining users not finding the following easy to use:
"$getField": "feedback.[wandb.runnable.my_scorer].payload.output.match"
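A client-side helper along the lines suggested here could be quite small. This is a hypothetical sketch, not part of this PR:

```python
def feedback_field(feedback_type: str, *path: str) -> str:
    """Build a feedback field selector without making users hand-write the
    bracketed notation. The brackets isolate the dotted feedback type from
    the dotted payload path that follows it."""
    return "feedback.[{}].payload.{}".format(feedback_type, ".".join(path))

# Usage: feedback_field("wandb.runnable.my_scorer", "output", "match")
# produces the same string the raw query above spells out by hand.
```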

weave/trace_server/calls_query_builder.py (review thread resolved)
)
feedback_join_sql = f"""
LEFT JOIN feedback
ON (feedback.weave_ref = concat('weave-trace-internal:///', {_param_slot(project_param, 'String')}, '/call/', calls_merged.id))
gtarpenning (Member):

Any reason to do this concat in the query vs outside and pass it in?

tssweeney (Collaborator, Author):

I don't think so, but I'm not sure. We have to do a concat either way since the last part is dynamic.

weave/trace_server/feedback.py (review threads resolved)
weave/trace_server/orm.py (review thread resolved)
@@ -686,6 +686,18 @@ class FeedbackCreateReq(BaseModel):
}
]
)
annotation_ref: Optional[str] = Field(
gtarpenning (Member):

It would be nice if we could type this to a kind of ref, like ObjectRef, with a pydantic validator, and then check its construction in the client.
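A sketch of the suggested ref-typed value. Everything here is hypothetical, and the ref format in the regex is an assumed simplification, not Weave's actual ref grammar:

```python
import re

# Assumed, simplified ref shape for illustration only:
#   weave:///<entity>/<project>/<kind>/<name>:<digest>
_OBJECT_REF_RE = re.compile(r"^weave:///[^/]+/[^/]+/(object|op)/[^:/]+:[^/]+$")

class ObjectRefStr(str):
    """A string subtype that only constructs from a well-formed object ref,
    so malformed refs fail at creation time rather than deep in the server."""

    def __new__(cls, value: str) -> "ObjectRefStr":
        if not _OBJECT_REF_RE.match(value):
            raise ValueError(f"not a valid object ref: {value!r}")
        return super().__new__(cls, value)
```

A pydantic field could then validate into this type so the check runs on both the client and the server request models.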

tssweeney (Collaborator, Author):

Agreed

@tssweeney tssweeney merged commit 87f3eef into master Nov 6, 2024
115 checks passed
@tssweeney tssweeney deleted the tim/enhanced_feedback_data_model branch November 6, 2024 02:31
@github-actions github-actions bot locked and limited conversation to collaborators Nov 6, 2024