chore(weave): Class-based scorer runnable ref fix #2922

tssweeney · 2024-11-07T01:17:24Z

Correctly reports runnable refs when the scorer is an object.

Now that I am getting into the nitty-gritty of getting scorers to work, i realized we are not reporting the Evaluation scorer feedback ref correctly when the scorer is an object. Today, we are reporting the Scorer.score fn, not the parent Scorer object. This fixes and tests for that

circle-job-mirror · 2024-11-07T01:25:51Z

Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=e0d058f469bebf3673b85ef3664a5a101e85585c

tests/trace/test_evaluations.py

andrewtruong · 2024-11-07T03:23:23Z

tests/trace/test_evaluations.py

+        scorers=[scorer],
+    )
+    res = await eval.evaluate(predict)
+    calls = client.server.calls_query(


Do you need to go down to this level? The test might be more appropriate with client.calls()

I really don't like the calls API - especially how it gets feedback. This is much better for asserting the data I am interested in asserting

My preference is to fix the calls API. It feels wrong to reach into the server here. What's wrong with how it gets feedback?

I do this in countless tests right now. all i want is to assert the raw data, not worry aobut the higher level calls iterator or opaque feedback iterator stuff

Not friendly:

@property def feedback(self) -> RefFeedbackQuery: if not self.id: raise ValueError("Can't get feedback for call without ID") if self._feedback is None: project_parts = self.project_id.split("/") if len(project_parts) != 2: raise ValueError(f"Invalid project_id: {self.project_id}") entity, project = project_parts weave_ref = CallRef(entity, project, self.id) self._feedback = RefFeedbackQuery(weave_ref.uri()) return self._feedback

fwiw, i try to avoid using the WeaveClient directly unless i am explicitly trying to test functionality of the WeaveClient. I find it much more cumbersome

I won't block on this, but I think reaching into the server is not ideal. All of this code is at the "client level", so the checks should also be at the "client level". If the client sucks we should fix it

weave/trace/weave_client.py

andrewtruong · 2024-11-07T03:31:45Z

weave/trace/weave_client.py

            )

        return self.future_executor.defer(send_score_call)

    @trace_sentry.global_trace_sentry.watch()
-    def _add_score(
+    def _add_runnable_feedback(


what is a runnable? Is that defined somewhere?

runnable is a new class of feedback. see #2865

andrewtruong · 2024-11-07T03:32:18Z

weave/trace/weave_client.py

        self,
        *,
+        weave_ref_uri: str,


we are starting to different URIs in a lot of places. Maybe worth type narrowing to avoid errors

what do you mean?

Is there a difference in shape between weave_ref_uri and call_ref_uri? If we have many different types of URI floating around, can we help ourselves use the correct type of URI?

Hmmm, let's make sure we are on the same page:

The Feedback table now has a few key column (they all have the _ref suffix and are strings formatted like refs:

weave_ref

call_ref

trigger_ref

runnable_ref

annotation_ref

At runtime, I like to distinguish between runtime Refs (instances of Refs) and the string uri by using _ref_uri to make it super clear this is a string, not the runtime object.

Given that context, i am having a hard time groking your recommendation or callout. Can you elaborate?

Maybe I'm saying something silly or overkill

Here's what I observe:

We are using URIs in a lot of places in the SDK

They can have different shapes like (1), (2), (3), and a specific shape of URI maps to a specific type of Ref in our SDK.

Sometimes "ref" means "uri"... this is not consistent

So, should we make a subtype of str for the URIs to avoid passing the wrong URI?

For example:

T = TypeVar('T', bound='WeaveURI') class WeaveCallURI(str): def __new__(cls: Type[T], content: str) -> T: # validate it conforms to f"weave:///{self.entity}/{self.project}/call/{self.id}" return cast(T, super().__new__(cls, content))

weave/trace/weave_client.py

tests/trace/test_evaluations.py

andrewtruong · 2024-11-07T03:37:25Z

weave/trace/weave_client.py

+            runnable_ref=runnable_ref_uri,
+            call_ref=call_ref_uri,


these fields represent URIs so we should call them uris

changing this is outside the scope of this PR - it is the interface layer

are you referring to the variable or the req field?

The field -- it's called ref but it's being assigned a uri. I think it actually represents a URI?

yes, but that follows the existing pattern of the feedback table and the interface. Changing this would impact quite a few layers and break pattern

ok, unfortunate but I wont block here.

There are a few places where the nomenclature is kinda weird and I always get a little confused when I revisit them

andrewtruong

stamping as there are no blocking comments

tssweeney added 2 commits November 6, 2024 17:16

init

e999564

merged

1b095e9

tssweeney added 4 commits November 6, 2024 17:28

Done except tests

a5f88da

Merge branch 'master' into tim/fix_eval_scorer_refs

b704d95

fixed params

1806fc1

added tests

f3bc067

tssweeney marked this pull request as ready for review November 7, 2024 02:42

tssweeney requested a review from a team as a code owner November 7, 2024 02:42