Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(weave): Class-based scorer runnable ref fix #2922

Merged
merged 7 commits into from
Nov 7, 2024

Conversation

tssweeney
Copy link
Collaborator

@tssweeney tssweeney commented Nov 7, 2024

Correctly reports runnable refs when the scorer is an object.

Now that I am getting into the nitty-gritty of getting scorers to work, i realized we are not reporting the Evaluation scorer feedback ref correctly when the scorer is an object. Today, we are reporting the Scorer.score fn, not the parent Scorer object. This fixes and tests for that

@circle-job-mirror
Copy link

circle-job-mirror bot commented Nov 7, 2024

@tssweeney tssweeney marked this pull request as ready for review November 7, 2024 02:42
@tssweeney tssweeney requested a review from a team as a code owner November 7, 2024 02:42
scorers=[scorer],
)
res = await eval.evaluate(predict)
calls = client.server.calls_query(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to go down to this level? The test might be more appropriate with client.calls()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the calls API - especially how it gets feedback. This is much better for asserting the data I am interested in asserting

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is to fix the calls API. It feels wrong to reach into the server here. What's wrong with how it gets feedback?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do this in countless tests right now. all i want is to assert the raw data, not worry aobut the higher level calls iterator or opaque feedback iterator stuff

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not friendly:

    @property
    def feedback(self) -> RefFeedbackQuery:
        if not self.id:
            raise ValueError("Can't get feedback for call without ID")
        if self._feedback is None:
            project_parts = self.project_id.split("/")
            if len(project_parts) != 2:
                raise ValueError(f"Invalid project_id: {self.project_id}")
            entity, project = project_parts
            weave_ref = CallRef(entity, project, self.id)
            self._feedback = RefFeedbackQuery(weave_ref.uri())
        return self._feedback

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, i try to avoid using the WeaveClient directly unless i am explicitly trying to test functionality of the WeaveClient. I find it much more cumbersome

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't block on this, but I think reaching into the server is not ideal. All of this code is at the "client level", so the checks should also be at the "client level". If the client sucks we should fix it

)

return self.future_executor.defer(send_score_call)

@trace_sentry.global_trace_sentry.watch()
def _add_score(
def _add_runnable_feedback(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a runnable? Is that defined somewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runnable is a new class of feedback. see #2865

self,
*,
weave_ref_uri: str,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are starting to different URIs in a lot of places. Maybe worth type narrowing to avoid errors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a difference in shape between weave_ref_uri and call_ref_uri? If we have many different types of URI floating around, can we help ourselves use the correct type of URI?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, let's make sure we are on the same page:

  • The Feedback table now has a few key column (they all have the _ref suffix and are strings formatted like refs:
    • weave_ref
    • call_ref
    • trigger_ref
    • runnable_ref
    • annotation_ref
  • At runtime, I like to distinguish between runtime Refs (instances of Refs) and the string uri by using _ref_uri to make it super clear this is a string, not the runtime object.

Given that context, i am having a hard time groking your recommendation or callout. Can you elaborate?

Copy link
Collaborator

@andrewtruong andrewtruong Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm saying something silly or overkill

Here's what I observe:

  1. We are using URIs in a lot of places in the SDK
  2. They can have different shapes like (1), (2), (3), and a specific shape of URI maps to a specific type of Ref in our SDK.
  3. Sometimes "ref" means "uri"... this is not consistent

So, should we make a subtype of str for the URIs to avoid passing the wrong URI?

For example:

T = TypeVar('T', bound='WeaveURI')

class WeaveCallURI(str):
    def __new__(cls: Type[T], content: str) -> T:
        # validate it conforms to f"weave:///{self.entity}/{self.project}/call/{self.id}"
       return cast(T, super().__new__(cls, content))

Comment on lines +1183 to +1184
runnable_ref=runnable_ref_uri,
call_ref=call_ref_uri,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these fields represent URIs so we should call them uris

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing this is outside the scope of this PR - it is the interface layer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you referring to the variable or the req field?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field -- it's called ref but it's being assigned a uri. I think it actually represents a URI?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but that follows the existing pattern of the feedback table and the interface. Changing this would impact quite a few layers and break pattern

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, unfortunate but I wont block here.

There are a few places where the nomenclature is kinda weird and I always get a little confused when I revisit them

Copy link
Collaborator

@andrewtruong andrewtruong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stamping as there are no blocking comments

@tssweeney tssweeney merged commit 6822779 into master Nov 7, 2024
115 checks passed
@tssweeney tssweeney deleted the tim/fix_eval_scorer_refs branch November 7, 2024 05:22
@github-actions github-actions bot locked and limited conversation to collaborators Nov 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants