-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(weave): Class-based scorer runnable ref fix #2922
Conversation
Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=e0d058f469bebf3673b85ef3664a5a101e85585c |
scorers=[scorer], | ||
) | ||
res = await eval.evaluate(predict) | ||
calls = client.server.calls_query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to go down to this level? The test might be more appropriate with client.calls()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like the calls API - especially how it gets feedback. This is much better for asserting the data I am interested in asserting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is to fix the calls API. It feels wrong to reach into the server here. What's wrong with how it gets feedback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do this in countless tests right now. all i want is to assert the raw data, not worry aobut the higher level calls iterator or opaque feedback iterator stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not friendly:
@property
def feedback(self) -> RefFeedbackQuery:
if not self.id:
raise ValueError("Can't get feedback for call without ID")
if self._feedback is None:
project_parts = self.project_id.split("/")
if len(project_parts) != 2:
raise ValueError(f"Invalid project_id: {self.project_id}")
entity, project = project_parts
weave_ref = CallRef(entity, project, self.id)
self._feedback = RefFeedbackQuery(weave_ref.uri())
return self._feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, i try to avoid using the WeaveClient directly unless i am explicitly trying to test functionality of the WeaveClient. I find it much more cumbersome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't block on this, but I think reaching into the server is not ideal. All of this code is at the "client level", so the checks should also be at the "client level". If the client sucks we should fix it
) | ||
|
||
return self.future_executor.defer(send_score_call) | ||
|
||
@trace_sentry.global_trace_sentry.watch() | ||
def _add_score( | ||
def _add_runnable_feedback( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is a runnable
? Is that defined somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runnable is a new class of feedback. see #2865
self, | ||
*, | ||
weave_ref_uri: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are starting to different URIs in a lot of places. Maybe worth type narrowing to avoid errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a difference in shape between weave_ref_uri
and call_ref_uri
? If we have many different types of URI floating around, can we help ourselves use the correct type of URI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, let's make sure we are on the same page:
- The Feedback table now has a few key column (they all have the
_ref
suffix and are strings formatted like refs:weave_ref
call_ref
trigger_ref
runnable_ref
annotation_ref
- At runtime, I like to distinguish between runtime Refs (instances of Refs) and the string uri by using
_ref_uri
to make it super clear this is a string, not the runtime object.
Given that context, i am having a hard time groking your recommendation or callout. Can you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm saying something silly or overkill
Here's what I observe:
- We are using URIs in a lot of places in the SDK
- They can have different shapes like (1), (2), (3), and a specific shape of URI maps to a specific type of Ref in our SDK.
- Sometimes "ref" means "uri"... this is not consistent
So, should we make a subtype of str for the URIs to avoid passing the wrong URI?
For example:
T = TypeVar('T', bound='WeaveURI')
class WeaveCallURI(str):
def __new__(cls: Type[T], content: str) -> T:
# validate it conforms to f"weave:///{self.entity}/{self.project}/call/{self.id}"
return cast(T, super().__new__(cls, content))
runnable_ref=runnable_ref_uri, | ||
call_ref=call_ref_uri, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these fields represent URIs so we should call them uris
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing this is outside the scope of this PR - it is the interface layer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you referring to the variable or the req field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field -- it's called ref
but it's being assigned a uri
. I think it actually represents a URI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but that follows the existing pattern of the feedback table and the interface. Changing this would impact quite a few layers and break pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, unfortunate but I wont block here.
There are a few places where the nomenclature is kinda weird and I always get a little confused when I revisit them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stamping as there are no blocking comments
Correctly reports runnable refs when the scorer is an object.
Now that I am getting into the nitty-gritty of getting scorers to work, i realized we are not reporting the Evaluation scorer feedback ref correctly when the scorer is an object. Today, we are reporting the Scorer.score fn, not the parent Scorer object. This fixes and tests for that