Refactor dataset diff and compute metric #381
f445bf6 to 9165a2c
9165a2c to 921260c
python/lance/_lib.pyx
Outdated
Parameters
----------
metric_func: FileSystemDataset -> pd.DataFrame
Callable[[FileSystemDataset], pd.DataFrame]
fixed
python/lance/_lib.pyx
Outdated
def compute_metric(self, metric_func: Callable[[Dataset], pd.DataFrame],
                   versions: list[int] = None) -> pd.DataFrame:
    """
    Compute a metric across versions for this dataset
Main concerns are the stability / maturity of these two APIs.
As discussed, moved to lance/__init__.py and not as a Dataset method.
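For illustration, a module-level helper of the kind described here might look roughly like the sketch below. It is not the code from this PR: `FakeDataset`, `versions()`, `checkout()`, and `rows()` are hypothetical stand-ins for a versioned dataset, and plain lists of dicts are used in place of `pd.DataFrame` to keep the example dependency-free.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class FakeDataset:
    """Hypothetical in-memory versioned dataset (not lance's actual class)."""

    def __init__(self, snapshots: Dict[int, List[Row]], version: Optional[int] = None):
        self._snapshots = snapshots
        # Default to the latest version when none is given.
        self.version = max(snapshots) if version is None else version

    def versions(self) -> List[int]:
        return sorted(self._snapshots)

    def checkout(self, version: int) -> "FakeDataset":
        return FakeDataset(self._snapshots, version)

    def rows(self) -> List[Row]:
        return self._snapshots[self.version]


def compute_metric(
    dataset: FakeDataset,
    metric_func: Callable[[FakeDataset], List[Row]],
    versions: Optional[List[int]] = None,
) -> List[Row]:
    """Run metric_func on each requested version; all versions if None."""
    if versions is None:
        versions = dataset.versions()
    results: List[Row] = []
    for v in versions:
        snapshot = dataset.checkout(v)
        # Tag every metric row with the version it was computed on.
        for row in metric_func(snapshot):
            results.append({"version": v, **row})
    return results
```

Keeping the helper as a free function means the dataset class itself does not have to commit to this API while it is still settling.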
python/lance/_lib.pyx
Outdated
metric_func: FileSystemDataset -> pd.DataFrame
    Function to compute some arbitrary metric for a given version
versions: list of int, default None
    Compute for specified versions. Compute for all versions if None.
Should the default behavior be to just compute metrics for the latest version?
I don't think you'd really need this method then. If you just wanted to compute on the latest version, metric_func(ds) is simpler than compute_metrics(ds, metric_func). wdyt?
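To make the trade-off concrete, here is a small sketch comparing the two call styles. Everything in it is an assumption for illustration: the dataset is modeled as a plain dict from version number to rows, `mean_score` and the `score` column are made up, and the helper returns a dict rather than a `pd.DataFrame`.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory stand-in: version number -> rows in that version.
Snapshots = Dict[int, List[dict]]


def mean_score(rows: List[dict]) -> float:
    """Example metric: average of the (made-up) 'score' column."""
    return sum(r["score"] for r in rows) / len(rows)


def compute_metric(
    snapshots: Snapshots,
    metric_func: Callable[[List[dict]], float],
    versions: Optional[List[int]] = None,
) -> Dict[int, float]:
    """Apply metric_func to each requested version; all versions if None."""
    if versions is None:
        versions = sorted(snapshots)
    return {v: metric_func(snapshots[v]) for v in versions}


snapshots = {1: [{"score": 0.5}], 2: [{"score": 0.5}, {"score": 1.0}]}
latest = max(snapshots)

# Latest version only: calling the metric directly is the shorter path.
direct = mean_score(snapshots[latest])
via_helper = compute_metric(snapshots, mean_score, versions=[latest])[latest]

# All versions: this is where the helper earns its place.
history = compute_metric(snapshots, mean_score)
```

With a latest-only default the helper would just duplicate the direct call, which is the argument for defaulting to all versions.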
LGTM
ds.compute_metric(func) and ds.diff(v1, v2) to dataset interface