-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Copy plots on Card save #330
ENH Copy plots on Card save #330
Conversation
@BenjaminBossan @adrinjalali Here's the draft PR, there's not much because I wanted to make sure I was thinking of this correctly. From what I understood from the thread on the issue, we're wanting to copy the files to the repo path whenever Also, I've been trying to think of how best to test this. Do y'all have any ideas? Sorry, this PR is so small. I just wanted to make sure I was heading in the right direction before jumping into the deep end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copying should probably happen when the user calls save
on a Card
object. And instead of path_in_repo
, we should probably call it sth like relative_path
or something, and then copy all plots when save
is called, to a path relative to where the path to save
is given. WDYT?
skops/card/_model_card.py
Outdated
if path_in_repo is not None: | ||
shutil.copy(plot_path, path_in_repo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be in the above section, and modify the plot_path
variable accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just to clarify, we shouldn't be doing this copy when add_plot
is called. It should only be done in Card.save()
. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes 👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrinjalali Hmm, after thinking through this some I'm unsure if copying on model card save is the best. If we do it when a user calls add_plot()
then we'd already have the path of the plot in the context. From what I understand of how .save()
is working we'd have to loop through all the generated content, find the plots, and then save them.
That feels a little extra to me since we could just save the copy when a user adds the plot. What are your thoughts? Am I misunderstanding how .save()
is working?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that save
would do that in an iteration. But from the user's experience, to me it makes more sense. This way, users only provide a relative path to the destination of the model card, and at the end we save them all. It should also provide a more consistent experience since they can't mess up addresses and at the end not have them in the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Adrin that copying on save
has some advantages that make it worth the extra effort. On top of what was said, it would help to avoid the problem of a user adding a plot, then overwriting the plot file, but that change not being reflected in the final model card.
Implementation-wise, I agree with Thomas that looping through all sections again is a bit wasteful. However, I think this feature can be implemented without a second loop. For that, I would add an argument to _generate_card
to optionally copy the figures. This argument is passed down to _generate_content
, which can take care of the copying. This way, we don't iterate through all sections twice and avoid duplicating the logic (which can be problematic besides performance, e.g. the current implementation does not consider invisible sections).
Another implementation detail, instead of if hasattr(section, "path")
, I would have probably done if isinstance(section, PlotSection)
, but I'm open to be convinced otherwise :)
I'd be interested in what @BenjaminBossan thinks here too though. |
@BenjaminBossan I've moved the logic into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing the adjustments.
I wonder if we can simplify the implementation a bit. Right now, the code considers the possibility that plot_path != destination_path
but I wonder if that's relevant. If I store my model card into foo/bar/
, would I ever want my plots to be copied to spam/eggs/
? I can't think of a use case, but maybe I miss something?
If this is indeed not required, the implementation could be simplified by only having one new argument to _generate_content
, which is the destination path. The path of the figure is already known from plot_section.path
, so everything required for copying would be there.
From the user point of view, I'd also be unsure what the point of the plot_path
argument is in Card.save
, I think it needs a bit more explanation. Personally, I would also change the argument to be something like copy_files: bool
. If True
, it copies all the required files (which, at this point, are only the plots, but who knows what the future brings) to the same directory as the README.md
. WDYT?
PS: It would be nice to extend the tests so that we have 2 plots, one with a relative path (as is right now) and one with an absolute path, to ensure that both work.
+1 |
@BenjaminBossan Thanks for the feedback! I refactored this to copy the files into the same path as the README. I also went ahead and added a second plot to the test. (Side note it looks like there are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing the suggested changes. I think there is still some work to do on the tests and the implementation could be simplified. Please take a look.
Sorry, I accidentally submitted "Approve" when I wanted to submit "Request changes", I don't know how to change it now -__- |
Still need to get the copying working with absolute paths.
@lazarust Thanks for working on the test, but it looks like something is broken now. Could you please check? |
@BenjaminBossan Should be working now! |
Thanks for fixing the test. The one failing CI job is unrelated, so please ignore it. From the comments of my last review, not everything was addressed, right? Will you do that? |
@BenjaminBossan Sorry about that, I've removed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for working in my suggestion and merging main to fix the CI issue. I found a few very minor things to improve, but after this it's good to merge.
skops/card/_model_card.py
Outdated
@@ -1410,13 +1425,18 @@ def save(self, path: str | Path) -> None: | |||
path: str, or Path | |||
Filepath to save your card. | |||
|
|||
plot_path: str | |||
Filepath to save the plots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate on this? E.g. mention under what circumstances users may want to use this parameter.
Notes | ||
----- | ||
The keys in model card metadata can be seen `here | ||
<https://huggingface.co/docs/hub/models-cards#model-card-metadata>`__. | ||
""" | ||
with open(path, "w", encoding="utf-8") as f: | ||
f.write("\n".join(self._generate_card())) | ||
if not isinstance(path, Path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bit of redundancy here: You cast path
to a Path
object here but _generate_card
and _generate_content
both accept str
and the latter does Path(destination_path)
. So either you just pass path
without conversion, or those methods should not accept str
and don't perform a 2nd round of conversion.
skops/card/_model_card.py
Outdated
f.write("\n".join(self._generate_card())) | ||
if not isinstance(path, Path): | ||
path = Path(path) | ||
f.write("\n".join(self._generate_card(path.parent if copy_files else None))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nit, but this is more readable IMO:
f.write("\n".join(self._generate_card(path.parent if copy_files else None))) | |
destination_path = path.parent if copy_files else None | |
f.write("\n".join(self._generate_card(destination_path=destination_path))) |
@BenjaminBossan Sorry this took so long for me to get back to! This should be ready for you to take another look! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, we're always happy to have your contributions.
Reference Issues/PRs
Fixes #300
What does this implement/fix? Explain your changes.
Implements copying files to the path of the repository.
Any other comments?