Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Copy plots on Card save #330

Merged
merged 20 commits into from
May 17, 2023

Conversation

lazarust
Copy link
Contributor

Reference Issues/PRs

Fixes #300

What does this implement/fix? Explain your changes.

Implements copying files to the path of the repository.

Any other comments?

@lazarust
Copy link
Contributor Author

@BenjaminBossan @adrinjalali Here's the draft PR, there's not much because I wanted to make sure I was thinking of this correctly.

From what I understood from the thread on the issue, we're wanting to copy the files to the repo path whenever add_plot method is called. Is that understanding correct?

Also, I've been trying to think of how best to test this. Do y'all have any ideas?

Sorry, this PR is so small. I just wanted to make sure I was heading in the right direction before jumping into the deep end.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copying should probably happen when the user calls save on a Card object. And instead of path_in_repo, we should probably call it sth like relative_path or something, and then copy all plots when save is called, to a path relative to where the path to save is given. WDYT?

Comment on lines 1068 to 1069
if path_in_repo is not None:
shutil.copy(plot_path, path_in_repo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be in the above section, and modify the plot_path variable accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to clarify, we shouldn't be doing this copy when add_plot is called. It should only be done in Card.save(). Is that right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes 👍🏼

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrinjalali Hmm, after thinking through this some I'm unsure if copying on model card save is the best. If we do it when a user calls add_plot() then we'd already have the path of the plot in the context. From what I understand of how .save() is working we'd have to loop through all the generated content, find the plots, and then save them.

That feels a little extra to me since we could just save the copy when a user adds the plot. What are your thoughts? Am I misunderstanding how .save() is working?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that save would do that in an iteration. But from the user's experience, to me it makes more sense. This way, users only provide a relative path to the destination of the model card, and at the end we save them all. It should also provide a more consistent experience since they can't mess up addresses and at the end not have them in the repo.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Adrin that copying on save has some advantages that make it worth the extra effort. On top of what was said, it would help to avoid the problem of a user adding a plot, then overwriting the plot file, but that change not being reflected in the final model card.

Implementation-wise, I agree with Thomas that looping through all sections again is a bit wasteful. However, I think this feature can be implemented without a second loop. For that, I would add an argument to _generate_card to optionally copy the figures. This argument is passed down to _generate_content, which can take care of the copying. This way, we don't iterate through all sections twice and avoid duplicating the logic (which can be problematic besides performance, e.g. the current implementation does not consider invisible sections).

Another implementation detail, instead of if hasattr(section, "path"), I would have probably done if isinstance(section, PlotSection), but I'm open to be convinced otherwise :)

@lazarust lazarust changed the title WIP copy files to path_in_repo ENH Copy plots on Card save Apr 5, 2023
@adrinjalali
Copy link
Member

I'd be interested in what @BenjaminBossan thinks here too though.

@lazarust lazarust marked this pull request as ready for review April 7, 2023 00:21
@lazarust
Copy link
Contributor Author

lazarust commented Apr 7, 2023

@BenjaminBossan I've moved the logic into _generate_content and this is ready to review! Thanks! 👍🏽

Copy link
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing the adjustments.

I wonder if we can simplify the implementation a bit. Right now, the code considers the possibility that plot_path != destination_path but I wonder if that's relevant. If I store my model card into foo/bar/, would I ever want my plots to be copied to spam/eggs/? I can't think of a use case, but maybe I miss something?

If this is indeed not required, the implementation could be simplified by only having one new argument to _generate_content, which is the destination path. The path of the figure is already known from plot_section.path, so everything required for copying would be there.

From the user point of view, I'd also be unsure what the point of the plot_path argument is in Card.save, I think it needs a bit more explanation. Personally, I would also change the argument to be something like copy_files: bool. If True, it copies all the required files (which, at this point, are only the plots, but who knows what the future brings) to the same directory as the README.md. WDYT?

PS: It would be nice to extend the tests so that we have 2 plots, one with a relative path (as is right now) and one with an absolute path, to ensure that both work.

@adrinjalali
Copy link
Member

I can't think of a use case, but maybe I miss something?

+1

@lazarust lazarust requested a review from BenjaminBossan April 16, 2023 01:29
@lazarust
Copy link
Contributor Author

@BenjaminBossan Thanks for the feedback! I refactored this to copy the files into the same path as the README. I also went ahead and added a second plot to the test.

(Side note it looks like there are 34 warnings when running pytest. Is that something that needs fixing? If so I can start working on a PR)

Copy link
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing the suggested changes. I think there is still some work to do on the tests and the implementation could be simplified. Please take a look.

@BenjaminBossan
Copy link
Collaborator

Sorry, I accidentally submitted "Approve" when I wanted to submit "Request changes", I don't know how to change it now -__-

lazarust and others added 2 commits April 27, 2023 14:52
@BenjaminBossan
Copy link
Collaborator

@lazarust Thanks for working on the test, but it looks like something is broken now. Could you please check?

@lazarust
Copy link
Contributor Author

lazarust commented May 3, 2023

@BenjaminBossan Should be working now!

@BenjaminBossan
Copy link
Collaborator

Thanks for fixing the test. The one failing CI job is unrelated, so please ignore it.

From the comments of my last review, not everything was addressed, right? Will you do that?

@lazarust lazarust requested a review from BenjaminBossan May 5, 2023 00:53
@lazarust
Copy link
Contributor Author

lazarust commented May 5, 2023

@BenjaminBossan Sorry about that, I've removed the copy_files parameter from everywhere other than Card.save()

Copy link
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for working in my suggestion and merging main to fix the CI issue. I found a few very minor things to improve, but after this it's good to merge.

@@ -1410,13 +1425,18 @@ def save(self, path: str | Path) -> None:
path: str, or Path
Filepath to save your card.

plot_path: str
Filepath to save the plots.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate on this? E.g. mention under what circumstances users may want to use this parameter.

Notes
-----
The keys in model card metadata can be seen `here
<https://huggingface.co/docs/hub/models-cards#model-card-metadata>`__.
"""
with open(path, "w", encoding="utf-8") as f:
f.write("\n".join(self._generate_card()))
if not isinstance(path, Path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bit of redundancy here: You cast path to a Path object here but _generate_card and _generate_content both accept str and the latter does Path(destination_path). So either you just pass path without conversion, or those methods should not accept str and don't perform a 2nd round of conversion.

f.write("\n".join(self._generate_card()))
if not isinstance(path, Path):
path = Path(path)
f.write("\n".join(self._generate_card(path.parent if copy_files else None)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, but this is more readable IMO:

Suggested change
f.write("\n".join(self._generate_card(path.parent if copy_files else None)))
destination_path = path.parent if copy_files else None
f.write("\n".join(self._generate_card(destination_path=destination_path)))

@lazarust
Copy link
Contributor Author

@BenjaminBossan Sorry this took so long for me to get back to! This should be ready for you to take another look!

@lazarust lazarust requested a review from BenjaminBossan May 17, 2023 00:32
Copy link
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, we're always happy to have your contributions.

@BenjaminBossan BenjaminBossan merged commit 5d503af into skops-dev:main May 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Annoyance with adding figures to model cards
3 participants