Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fittable #140

Open
wants to merge 53 commits into
base: main
Choose a base branch
from
Open

Add fittable #140

wants to merge 53 commits into from

Conversation

stephantul
Copy link
Member

No description provided.

Copy link

codecov bot commented Dec 23, 2024

Codecov Report

Attention: Patch coverage is 97.79736% with 10 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
model2vec/inference/model.py 90.47% 6 Missing ⚠️
model2vec/train/classifier.py 97.38% 4 Missing ⚠️
Files with missing lines Coverage Δ
model2vec/inference/__init__.py 100.00% <100.00%> (ø)
model2vec/model.py 94.55% <100.00%> (ø)
model2vec/train/__init__.py 100.00% <100.00%> (ø)
model2vec/train/base.py 100.00% <100.00%> (ø)
model2vec/utils.py 92.53% <ø> (ø)
tests/conftest.py 100.00% <100.00%> (ø)
tests/test_inference.py 100.00% <100.00%> (ø)
tests/test_trainable.py 100.00% <100.00%> (ø)
model2vec/train/classifier.py 97.38% <97.38%> (ø)
model2vec/inference/model.py 90.47% <90.47%> (ø)

... and 1 file with indirect coverage changes

@stephantul stephantul marked this pull request as ready for review January 3, 2025 20:22
@stephantul stephantul requested a review from Pringled January 3, 2025 20:22
Copy link
Member

@Pringled Pringled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Some minor comments and suggestions.

model2vec/train/classifier.py Show resolved Hide resolved
model2vec/train/base.py Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
model2vec/train/classifier.py Show resolved Hide resolved
model2vec/train/classifier.py Outdated Show resolved Hide resolved
model2vec/train/classifier.py Outdated Show resolved Hide resolved
@stephantul stephantul requested a review from Pringled January 24, 2025 18:49
Copy link
Contributor

@davidberenstein1957 davidberenstein1957 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks super useful. left some comments. Also, perhaps we can add some reference to multi-label usage somewhere?

model2vec/model.py Outdated Show resolved Hide resolved
model2vec/model.py Outdated Show resolved Hide resolved
"""Save the model to a folder."""
save_pipeline(self, path)

def push_to_hub(self, repo_id: str, token: str | None = None, private: bool = False) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a modelcard and perhaps tags or a library reference, this helps a lot with visibility, usability and findability.

https://huggingface.co/docs/hub/model-cards#specifying-a-library

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually already happens because we push the underlying static model to the hub, which has a model card. This model card template is specified in the root of the code.

self.head = head

@classmethod
def from_pretrained(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we load it from the Hub? perhaps we should align the arguments a bit with the transformers naming given you've also adopted from_pretrained?

For example using pretrained_model_name_or_path. https://huggingface.co/docs/transformers/v4.48.0/en/model_doc/auto#transformers.AutoTokenizer.from_pretrained

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_pretrained loads from the hub. The arguments mimic the ones from StaticModel and, although they don't match transformers exactly, we're wary of introducing breaking changes.

model2vec/inference/model.py Show resolved Hide resolved
model2vec/train/README.md Outdated Show resolved Hide resolved
model2vec/train/classifier.py Show resolved Hide resolved
model2vec/train/classifier.py Outdated Show resolved Hide resolved
model2vec/train/classifier.py Show resolved Hide resolved
model2vec/train/classifier.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants