
Add type hints for PyTorch BERT. #23541

Closed
wants to merge 8 commits

Conversation


@coledie commented May 20, 2023

What does this PR do?

Add type hints for PyTorch BERT.

Fixes #16059 for PyTorch BERT
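
The typing work covers the BERT tokenizer and model signatures. As a minimal sketch of the style of annotation this PR adds (a simplified stand-in, not the exact diff), here is a typed version of the tokenizer method that builds sequences with special tokens:

```python
from typing import List, Optional

def build_inputs_with_special_tokens(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """Add [CLS]/[SEP] special tokens around one or two sequences.

    Sketch only: in transformers this is a method on BertTokenizer and
    reads cls_token_id / sep_token_id from the tokenizer instance.
    """
    cls, sep = [self.cls_token_id], [self.sep_token_id]
    if token_ids_1 is None:
        return cls + token_ids_0 + sep
    return cls + token_ids_0 + sep + token_ids_1 + sep
```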

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Member

@Rocketknight1 left a comment


This looks really good, thank you! I have one question about the strip_accents argument, but other than that this looks perfect.

Review threads (outdated, resolved):
  • src/transformers/models/bert/tokenization_bert.py (4 threads)
  • src/transformers/models/bert/tokenization_bert_fast.py (1 thread)
@Rocketknight1
Member

Looks good! The last thing you'll need to do is pip install transformers[quality] followed by make style in the transformers directory. This runs our code formatting tools to make sure everything follows our style guidelines. Once you do that and commit any changes, the tests should pass!

If you run make style and you get an error, it may indicate some part of your code that has issues that our code style tools can't correct - if that happens, take a look and try to see what's wrong, and reply here if you can't figure it out!

@coledie
Author

coledie commented May 24, 2023

It looks like I've got one last error that I cannot figure out; the relevant line of code appears in several other files without issue. Could you help with this?

@coledie
Author

coledie commented Jun 2, 2023

@Rocketknight1

@Rocketknight1
Member

Ah, thanks for the ping! Investigating now

@Rocketknight1
Member

Rocketknight1 commented Jun 2, 2023

I've investigated and there's an issue in our copy checking code, specifically the is_copy_consistent function. This isn't your fault, and I'll need to file a PR to fix it!

(For internal transformers reference): The issue is triggered when a function is copied from with a single-line header, and there is a change that causes its header to now be multi-line (e.g. adding type hints and causing black to wrap the line). The is_copy_consistent function builds a replacement function from the first line of the target function followed by the subsequent lines of the original function, which creates a mangled header if the original header has changed to multi-line but the target has not:

def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
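
The splice can be reproduced in a few lines. Here is a minimal model of the behavior (an assumption-level sketch, not the actual is_copy_consistent implementation), using the headers from the example above:

```python
# Origin function: header reflowed to multiple lines after type hints
# were added and black wrapped it.
origin = [
    "def build_inputs_with_special_tokens(",
    "    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None",
    ") -> List[int]:",
    "    ...",
]
# Target copy: still has the old single-line header.
target = [
    "def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):",
    "    ...",
]

# The check splices the target's first line onto the origin's remaining
# lines, producing the mangled header shown above.
replacement = [target[0]] + origin[1:]
print("\n".join(replacement))
```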

@sgugger do you want me to make a PR to fix it?

@sgugger
Collaborator

sgugger commented Jun 2, 2023

Lol no, fixing this is really not a priority. The copies can be manually updated.
The only situation this can appear is this one, and it's rare enough that we can deal with it I think.

@Rocketknight1
Member

Understood! @coledie you might have to do some manual copying to make this work, in that case. Search the repository for the string # Copied from transformers.models.bert.tokenization_bert_fast.BertTokenizerFast.build_inputs_with_special_tokens. This will locate functions that are copied from the BERT function and that our repository tools keep in sync with it. If you manually copy your new build_inputs_with_special_tokens header over the existing headers in those functions and then rerun make fixup or make fix-copies, everything should work and the CI should pass.

If you have any issues, let me know and I can make the changes for you!
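
The search step above can also be scripted. A rough sketch (assumptions: run from the transformers repo root with library code under src/; find_copies is a hypothetical helper, not a repo tool):

```python
from pathlib import Path

# Marker used by the repo's copy checker to keep functions in sync
# with the BERT original.
MARKER = (
    "# Copied from transformers.models.bert.tokenization_bert_fast."
    "BertTokenizerFast.build_inputs_with_special_tokens"
)

def find_copies(root: str, marker: str = MARKER) -> list[Path]:
    """Return every .py file under `root` that contains the copy marker."""
    return [
        path
        for path in Path(root).rglob("*.py")
        if marker in path.read_text(encoding="utf-8")
    ]
```

Calling find_copies("src") would list each file whose header needs the manual copy; after pasting the new header over each one, make fixup or make fix-copies should bring everything back in sync.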

@coledie
Author

coledie commented Jun 14, 2023

@Rocketknight1 Looks like it is working since test failures are unrelated?

@huggingface huggingface deleted a comment from github-actions bot Jul 10, 2023
@github-actions

github-actions bot commented Aug 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Aug 11, 2023