Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SplitDelimiterBehavior to Punctuation constructor #657

Merged
merged 1 commit into from
Aug 13, 2021

Conversation

vladdy
Copy link
Contributor

@vladdy vladdy commented Mar 17, 2021

This change allows to customize SplitDelimiterBehavior for the Punctuation pre-tokenizer. For backwards compatibility, SplitDelimiterBehavior::Isolated is set the default. Python and NodeJS bindings were updated accordingly.

Resolves: #642

@vladdy vladdy force-pushed the add-split-delimeter branch from 6162e2c to bc5bebc Compare March 18, 2021 19:36
@n1t0
Copy link
Member

n1t0 commented Mar 18, 2021

Thank you for taking care of this @vladdy! Let me know if you need any help!

@vladdy vladdy force-pushed the add-split-delimeter branch from bc5bebc to 8b79d82 Compare March 18, 2021 20:26
@n1t0
Copy link
Member

n1t0 commented May 20, 2021

Hey @vladdy! Do you want me to take over from here? If you'd like to continue but need some help with the node bindings or anything else, I'd be happy to provide you with some guidance. Let me know!

@vladdy
Copy link
Contributor Author

vladdy commented May 21, 2021

Sorry, got a bit busy at work! I'll try to get more progress within a few next days and let you know if I have any questions.

@vladdy vladdy force-pushed the add-split-delimeter branch from 8b79d82 to 03240ab Compare May 30, 2021 21:41
@vladdy vladdy force-pushed the add-split-delimeter branch from 03240ab to f77b67e Compare June 15, 2021 01:17
@vladdy vladdy marked this pull request as ready for review June 15, 2021 01:21
@vladdy
Copy link
Contributor Author

vladdy commented Jun 15, 2021

@n1t0 Thank you for your patience! This is ready for the review. Let me know if you want me to change/add anything.

If you are fine with the proposed changes, please approve running the workflows so that I could check their results.

@vladdy vladdy force-pushed the add-split-delimeter branch from f77b67e to 14072f8 Compare June 15, 2021 17:43
Copy link
Member

@n1t0 n1t0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladdy Thank you for taking care of this! From a quick glance, it looks great! I approved the workflows, and there just seems to be a typo.

Can you also modify the CHANGELOG.md file under bindings/python with this modification?

bindings/python/tests/bindings/test_pre_tokenizers.py Outdated Show resolved Hide resolved
@vladdy vladdy force-pushed the add-split-delimeter branch 7 times, most recently from f28177e to 6de3d23 Compare June 17, 2021 11:34
@vladdy
Copy link
Contributor Author

vladdy commented Jun 17, 2021

@n1t0 I've added a CHANGELOG entry under new 0.10.4 version. Please let me know if you had any other plans about it.

@vladdy vladdy requested a review from n1t0 June 17, 2021 11:35
bindings/python/CHANGELOG.md Outdated Show resolved Hide resolved
@vladdy vladdy force-pushed the add-split-delimeter branch from 6de3d23 to 5ffd30f Compare June 17, 2021 12:30
@vladdy vladdy requested a review from n1t0 June 17, 2021 12:31
@vladdy
Copy link
Contributor Author

vladdy commented Jun 25, 2021

@n1t0, just checking if you still need me to change anything else here.

Copy link
Member

@n1t0 n1t0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vladdy! Everything looks good

@n1t0 n1t0 merged commit e2bf8da into huggingface:master Aug 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add SplitDelimiterBehavior to Punctuation constructor
2 participants