forked from huggingface/transformers
add test checking the offsets for an input split into words for different add_prefix_space and trim_offsets args
#1
Closed
Conversation
* Fix doc examples: cannot import name * remove copy because of some necessary minor changes (maybe add copy to the individual methods instead) * Keep copy with some modifications Co-authored-by: ydshieh <[email protected]>
Co-authored-by: ydshieh <[email protected]>
* Wip on metadata update * Most of the script * Add a job to auto-update the transformers metadata * Style
* Mention no images added to repository * Update CONTRIBUTING.md Co-authored-by: NielsRogge <[email protected]> Co-authored-by: NielsRogge <[email protected]>
* avoid tf.tile in embeddings * remove more tf.tile in embeddings * clean Co-authored-by: ydshieh <[email protected]>
* First draft * Improve docstring + clean up tests * Remove unused code * Add check in case one doesn't provide a preprocessor
* Convert Trainer doc page to MarkDown * Fix repo consistency * Fix the doc build test job
* Adding some slow test to check for perceiver at least from a high level.
* Re-enabling fast tests for Perceiver ImageClassification.
* Perceiver might try to run without Tokenizer (Fast doesn't exist) and with FeatureExtractor some text only pipelines.
* Oops.
* Adding a comment for `update_config_with_model_class`.
* Remove `model_architecture` to get `tiny_config`.
* Finalize rebase.
* Smarter way to handle undefined FastTokenizer.
* Remove old code.
* Addressing some nits.
* Don't instantiate `None`.
…face#13410) * use jax and jnp instead of numpy in data_loader * return batches as np.ndarray
* Adding support for multiple mask tokens. - Original implem: huggingface#10222 Co-authored-by: njafer <[email protected]>
* In order to accommodate optionally multimodal models like Perceiver, we add information to the tasks to specify the tasks where we know for sure if we need the tokenizer/feature_extractor or not.
* Adding info in the documentation about multi masks. + marked as experimental.
* Add a copy() to prevent overriding the same tensor over and over.
* Fixup.
* Adding small test for multi mask with real values.
Co-authored-by: njafer <[email protected]>
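For reference, after this change the fill-mask pipeline accepts inputs containing more than one mask token and returns one list of candidates per mask. A minimal sketch, where the model name is only an example:

```python
from transformers import pipeline

# Model name is illustrative; any checkpoint with a mask token works.
unmasker = pipeline("fill-mask", model="distilroberta-base")

# With several mask tokens in one input, the pipeline returns one list of
# candidate completions per mask instead of a single flat list.
results = unmasker("Paris is the <mask> of <mask>.")
for candidates in results:
    print([c["token_str"] for c in candidates[:3]])
```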
…ingface#14722) * Fix broken links to distillation on index page of documentation * Fix broken link for distillation in main README * Run make fixup
* Fake new model * Fix doc-building test job * Is this the problem? * Another try * Typo * Clean up * Can we do without -e ? * Clean setup
Co-authored-by: ydshieh <[email protected]>
* Initial commit for Keras model cards
* Revert accidental change
* make style
* make style
* make style
* Fix PR comments
* Move repo creation to __init__
* Fixes to README.md creation
* Partial progress for proper card creation on `push_to_hub`
* Proper card creation from `push_to_hub` plus fixes for malformed model cards
* Fixes for model card creation outside the callback
* Adding a model card creation test
* Putting the model card creation test in the right file. Good job, Matt.
* make style
* Fix model card test temp dir usage
* Fix model card creation when no optimizer present
* Fixes for when training history not present
* Fix accidental edit to test_modeling_common
* Fix code examples * Fix code example
* Fix docs * Apply suggestions from code review Co-authored-by: Sylvain Gugger <[email protected]> * Code quality Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Lysandre <[email protected]>
* PoC for conserving old links * Do the same for other links * remap the redirects section * add instructions on how to move sections * improve Co-authored-by: Stas Bekman <[email protected]>
* speed up canine and mluke * speed up mbart and mbart50 toks * upload files
…ngface#14959)
* rename classes
* clean up more namings
* remove bogus file
* Apply suggestions from code review
* Apply suggestions from code review
* replace more names
* more regex replace
* make style
* correct
* correct more
* make style
* finish
* correct more in wav2vec2
* make style
* improve freeze_extractor
* add aliases
* add tf aliases
the absl workaround hasn't been needed since 2019-04 abseil/abseil-py#99 so it should be safe to remove it.
* Fixing a pathological case for slow tokenizers * Update src/transformers/tokenization_utils.py
huggingface#14881) * [AutoProcessor] Correct AutoProcessor and automatically add processor class * up * up * up * up * up * up * up * up * continue tomorrow * up * up * up * make processor class private * fix loop
…uggingface#14980) * [Generate] correct encoder_outputs are passed without attention_mask * Apply suggestions from code review * up
…ingface#14988) * Adding `num_return_sequences` support for text2text generation. Co-Authored-By: Enze <[email protected]> * Update tests/test_pipelines_text2text_generation.py Co-authored-by: Sylvain Gugger <[email protected]> * Update tests/test_pipelines_text2text_generation.py Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Enze <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]>
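For reference, this change lets the text2text-generation pipeline return several candidates per input via `num_return_sequences`. A minimal usage sketch, with an illustrative model name:

```python
from transformers import pipeline

# Model name is illustrative.
generator = pipeline("text2text-generation", model="t5-small")

outputs = generator(
    "translate English to German: The house is wonderful.",
    num_beams=3,             # beam count must be >= num_return_sequences
    num_return_sequences=3,
)
for candidate in outputs:
    print(candidate["generated_text"])
```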
* Enabling `tokenizers` upgrade. * Moved ugly comment. * Tokenizers==0.11.1 needs an update to keep borrow checker happy in highly contiguous calls. * Support both 0.11.1 and 0.11.0
…uggingface#14994) * Allow training to resume even if RNG states are not properly loaded * Proper f-string
* Map model_type and doc pages names * Add script * Fix typo * Quality * Manual check for Auto Co-authored-by: Lysandre <[email protected]>
Backward compatibility broken in huggingface#14988
* Enabling `truncation_side` for Slow and Fast tokenizer. Co-Authored-by: Niels Rogge <[email protected]> * Disable failing tests. * Layout xlm. * assert -> assertEqual. Co-authored-by: Niels Rogge <[email protected]>
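For context, `truncation_side` controls which end of the sequence is dropped when truncating. A minimal sketch of the new attribute, using an illustrative checkpoint:

```python
from transformers import AutoTokenizer

# Checkpoint is illustrative; the attribute applies to both slow and fast tokenizers.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenizer.truncation_side = "left"   # default is "right"

encoded = tokenizer(
    "one two three four five six seven eight nine ten",
    truncation=True,
    max_length=6,
)
# With truncation_side="left", the kept tokens come from the end of the input.
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```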
* Naive ASR chunking * Fixing batching for ASR. Co-authored-by: Nicolas Patry <[email protected]>
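For context, chunking lets the ASR pipeline transcribe audio longer than the model window by splitting it into overlapping chunks and merging the decoded pieces. A minimal sketch, where the model name, chunk length, and audio path are placeholders:

```python
from transformers import pipeline

# Model name, chunk length, and audio path are placeholders.
asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/wav2vec2-base-960h",
    chunk_length_s=10,       # split long audio into 10 s chunks
    stride_length_s=(2, 2),  # overlap between chunks, merged back after decoding
)
print(asr("path/to/long_audio.wav")["text"])
```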
Co-authored-by: ydshieh <[email protected]>
* Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx
@SaulLu should be good now, no?
Indeed 😄, I will close this PR. In any case, I had to open a new one between one of my fork branches and the Hugging Face repo (the new PR is here).
SaulLu pushed a commit that referenced this pull request on Feb 15, 2022
…5416)
* added classes to get started with constrained beam search
* in progress, think i can directly force tokens now but not yet with the round robin
* think now i have total control, now need to code the bank selection
* technically works as desired, need to optimize and fix design choices leading to undesirable outputs
* complete PR #1 without disjunctive decoding
* removed incorrect tests
* Delete k.txt
* Delete test.py
* Delete test.sh
* revert changes to test scripts
* genutils
* full implementation with testing, no disjunctive yet
* shifted docs
* passing all tests realistically ran locally
* removing accidentally included print statements
* fixed source of error in initial PR test
* fixing the get_device() vs device trap
* fixed documentation docstrings about constrained_beam_search
* fixed tests failing for Speech2TextModel's floating point inputs
* fix cuda long tensor
* added examples and testing for them and found & fixed a bug in beam_search and constrained_beam_search
* deleted accidentally added test halting code with assert False
* code reformat
* Update tests/test_generation_utils.py Co-authored-by: Patrick von Platen <[email protected]>
* Update tests/test_generation_utils.py Co-authored-by: Patrick von Platen <[email protected]>
* Update tests/test_generation_utils.py Co-authored-by: Patrick von Platen <[email protected]>
* Update tests/test_generation_utils.py Co-authored-by: Patrick von Platen <[email protected]>
* Update tests/test_generation_utils.py
* fixing based on comments on PR
* took out the testing code that should work but fails without the beam search modification; style changes
* fixing comments issues
* docstrings for ConstraintListState
* typo in PhrasalConstraint docstring
* docstrings improvements
Co-authored-by: Patrick von Platen <[email protected]>
SaulLu pushed a commit that referenced this pull request on Mar 30, 2022
)
* added classes to get started with constrained beam search
* in progress, think i can directly force tokens now but not yet with the round robin
* think now i have total control, now need to code the bank selection
* technically works as desired, need to optimize and fix design choices leading to undesirable outputs
* complete PR #1 without disjunctive decoding
* removed incorrect tests
* Delete k.txt
* Delete test.py
* Delete test.sh
* revert changes to test scripts
* genutils
* full implementation with testing, no disjunctive yet
* shifted docs
* passing all tests realistically ran locally
* removing accidentally included print statements
* fixed source of error in initial PR test
* fixing the get_device() vs device trap
* fixed documentation docstrings about constrained_beam_search
* fixed tests failing for Speech2TextModel's floating point inputs
* fix cuda long tensor
* added examples and testing for them and found & fixed a bug in beam_search and constrained_beam_search
* deleted accidentally added test halting code with assert False
* code reformat
* Update tests/test_generation_utils.py Co-authored-by: Patrick von Platen <[email protected]>
* Update tests/test_generation_utils.py Co-authored-by: Patrick von Platen <[email protected]>
* Update tests/test_generation_utils.py Co-authored-by: Patrick von Platen <[email protected]>
* Update tests/test_generation_utils.py Co-authored-by: Patrick von Platen <[email protected]>
* Update tests/test_generation_utils.py
* fixing based on comments on PR
* took out the testing code that should work but fails without the beam search modification; style changes
* fixing comments issues
* docstrings for ConstraintListState
* typo in PhrasalConstraint docstring
* docstrings improvements
* finished adding what is sort of an opinionated implementation of disjunctive generation, but it revealed errors in inner beam search logic during testing.
* fixed bug found in constrained beam search that used beam_idx that were not global across all the batches
* disjunctive constraint working 100% correctly
* passing all tests
* Accidentally included mlruns
* Update src/transformers/generation_beam_constraints.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/generation_beam_constraints.py Co-authored-by: Patrick von Platen <[email protected]>
* complete overhaul of type complexities and other nits
* strict type checks in generate()
* fixing second round of feedback by narsil
* fixed failing generation test because of type check overhaul
* generation test fail fix
* fixing test fails
Co-authored-by: Patrick von Platen <[email protected]>
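The two commits above describe constrained beam search and its disjunctive extension. The user-facing entry point is the `force_words_ids` argument of `generate`; a minimal sketch, assuming a transformers version that includes these commits (model choice is illustrative):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Model choice is illustrative.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

input_ids = tokenizer(
    "translate English to German: How old are you?", return_tensors="pt"
).input_ids

# Force the word "alt" to appear in the output. Nesting one level deeper
# (a list of alternative word id lists) expresses the disjunctive form added
# in the second commit: any one of the alternatives must appear.
force_words_ids = tokenizer(["alt"], add_special_tokens=False).input_ids

outputs = model.generate(
    input_ids,
    force_words_ids=force_words_ids,
    num_beams=5,
    num_return_sequences=1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```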
SaulLu pushed a commit that referenced this pull request on May 31, 2022
Improve get_added_vocabulary_hacking
SaulLu pushed a commit that referenced this pull request on Jul 18, 2022
* chore: initial commit. Copied the torch implementation of regnets and ported the code to tf step by step. Also introduced an output layer which was needed for regnets.
* chore: porting the rest of the modules to tensorflow. Did not change the documentation yet; yet to try the playground on the model.
* Fix initializations (#1)
* fix: code structure in few cases.
* fix: code structure to align tf models.
* fix: layer naming, bn layer still remains.
* chore: change default epsilon and momentum in bn.
* chore: styling nits.
* fix: cross-loading bn params.
* fix: regnet tf model, integration passing.
* add: tests for TF regnet.
* fix: code quality related issues.
* chore: added rest of the files.
* minor additions.
* fix: repo consistency.
* fix: regnet tf tests.
* chore: reorganize dummy_tf_objects for regnet.
* chore: remove checkpoint var.
* chore: remove unnecessary files.
* chore: run make style.
* Update docs/source/en/model_doc/regnet.mdx Co-authored-by: Sylvain Gugger <[email protected]>
* chore: PR feedback I.
* fix: pt test. thanks to @ydshieh.
* New adaptive pooler (huggingface#3)
* feat: new adaptive pooler Co-authored-by: @Rocketknight1
* chore: remove image_size argument. Co-authored-by: matt <[email protected]> Co-authored-by: matt <[email protected]>
* Empty-Commit
* chore: remove image_size comment.
* chore: remove playground_tf.py
* chore: minor changes related to spacing.
* chore: make style.
* Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: amyeroberts <[email protected]>
* chore: refactored __init__.
* chore: copied from -> taken from./g
* adaptive pool -> global avg pool, channel check.
* chore: move channel check to stem.
* pr comments - minor refactor and add regnets to doc tests.
* Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: NielsRogge <[email protected]>
* minor fix in the xlayer.
* Empty-Commit
* chore: removed from_pt=True.
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: matt <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: NielsRogge <[email protected]>
SaulLu pushed a commit that referenced this pull request on Sep 9, 2022
proposal of a fix for the MarkupLM fast tokenizer
What does this PR do?
This PR adds a test that will not pass until we have a version of the `tokenizers` library that includes this change.
cc @LysandreJik for visibility
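A rough sketch of the kind of check this PR adds, looping over the four `add_prefix_space`/`trim_offsets` combinations for a pre-tokenized input; the tokenizer checkpoint and the inspection below are illustrative, not the test as written:

```python
from transformers import RobertaTokenizerFast

# Checkpoint and expectations are illustrative, not the test as merged.
words = ["Hello", "world"]

for add_prefix_space in (True, False):
    for trim_offsets in (True, False):
        tokenizer = RobertaTokenizerFast.from_pretrained(
            "roberta-base",
            add_prefix_space=add_prefix_space,
            trim_offsets=trim_offsets,
        )
        encoding = tokenizer(
            words,
            is_split_into_words=True,
            return_offsets_mapping=True,
            add_special_tokens=False,
        )
        # The offsets should index back into the individual input words; whether
        # the added or leading space is counted depends on the two flags above.
        print(add_prefix_space, trim_offsets, encoding["offset_mapping"])
```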