[FEATURE] Built-in tokenization feature for Text objects #503

Closed
1 of 2 tasks
mynameisvinn opened this issue Jan 28, 2021 · 1 comment
Labels
feature-discussion (open discussion on feature request), help wanted (Extra attention is needed)

Comments

@mynameisvinn
Contributor

mynameisvinn commented Jan 28, 2021

🚨🚨 Feature Request

  • Related to an existing Issue
  • A new implementation (Improvement, Extension)
    Text objects should have a built-in method to convert strings to tokens, according to a user-specified tokenizer.

Is your feature request related to a problem?

Partially. Text objects currently use BERT tokenizers by default.

If your feature will improve HUB

A common use case is streaming data from a Hub Dataset to a Hugging Face transformer. This feature would remove friction from that process.

Description of the possible solution

A solution would improve the current syntax:

ds = hub.Dataset(tag, shape=(10,), schema=schema, mode="w", tokenizer=some_tokenizer) 
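
To make the request concrete, here is a minimal sketch of the behavior a user-specified tokenizer could provide when text is streamed out of a dataset. It does not use Hub's API: `Tokenizer`, `whitespace_tokenizer`, and `tokenize_stream` are hypothetical names chosen for illustration, and the whitespace splitter is only a stand-in for a real tokenizer.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical type: any callable that maps a string to a list of tokens.
Tokenizer = Callable[[str], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Stand-in tokenizer: lowercase the text and split on whitespace."""
    return text.lower().split()


def tokenize_stream(texts: Iterable[str], tokenizer: Tokenizer) -> Iterator[List[str]]:
    """Lazily apply the user-supplied tokenizer to each text sample."""
    for text in texts:
        yield tokenizer(text)


if __name__ == "__main__":
    samples = ["Hello World", "Streaming text samples"]
    for tokens in tokenize_stream(samples, whitespace_tokenizer):
        print(tokens)
```

Because the tokenizer is just a callable, a Hugging Face tokenizer's `tokenize` method could be passed in place of `whitespace_tokenizer` without changing the streaming code.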

mynameisvinn changed the title from "[FEATURE] Automatically convert Text to tokens" to "[FEATURE] Built-in tokenization feature for Text objects" on Jan 28, 2021
mynameisvinn added the feature-discussion (open discussion on feature request) and help wanted (Extra attention is needed) labels on Jan 31, 2021

@mynameisvinn
Contributor Author

Closing this feature request due to inactivity and lack of interest. Will revive it if more users request it.
